When Models Disagree: Cross-Model Divergence Analysis for Ambiguity Risk Estimation in Software Requirements
Jack Lakkapragada
As AI systems generate increasing volumes of software code, a key bottleneck in trustworthy software development is shifting upstream: specifying what code should do, and verifying that it does so. We investigate whether cross-model divergence in specification generation can serve as a practical signal for requirement underspecification. We present a structured experimental framework (a 30-requirement dataset spanning three ambiguity tiers, a six-category divergence taxonomy, and a normalized JSON comparison pipeline) applied across three contemporary LLMs: Claude Sonnet 4.6, GPT-4o, and Llama 3.3 70B. Divergence increases with pre-labeled ambiguity tier (TDR: 7.2, 7.4, 7.9), assumption divergence dominates across all tiers, and contradiction collapses to zero in adversarially ambiguous requirements. Operationally clear requirements still exhibit substantial latent interpretive variance, with many false consensus events in Tier A concentrated in security constraints. The framework, taxonomy, and dataset were frozen prior to execution, enabling reproducible analysis without post-hoc adjustment.
The pre-registered protocol with frozen prompts and model blinding is a real strength, but the headline tier divergence rate of 7.2, 7.4, and 7.9 is too compressed by the ceiling effect to support much, and an FCR of 1.0 across every tier means that metric is not discriminating at all. The category distribution, especially the contradiction collapse to zero in Tier C, is the more convincing signal and would be a better thing to lead with. The shared training data across Claude, GPT-4o, and Llama also means convergence cannot be read as clarity, which caps how far the divergence-as-signal claim can go without ground truth.
The research topic is interesting and important. It would be useful to extend the work to more test cases and to adjust the grading scale so that differences in capability are more clearly visible.
Cite this work
@misc{
  title={(HckPrj) When Models Disagree: Cross-Model Divergence Analysis for Ambiguity Risk Estimation in Software Requirements},
  author={Jack Lakkapragada},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


