May 24, 2026
SpecMutate: Coverage-Guided Specification Diagnosis and Repair for Property-Based Tests
Aditya Singh
SpecMutate achieves 15/15 diagnostic accuracy and 10/10 repair convergence on SpecMutate-15, a balanced 15-task benchmark spanning underconstrained, overconstrained, and correct Hypothesis specification classes. As large language models generate increasing volumes of test specifications, unverified specs become a silent failure mode: underconstrained specs allow buggy implementations to pass verification, while overconstrained specs reject correct implementations. SpecMutate diagnoses whether a Hypothesis specification is underconstrained, overconstrained, or correct using a 3-tier cascade of deterministic gates and four independent signals, then repairs broken specs through a feedback-guided LLM loop grounded in AST-level mutation evidence, coverage delta quantification, and concrete counterexample inputs. Ablation reveals the cascade is load-bearing asymmetrically: removing the primary overconstrained gate drops accuracy by 33.3%, while removing the underconstrained guard costs zero accuracy because the weighted fusion formula independently handles those cases. SpecMutate is the first Python-native, Hypothesis-native system to close the diagnosis–repair loop with AST-level constraint localization.
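The three specification classes named in the abstract can be illustrated with a small sketch. It is not SpecMutate's cascade or its fusion formula: it uses plain exhaustive enumeration over tiny inputs as a stand-in for Hypothesis input generation, a single hand-written mutant instead of AST-level mutation, and every name in it (`correct_sort`, `buggy_sort`, `spec_under`, `spec_over`, `spec_correct`, `classify`) is hypothetical. The idea it shows is only the core one: a spec that rejects a known-correct implementation is overconstrained, and a spec that a buggy implementation survives is underconstrained.

```python
from collections import Counter
from itertools import product


def correct_sort(xs):
    """Reference implementation assumed to be correct."""
    return sorted(xs)


def buggy_sort(xs):
    """Hypothetical mutant: silently drops duplicate elements."""
    return sorted(set(xs))


def is_ordered(out):
    return all(a <= b for a, b in zip(out, out[1:]))


def spec_under(xs, out):
    # Checks ordering only, so it cannot notice lost elements.
    return is_ordered(out)


def spec_over(xs, out):
    # Demands strictly increasing output, which rejects valid duplicates.
    strictly_increasing = all(a < b for a, b in zip(out, out[1:]))
    return strictly_increasing and Counter(out) == Counter(xs)


def spec_correct(xs, out):
    # Ordered and a permutation of the input.
    return is_ordered(out) and Counter(out) == Counter(xs)


# All lists of length 0..3 over {0, 1, 2}; a deterministic stand-in
# for property-based input generation.
INPUTS = [list(t) for n in range(4) for t in product(range(3), repeat=n)]


def first_rejection(spec, impl):
    """Return the first input on which spec rejects impl, or None."""
    for xs in INPUTS:
        if not spec(xs, impl(xs)):
            return xs
    return None


def classify(spec):
    # Overconstrained: the spec rejects the correct implementation.
    if first_rejection(spec, correct_sort) is not None:
        return "overconstrained"
    # Underconstrained: the mutant passes on every input.
    if first_rejection(spec, buggy_sort) is None:
        return "underconstrained"
    return "correct"
```

Calling `classify` on each of the three specs yields one label per class, and `first_rejection` returns the concrete counterexample input, which is the kind of evidence the abstract describes feeding into the repair loop.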
Cite this work
@misc{
  title={(HckPrj) SpecMutate: Coverage-Guided Specification Diagnosis and Repair for Property-Based Tests},
  author={Aditya Singh},
  date={5/24/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


