May 25, 2026
Spec Mutation Survival Analyzer
Guy Nutman, Nir Nutman
An automated auditing pipeline that measures the actual strength of AI-generated formal specifications by testing how much "real work" each clause is doing.
The Problem: Formal verification is only as strong as the specification it relies on. Even with few-shot prompting, LLMs consistently generate redundant, overlapping constraints. If a spec is bloated with dead weight, the code is being verified against the wrong rules.
How It Works:
Generate: Uses Gemini 2.5 Flash to write a baseline suite of Hypothesis property tests for a specific function.
Mutate: Parses the spec's AST and surgically alters it (flipping relational operators or deleting asserts) to create broken variants.
Execute: Runs the baseline and all mutations against a suite of correct and intentionally buggy implementations.
Score: Calculates Spec Coverage. If a mutation "survives" (catches no bugs), the altered clause was redundant. If it is "killed," the clause was load-bearing.
Prompt engineering alone cannot guarantee secure program synthesis. Programmatic validation—measuring "Spec Coverage" the same way we measure code coverage—is strictly necessary to audit AI-generated constraints.
No reviews are available yet
Cite this work
@misc {
title={
(HckPrj) Spec Mutation Survival Analyzer
},
author={
Guy Nutman, Nir Nutman
},
date={
5/25/26
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


