May 25, 2026
Specmut: Semantic Mutation Testing for Formal Specification Tightness
Tyler Rector
This project develops Specmut, a mutation-based validation tool for formal specifications. The core problem is that formal verification can prove code satisfies a specification, but it cannot prove the specification is strong enough to capture the intended behavior. Specmut generates nearby specification mutants and measures how often those mutants are semantically distinguished, producing a tightness score called τ. In the LLM experiment, Qwen generated baseline Lean-style theorem specifications and then repaired them using Specmut feedback. The repaired specifications improved tightness on all three non-ceiling tasks: list reverse increased from median τ = 0.647 to 0.960, set insert from 0.571 to 0.923, and sorting from 0.136 to 0.960. Across 37 analyzable paired comparisons, the median improvement was Δτ = 0.3516, with Wilcoxon p = 5.21 × 10⁻⁶. The result shows that semantic mutation feedback can guide LLM-generated specifications toward stronger, more behavior-constraining formal statements.
No reviews are available yet
Cite this work
@misc {
title={
(HckPrj) Specmut: Semantic Mutation Testing for Formal Specification Tightness
},
author={
Tyler Rector
},
date={
5/25/26
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


