May 25, 2026
Counterexample-Guided Validation & Repair of LLM-Generated Safety Specifications
Yifei Lu
LLMs can turn natural-language safety requirements into formal specifications, but a fluent specification can still be the wrong one. It might quietly drop a guard, block access the policy should allow, or respond inconsistently when a safety-relevant input changes. We built a pipeline that checks whether an LLM-generated specification matches the intended policy, using biosecurity access control as a testbed. Each requirement is broken into traceable safety obligations such as clearance, training, approval validity, expiration, external anonymization, and emergency logging. Candidate specifications are written in a restricted DSL, and four oracles judge each one together: obligation coverage, sampled behavioral agreement, metamorphic relations, and SMT-guided counterexample search over Z3. Every candidate ends up with a risk-weighted review card. Across 30 simulated LLM candidates and 60 controlled mutants, small manual test suites caught only 63.3% of injected faults and 55.3% of the high-severity ones, while systematic validation reached 98.9% overall and 100% on high-severity faults. Feeding counterexamples back as repair signals fixed 29 of 30 candidates, raising the mean score from 70.1 to 95.4 and clearing every high-severity under-constraint, with one regression. In a live run, specifications from Gemini 3.5 Flash were parsed and validated end-to-end, and obligation-guided prompting reached full obligation coverage. Together, these results show that LLM-generated safety specifications can be validated systematically, with each failure traced back to a specific missing guard.
No reviews are available yet
Cite this work
@misc {
title={
(HckPrj) Counterexample-Guided Validation & Repair of LLM-Generated Safety Specifications
},
author={
Yifei Lu
},
date={
5/25/26
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


