TrojanSpec-Bench: Adversarial Specification Elicitation in AI-Assisted Formal Verification
Mohammad Zeeshan
TrojanSpec-Bench is the first benchmark to treat the natural-language-to-specification elicitor in AI-assisted formal verification as a Dolev-Yao adversary. We release 1,024 verifier-admitted trojan triples across Dafny, Lean 4, and Verus, spanning four attack patterns anchored in real disclosed libcrux bugs. Our SpecGuard detector decomposes a coarse LLM faithfulness judgment into four atomic Yes/No criteria and flags when at least two fail. It lifts F1 from 0.871 to 0.967 over the consensus baseline, holds 1.000 recall, and flags only 3 of 100 honest Lean Mathlib lemmas. Against the ICSE 2026 MutDafny baseline on Dafny it reaches F1 0.992 versus 0.530.
Aligning natural language with formal specifications is a challenging yet crucial task. This work safeguards the specification generation process using multiple validation criteria. Leveraging this feedback to iteratively refine specifications represents a promising direction for future work. Although our approach effectively filters out trivial specifications, its overall soundness still relies on the underlying model's reasoning capabilities. However, in the context of specifications, soundness is generally a more critical concern than completeness, as completeness can be relatively easily achieved by combining multiple sound specifications.
Finding "trojans" in specs is an emerging and serious concern for AI generated specs. The hackathon led to a dataset which is likely valuable beyond this competition. Some thoughts about what next: (a) the architecture is quite simple (four one-shot critics), that's fine if the benchmarking is the main aim because we need to know what works; but what about more advanced architectures? (b) is there chance to target the actual libcrux bugs?
Cite this work
@misc {
title={
(HckPrj) TrojanSpec-Bench: Adversarial Specification Elicitation in AI-Assisted Formal Verification
},
author={
Mohammad Zeeshan
},
date={
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


