Spec-Laundering
Revathi Prasad
A Benchmark and Severity Measure for Adversarial Specification Cheating in Dafny
The core idea, “spec laundering,” is a useful and memorable framing for an important secure program synthesis failure mode: a verifier can certify code against a specification that has been deliberately weakened to admit a backdoored implementation. That threat model feels especially relevant as LLMs increasingly generate both code and specifications.
The strongest parts of the work are the clear threat model, the concrete Dafny attack catalog, the AWS Digest case study, and the honesty about limitations. The project does not overstate the detector as a finished classifier. It reports threshold tradeoffs, false positives, construction overfit, and the small adversarial sample size. The structural argument about why postcondition mutation testing cannot detect axiom-based cheating is also a valuable contribution, because it explains why an orthogonal scan for assume, {:axiom}, and related constructs is needed.
The detector itself is promising but still early. The six-attack benchmark is small, and five of the attacks were designed around patterns the detector explicitly checks. The non-adversarial DafnyBench comparison is helpful, but the overlap between adversarial and non-adversarial severity scores shows that calibration is not solved yet. The false positives on honest single-clause equality specs are also important, because that pattern is common and could create review fatigue if used directly in practice.
For the next version, I would focus on three things. First, expand the adversarial benchmark with attacks designed by someone who knows the detector but is trying to evade it. Second, build a labeled calibration set of Dafny specifications so the severity score can be tuned against real tight/loose/ambiguous cases. Third, run the detector on a larger real Dafny codebase, ideally the full AWS Dafny library, and manually inspect the flags.
Overall, this is a well-presented, security-relevant project with a clear theory of impact. It is not yet a production-ready detector, but it names a real failure mode, builds a reproducible benchmark, and gives a concrete starting point for future work on adversarial specification validation.
Good work. I like this angle of mutating specs and detecting these mutations. This work could be a bit more grounded in concrete attack scenarios.
Cite this work
@misc {
title={
(HckPrj) Spec-Laundering
},
author={
Revathi Prasad
},
date={
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


