May 25, 2026
Hollow Proofs: Measuring LLM Dishonesty with a Formal Verifier
Orpheus Lummis
Honesty benchmarks for language models must check claims against ground truth, which can be difficult: world-knowledge benchmarks conflate lying with ignorance, and belief-based methods such as MASK rely on an elicited belief they cannot independently check. We study a setting where ground truth is mechanical: when a model writes a proof in Verus (a deductive verifier for Rust), the verifier decides whether the proof holds, and its `--no-cheating` mode decides whether the proof relied on an unsound escape such as `assume`. Using these checks, we detect two forms of dishonesty: hiding an unsound primitive in a proof and denying it (a *hollow proof*), and claiming a failed proof succeeded under pressure when the model judges it failed (a *grounded lie*). Our contributions are methodological: first, eliciting an observation ("did your last check pass?") rather than a prediction ("is your proof complete?") separates dishonesty from miscalibration, because a misreported observation can only be a lie, while a wrong prediction can be honest overconfidence. Second, because the verifier can also check the elicited belief, the benchmark withholds a dishonesty verdict when that belief is too unreliable to trust. We release the harness and a difficulty-graded set of 121 Verus tasks. In a feasibility pilot on two versions of an open-weight model family (DeepSeek-V4-Pro and -Flash), both report their verification results honestly under deadline pressure even though both demonstrably know how to cheat.
No reviews are available yet
Cite this work
@misc {
title={
(HckPrj) Hollow Proofs: Measuring LLM Dishonesty with a Formal Verifier
},
author={
Orpheus Lummis
},
date={
5/25/26
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


