Grounding LLMs With Standards Documents
Chad Brewbaker, Suhaas Teja V.
We grounded LLMs with POSIX, WC3, etc standards documents and the LEAN4 source code and in the course of the weekend they are able to elicit several bugs in the Rust MIT Coreutils and Safari/Webkit codebases.
The paper presents an experience report of providing LLM agents with authoritative specification documents (RFC, IETF etc.) and asking agents to reimplement tools with formal specifications matching these documents. While this approach is not itself strictly new (most larger verification efforts have been partially driven using existing specs), the results of this experiment are a useful datapoint in shaping the space of securing software using llms.
A couple of comments regarding the results:
- the paper mentions several specification bugs, such as "BUG-SS-001" a bug in RFC 8941, regarding SameSite values. When digging into this, it seems RFC 8941 is Structured Field Values for HTTP (https://datatracker.ietf.org/doc/html/rfc8941), and the rfc does not contain any mention to samesite. Similar observation for the FTP one. It could be that I misunderstood or found the wrong spec, but this ambiguity does not help build trust in the results of the formalisation. This could have been helped with more rigorous vetting or links for the bugs.
- the second comment would be that it seems like a common pain point was around the parser, and the challenge unifying ABNF grammars with lean's functions? I would suggest the authors look into parser combinators. In particular, Lean happens to have a rather robust and flexible parser combinator library built in (this is actually what is used to parse lean code itself), and the library allows writing parsers closer to ABNF form. As a separate aside, there is a rich literature on formally verified parsers which it might be helpful to adapt for this purpose.
Would be useful to distinguish how trivial and non trivial the bugs are. Also, would be better to lead with saying how many proofs are complete instead of how many theorems are written.
Cite this work
@misc {
title={
(HckPrj) Grounding LLMs With Standards Documents
},
author={
Chad Brewbaker, Suhaas Teja V.
},
date={
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


