Project Verify
Ray Munene
Project Verify is a Secure Program Synthesis Hackathon project that turns code, documentation, and requirements into structured formal specifications using multiple LLMs (Claude, GPT-4o, and Gemini). It extracts preconditions, postconditions, invariants, assumptions, and testable properties, then compares outputs across models to identify disagreements and synthesize a consensus specification. The system is designed as a client-side React application where API keys are optionally provided per model, with no persistent storage of credentials. Its long-term goal is to evolve into a privacy-preserving edge-native library for running multi-model orchestration securely without centralized handling of sensitive keys.
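For illustration only, the cross-model comparison step described above could look something like the following TypeScript sketch. It is not taken from the project's code. The `ModelSpec` type, the `compareSpecs` and `normalize` functions, and the exact-match rule are all assumptions made here; a real comparison would likely need semantic matching, since two models rarely phrase the same condition identically.

```typescript
// Hypothetical shape of one model's extracted specification.
// The category names follow the summary above; the type itself is an assumption.
type SpecCategory =
  | "preconditions"
  | "postconditions"
  | "invariants"
  | "assumptions"
  | "properties";

type ModelSpec = Record<SpecCategory, string[]>;

const CATEGORIES: SpecCategory[] = [
  "preconditions",
  "postconditions",
  "invariants",
  "assumptions",
  "properties",
];

// Normalize a clause so that case and whitespace differences are not
// counted as disagreement. Deliberately naive: no semantic matching.
function normalize(clause: string): string {
  return clause.trim().toLowerCase().replace(/\s+/g, " ");
}

function emptySpec(): ModelSpec {
  return {
    preconditions: [],
    postconditions: [],
    invariants: [],
    assumptions: [],
    properties: [],
  };
}

interface Comparison {
  consensus: ModelSpec; // clauses that every model produced
  disputed: ModelSpec; // clauses produced by some models but not all
}

// Compare the specs returned by several models, keyed by model name.
function compareSpecs(specs: Record<string, ModelSpec>): Comparison {
  const models = Object.keys(specs);
  const consensus = emptySpec();
  const disputed = emptySpec();

  for (const category of CATEGORIES) {
    // Count how many models produced each normalized clause.
    const counts = new Map<string, number>();
    for (const model of models) {
      const seen = new Set(specs[model][category].map(normalize));
      seen.forEach((clause) => {
        counts.set(clause, (counts.get(clause) ?? 0) + 1);
      });
    }
    counts.forEach((count, clause) => {
      const target = count === models.length ? consensus : disputed;
      target[category].push(clause);
    });
  }
  return { consensus, disputed };
}
```

Under these assumptions, a clause lands in `disputed` whenever at least one model omits it, which is the kind of signal the summary says should point at underspecification.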
The concept is relevant and the architecture is sensible: per-model elicitation, a cross-model consensus pass, a confidence rating, all client-side with strict JSON. The write-up is clear, too. But this is a system description, not a piece of research: there is no benchmark, no experiment, no worked example, and no baseline anywhere in it. The central claim, that cross-model comparison surfaces underspecification, is asserted but never tested, so there is no way to judge whether the consensus step works or whether disagreement tracks real spec gaps. That absence caps execution and impact regardless of how clean the design is. The path forward is concrete: run one evaluation on even ten known-ambiguous specs, show two or three real disagreements end to end, and validate what the confidence number is actually measuring. The foundation is reasonable; it needs an empirical core before any of its claims can be assessed.
This project's write-up contains the most convincing plan for a useful tool among the entries I've read so far. However, it also promises the world when it comes to comparing answers from different LLMs.
As far as I can tell, not only is the report vague on how to make any kind of useful comparison across LLM answers, but the code base itself contains zero functionality of that kind. The GitHub repo looks very close to a bare React project template. There is a small amount of code that plausibly queries LLMs for specific kinds of information, but I don't see where anything useful is done with the responses.
Cite this work
@misc{
  title={(HckPrj) Project Verify},
  author={Ray Munene},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


