HAZE: Adversarial Multi-Agent Scrutiny for Vulnerability Detection
ALEX DANIEL NOVELO FUENTES
HAZE (Hallucination-Aware Zero-sum Examination) is an AI safety project that tackles a failure mode in AI-assisted security auditing: single-agent LLM auditors tend to manufacture findings, flagging safe code as vulnerable. HAZE reframes the task from "is there a bug?" to "did this accusation survive refutation?" using a multi-agent debate protocol—two attacker and two defender models argue over a code artifact while an isolated judge rules only on what withstands cross-examination.
Grounded in AI safety principles (debate as scalable oversight and AI-control-style distrust of any single LLM pass), it was measured on a 10-CVE benchmark: a single-agent baseline flagged 80% of patched code as vulnerable, while HAZE cut that false-positive rate to 20% while keeping 80% recall—a 4× reduction in hallucinated findings, at roughly USD 0.02 per verdict. Built and reproducible, it shows how adversarial multi-agent scrutiny can make AI-driven security pipelines more trustworthy.
No reviews are available yet
Cite this work
@misc {
title={
(HckPrj) HAZE: Adversarial Multi-Agent Scrutiny for Vulnerability Detection
},
author={
ALEX DANIEL NOVELO FUENTES
},
date={
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


