Mar 23, 2026
Detecting LLM Subversion in Vulnerability Patching Settings
Bary Levy
LLMs are increasingly used to propose fixes for vulnerabilities in code. If the LLM is misaligned or untrustworthy, it may propose a fix that appears to resolve a vulnerability but leaves the core issue in place in a subtle, hard-to-detect way, allowing quick re-exploitation by an adversary or by the model itself. We introduce Vul4Control, a ControlArena setting based on the Vul4C benchmark, in which control protocols can be tested on their ability to detect this kind of malicious behavior.
Cite this work
@misc{
  title={(HckPrj) Detecting LLM Subversion in Vulnerability Patching Settings},
  author={Bary Levy},
  date={3/23/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


