Mar 22, 2026
Undoing the Fix: CVE Regression as an AI Control Setting
Ivan
Production codebases carry accumulated security fixes that can be lost when an AI agent modifies code, whether deliberately or by mistake. Meanwhile, existing control evaluations tend to revolve around inserting backdoors, exfiltrating data, or other additive attacks. We propose CVE regression as a control setting in which the attack is removing existing security patches while performing other code work, using real vulnerabilities and real SWE-bench tasks. Our initial results suggest this is a viable direction worth exploring further.
Track: Control Setting
This is a well-motivated project that looks at a realistic safety concern not clearly addressed in the current AI control literature: specifically, the removal of safety-related code from codebases, which could occur during refactors, rather than the addition of new malicious code. The submission also shows a solid literature overview and understanding of current work. Nice job!
I would be excited to see more control datasets like this that look at additional failure modes. Bonus points if this dataset also included rollouts of attempted/successful attacks and benign completions.
I'm quite excited about setting construction using CVE datasets, and I think more work on cleaning up this setting would be cool. Setting up a system where the main task can be used as cover would be ideal; while the work acknowledges this, I think it's not clear how useful the setting is until that is established.
I think that this was an ambitious project and that a version of this project with more time/resources put into it would be quite interesting.
Cite this work
@misc{undoingthefix2026,
  title={(HckPrj) Undoing the Fix: CVE Regression as an AI Control Setting},
  author={Ivan},
  date={2026-03-22},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}