Mar 22, 2026
Red-teaming automated Pull Request Reviews with Vulnerability-Contributing Commits
Edward Cant
I create an evaluation on the Control Arena platform where a red-teamer was tasked with providing a pull request that contains a vulnerability. The blue-team reviewer is tasked with catching vulnerabilities in PRs. I generated the setups using a database of vulnerability-contributing commits.
We experimented with giving the red-teamer variety of affordances like an opportunity to give a false justification of the code. Or the ability to split the patch up into parts which it tries to sneak past a stateless reviewer individually.
- The split-PR finding needs more work. I think it's potentially the most important result. Implementing the proposed neutral judge to confirm the vulnerability will be very interesting
- The cross model results are interesting. The experiments with GPT-4-5-nano and Kimi K2 tell different stories. The rebuttals are difficult for one but has no effect on the other. Why is that happening?
- I think it would be great future work to examine mitigations as well. The work is strong on the red team side but offers little on defenses
- A 37-62% FP rate on suspicious verdicts means some more work may be required
Overall, a very interesting project that identifies genuinely important attack vectors against automated code reviews
Author tackles a very practical question — can an adversarial AI agent slip vulnerabilities past an AI code reviewer? The sycophancy/rebuttal results are interesting, split-PR attack is good, and both have direct implications for how automated review should be implemented in practice. However, experiments leave a lot of unexplored ground, and some significant gaps, like unverified vulnerability survival in split-PR experiments. Given hackathon format, this is expected, and this is a good, well-scoped work that doesn't invent anything spectacular.
PS: Haiku transcripts in the final appendix are hilarious. This could get viral on Twitter!
Cite this work
@misc {
title={
(HckPrj) Red-teaming automated Pull Request Reviews with Vulnerability-Contributing Commits
},
author={
Edward Cant
},
date={
3/22/26
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


