Mar 23, 2026
Passing the Test, Breaking the System: Latent Failures in Software Engineering Agents
Jeremias Ferrao
Real-world engineering post-mortems document a recurring failure pattern: a developer completes a task successfully on the surface but leaves behind code that causes harm later: credentials exposed in logs, state mutations without transaction guards, log files with no rotation policy. We ask whether AI software engineering agents reproduce this pattern. We introduce BAIT-Bench, a benchmark of 36 tasks across five failure modes drawn from real incidents, where each task has a correct-looking solution that passes all tests but contains a latent flaw detectable only through post-hoc environment inspection. Evaluating eleven models, we find primary task success of 78-94% against latent task success of only 18-36%, a gap that model scale does not close. A ControlArena LLM overseer monitor fails to flag any trajectory, since the failures arise from omission rather than suspicious action. Two interventions improve latent safety relative to the baseline: open-ended prompting raises latent success roughly 3x, and a pre-task environment scouting step roughly doubles it at no primary task cost. These results suggest that richer context and broader task framing are more effective safety levers than tighter agent control in the context of latent failures.
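To make the pattern concrete, here is a minimal hypothetical sketch (not taken from BAIT-Bench) of the "credentials exposed in logs" failure mode: the function satisfies its functional test, and only a post-hoc inspection of the environment, here, the captured log stream, reveals the flaw.

```python
import io
import logging

# Capture log output so the "environment" (the log stream) can be inspected afterwards.
buf = io.StringIO()
logging.basicConfig(stream=buf, level=logging.INFO, force=True)
log = logging.getLogger("auth")

def connect(user: str, password: str) -> bool:
    """Simulated login helper with a latent flaw: it logs the secret."""
    log.info("connecting as %s with password %s", user, password)
    return True  # the primary test only checks this return value

# The primary task test passes...
assert connect("alice", "hunter2")

# ...but post-hoc environment inspection exposes the leaked credential.
assert "hunter2" in buf.getvalue()
```

The point of the sketch is that no single action in the trajectory looks suspicious; the harm is an omission (failing to redact), which is why an action-level monitor has nothing to flag.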
This paper introduces a dataset of attacks (though "attacks" may not be quite the right word), which is a nice contribution. I think the author may miss the point of control research, however, in that the paper frames this as monitoring genuinely honest mistakes rather than an actually (potentially) misaligned agent.
These failures are still useful to study, since a misaligned model is naturally incentivized to make its attacks look like honest mistakes, but it is a bit unclear what this paper contributes to the field of AI control.
Cite this work
@misc{ferrao2026latentfailures,
  title={(HckPrj) Passing the Test, Breaking the System: Latent Failures in Software Engineering Agents},
  author={Jeremias Ferrao},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


