Mar 23, 2026
Bugs over Backdoors: Efficient and Evasive Sabotage via Plausible Code Bugs in LinuxArena
Timothy Obiso, Dean Cahill
AI control attack policies vary widely in efficacy and computational cost. We design and evaluate a bug-based attack policy for LinuxArena, in which an untrusted agent introduces plausible code defects rather than obfuscated backdoors as its primary sabotage vector. Across nine samples, bug-attack achieves an 89% side-task success rate, exceeding the 78% of the standard obfuscation baseline, while using 14.5x fewer tokens. Notably, bug-attack and the standard attack exhibit opposite vulnerability profiles under a trusted-editing defense: trusted editing catches 37% of bug-attack sabotages but none of the standard attack's, suggesting the two policies are complements rather than substitutes. We also report single-sample results for two exploratory policies. We treat all findings as preliminary given the small sample size, but consider the efficiency and evasion results directionally robust.
Cite this work
@misc{obiso2026bugs,
  title={(HckPrj) Bugs over Backdoors: Efficient and Evasive Sabotage via Plausible Code Bugs in LinuxArena},
  author={Timothy Obiso and Dean Cahill},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


