Mar 23, 2026
Automated Security Side Task Generation for AI Control
Joachim Schaeffer
AI control evaluations and monitor training both require realistic sabotage trajectories, which in turn require side tasks. However, generating good side tasks is cumbersome and time-intensive. We present an automated pipeline for generating security vulnerability insertion side tasks paired with SWE-bench main tasks. We build SWE-django-sidetasks, a dataset of 18 Django tasks with CWE-mapped vulnerabilities that blend into SWE-bench main tasks. Each side task has an outcome-focused attack prompt and a self-contained pytest verifier. We evaluate 6 frontier models in honest and attack modes with end-of-trajectory monitoring by Opus 4.6 and GPT-4.1. Monitors show partial separation between honest and attack distributions, with AUROCs ranging from 0.58 to 0.98. The generation pipeline scales with model capability and enables adversarial training of attacks and monitors to harden defenses.
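To make the AUROC figures concrete, the following minimal sketch shows how a monitor's separation between honest and attack trajectories could be scored. It assumes the monitor emits one numeric suspicion score per trajectory; the function name `auroc` and the example scores are hypothetical and are not taken from the submission.

```python
from typing import Sequence


def auroc(honest_scores: Sequence[float], attack_scores: Sequence[float]) -> float:
    """Probability that a random attack trajectory receives a higher
    suspicion score than a random honest one (ties count as half)."""
    if not honest_scores or not attack_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in attack_scores:
        for h in honest_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(attack_scores) * len(honest_scores))


# Hypothetical suspicion scores, invented purely for illustration.
honest = [1.0, 2.0, 2.0, 3.0]
attack = [2.0, 5.0, 7.0, 8.0]
example_auroc = auroc(honest, attack)
print(example_auroc)
```

Under this definition, 0.5 means the monitor does no better than chance and 1.0 means every attack trajectory scores above every honest one, so the reported range of 0.58 to 0.98 spans near-chance to near-perfect separation.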
Cite this work
@misc{
  title={(HckPrj) Automated Security Side Task Generation for AI Control},
  author={Joachim Schaeffer},
  date={3/23/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


