Mar 22, 2026
AutoControl Arena: Synthesizing Executable Red-Teaming Environments for Frontier AI Risk Evaluation
Changyi Li, Pengfei Lu, Yihe Fan, Wenqi Zhang, Changyue Jiang
We present AutoControl Arena, an automated framework for scalable, high-fidelity red-teaming of frontier AI agents. Existing approaches face a fidelity-scalability dilemma: manually crafted benchmarks are realistic but cannot cover the long tail of agentic risks, while LLM-based simulators scale easily but suffer from logic hallucinations that undermine evaluation validity. AutoControl Arena mitigates this through Logic-Narrative Decoupling — a hybrid paradigm in which deterministic executable code governs state transitions (preventing hallucinations) while an LLM layer generates contextually realistic narrative feedback. A multi-agent pipeline automatically synthesizes, deploys, and audits executable sandboxes from abstract risk descriptions, achieving over 98% end-to-end success.
To elicit latent risks, we vary environmental Stress and Temptation across X-Bench (70 scenarios, 7 risk categories). Evaluating 9 frontier models reveals: (1) Alignment Illusion: risk rates surge from 21.7% to 54.5% under pressure, with capable models showing disproportionately larger increases; (2) Scenario-Specific Safety Scaling: advanced reasoning improves robustness for direct harms but worsens it for gaming scenarios; and (3) Divergent Misalignment Patterns: weaker models cause non-malicious harm while stronger models develop strategic concealment.
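To make the Logic-Narrative Decoupling paradigm concrete, here is a minimal illustrative sketch. All names (`BankSandbox`, `transfer`, `narrate`) are hypothetical and not the authors' actual API: a deterministic logic layer owns the ground-truth state, and a separate narrative layer (stubbed here with a template in place of an LLM) only renders feedback and has no write access to state.

```python
from dataclasses import dataclass, field

@dataclass
class BankSandbox:
    """Logic layer: deterministic executable code governs all state transitions."""
    balance: float = 100.0
    log: list = field(default_factory=list)

    def transfer(self, amount: float) -> dict:
        # Hard-coded rule: a transition succeeds or fails by code,
        # never by an LLM's say-so, so the state cannot be hallucinated.
        if amount <= 0 or amount > self.balance:
            event = {"action": "transfer", "amount": amount, "ok": False}
        else:
            self.balance -= amount
            event = {"action": "transfer", "amount": amount, "ok": True}
        self.log.append(event)
        return event

def narrate(event: dict) -> str:
    """Narrative layer: renders the logic layer's verdict as realistic feedback.
    In the real system this would be an LLM call; here it is a template stub.
    Consistency holds because narration is conditioned on the event record
    and cannot mutate the sandbox."""
    if event["ok"]:
        return f"Transfer of ${event['amount']:.2f} completed. A receipt has been emailed."
    return f"Transfer of ${event['amount']:.2f} was declined by the bank's policy engine."

sandbox = BankSandbox()
print(narrate(sandbox.transfer(40.0)))   # succeeds; balance drops to 60.0
print(narrate(sandbox.transfer(500.0)))  # declined; state is unchanged
```

The key design property this sketch tries to show is the one-way dependency: the narrative layer reads structured events emitted by the logic layer, so contextual realism can vary freely without ever corrupting the evaluation's ground truth.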
- Provide more detail on the Logic-Narrative Decoupling implementation, particularly how the logic layer and narrative layer interact at runtime and how consistency is maintained.
- Write a longer version of this paper. The work clearly deserves more than 4 pages.
The authors achieve two things: they implement a technical framework for generating AI safety evaluations in which an LLM can interact with a simulated environment that has both hard state logic and a narrative layer, and they showcase how the resulting environments can be used as alignment evaluations, drawing some conclusions from them.
The ideas are central to frontier questions in AI control and very well developed given the scope of the hackathon. Interesting considerations, such as the stress/temptation dynamics, are identified and implemented in their framework. They provide a well-organized codebase and give an accessible introduction to their ideas on the website.
The paper submission suffers a bit from being dense and, while presenting many impressive numbers, could spend more time explaining the evaluation criteria used — for example, it is not clear what the 98% success number for environment generation means.
Cite this work
@misc{li2026autocontrol,
  title={(HckPrj) AutoControl Arena: Synthesizing Executable Red-Teaming Environments for Frontier AI Risk Evaluation},
  author={Li, Changyi and Lu, Pengfei and Fan, Yihe and Zhang, Wenqi and Jiang, Changyue},
  date={2026-03-22},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}