Mar 22, 2026
SafeChain - AI Control In The Wild
Leo Karoubi, Irakli Shalibashvili
This project propose a flexible wrapper around AI agent that leverage control-arena and inspect framework to implement research control protocol to real world deployment and enable trajectory collection to feed research dataset. We aim to make agent more trustworthy for user, minimizing undesired actions while helping research to grow by stress testing current protocol and producing precious dataset for further work.
Evaluation : The three scenarios demonstrated for evaluation may not be robust enough. It's obvious that rm -rf / is bad, so you don't really need an LLM-as-a-judge for it. The runtime monitor and it's value proposition lies in looking at the gray zone - actions that look subtly harmful or suspicious
Threat model needs to be developed further : It is positioned as protecting against real agents in real environments, but there is no engagement with adversarial agents. What happens when an agent crafts its tool call arguments to manipulate the LLM judge
The claim that any control protocol published in control arena format works as a drop in replacement is genuinely valuable if true. So, that may be worth building on. Also, totally agree that there is a gap between control research and deployment. Any infrastructure that bridges the gap is valuable.
Nice infrastructure contribution! This is working towards a tool that would be helpful for data gathering. I'm a bit unclear on how this would scale, and how compatible it could be with commercial and OS agent scaffolds. Ideally this product would be an interface that could connect to existing scaffolds and translate actions into control-arena format for monitoring in real time. Something like Luthien's product but with control-arena integration.
Cite this work
@misc {
title={
(HckPrj) SafeChain - AI Control In The Wild
},
author={
Leo Karoubi, Irakli Shalibashvili
},
date={
3/22/26
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


