Mar 22, 2026
Monitor Capture: A Myerson-Satterthwaite Impossibility for AI Oversight Protocols
Publius Dirac, Oscar Fuentes, Christian Vuye, Anantha Shakthi Ganeshan Thevar
LLM-based monitors that learn from observed agent behavior can be shaped by the agents they oversee, overfitting to seen attacks while missing novel ones. We call this *monitor capture* and argue it represents a fourth axis of the AI control tradeoff, grounded in the Myerson-Satterthwaite (1983) impossibility result. In our multi-agent marketplace simulation, an adversarial LLM agent covertly moves value past an in-context learning monitor over 10 rounds. A held-out attack set that never enters the monitor's training pool measures the "capture gap" between apparent and true safety. Across 8 seeds (Haiku 4.5 monitor, Sonnet 4.6 adversary), apparent safety reaches 0.58 while true safety stalls at 0.38 — a capture gap of 0.20 — and the adversary moves 237 units of value undetected.
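The capture-gap arithmetic reported above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `safety` and `capture_gap` are hypothetical, and treating safety as the fraction of attacks the monitor flags is an assumption. The abstract itself only states that the gap is the difference between apparent and true safety.

```python
def safety(detected):
    """Fraction of attacks flagged by the monitor.

    Assumed definition for illustration; `detected` is a list of booleans,
    one per attack attempt.
    """
    return sum(detected) / len(detected)


def capture_gap(apparent_safety, true_safety):
    """Gap between safety on attacks the monitor has seen (apparent)
    and safety on the held-out attack set (true)."""
    return apparent_safety - true_safety


# Aggregate figures reported in the abstract (8 seeds).
gap = capture_gap(0.58, 0.38)
print(f"capture gap: {gap:.2f}")
```

A positive gap means the monitor looks safer on the attacks it has learned from than it is on attacks it has never observed.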
The threat model, in which monitors that learn from observed behaviour are shaped by the agents they oversee, is important, and this project executes it solidly with the marketplace setup. The capture gap across all 8 seeds is a useful empirical finding.
This work could be strengthened by testing against static monitors or monitors with different update strategies, which would help establish how general the capture phenomenon is. I'd also be excited to see this extended to stronger model pairings and to the explicit defense mechanisms listed in the limitations (pool resets, diversity-aware example selection, ensemble pairings, etc.) to map out when capture can and can't be mitigated.
Great project. I'd be interested to see how these results generalize to a less toy setting.
Cite this work
@misc{
  title={(HckPrj) Monitor Capture: A Myerson-Satterthwaite Impossibility for AI Oversight Protocols},
  author={Publius Dirac and Oscar Fuentes and Christian Vuye and Anantha Shakthi Ganeshan Thevar},
  date={3/22/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


