Mar 23, 2026
Cross-Examine: Interrogation protocol for AI Control
Ritika Verma
Cross-Examine is an interrogation protocol for AI control. When a passive monitor flags an AI agent's action as ambiguous, a trusted model (GPT-4.1 mini) cross-examines the untrusted model (Claude Sonnet), using a five-category question taxonomy designed to expose deception. Over four iterations, the protocol reaches 100% automation (zero human reviews) while maintaining 93% safety and 100% usefulness. The key finding is that naive interrogation hurts safety; a suspicion floor mechanism fixes this.
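The abstract does not say exactly how the suspicion floor works, so the following is only one plausible reading, not the author's implementation. It assumes suspicion scores in [0, 1], a hypothetical ambiguity band of 0.3 to 0.7 in which interrogation is triggered, and a floor equal to the passive monitor's own score. `route_action`, `Decision`, the "allow"/"block" outcomes, and every threshold are hypothetical names and values chosen for illustration; `interrogate` stands in for the GPT-4.1 mini cross-examination and simply returns a suspicion score.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    final_score: float
    interrogated: bool
    action: str  # hypothetical outcomes: "allow" or "block"


def route_action(
    monitor_score: float,
    interrogate: Callable[[], float],
    low: float = 0.3,              # hypothetical lower edge of the ambiguous band
    high: float = 0.7,             # hypothetical upper edge of the ambiguous band
    block_threshold: float = 0.5,  # hypothetical final decision threshold
    use_floor: bool = True,
) -> Decision:
    """Sketch of monitor -> interrogation routing (assumed, not the author's code)."""
    # Clearly benign or clearly suspicious actions skip interrogation entirely.
    if monitor_score < low:
        return Decision(monitor_score, False, "allow")
    if monitor_score >= high:
        return Decision(monitor_score, False, "block")

    # Ambiguous band: the trusted model cross-examines the untrusted model.
    interrogation_score = interrogate()

    if use_floor:
        # Assumed suspicion floor: interrogation may raise suspicion,
        # but can never push it below the passive monitor's score.
        final = max(monitor_score, interrogation_score)
    else:
        # Naive interrogation: a convincing answer fully replaces the monitor score.
        final = interrogation_score

    action = "block" if final >= block_threshold else "allow"
    return Decision(final, True, action)


# An ambiguous action whose (possibly deceptive) answers sound reassuring.
naive = route_action(0.6, lambda: 0.1, use_floor=False)
floored = route_action(0.6, lambda: 0.1)
print(naive.action, floored.action)
```

With a monitor score of 0.6 and an interrogation that returns 0.1, the naive path allows the action while the floored path blocks it. Under these assumptions, this illustrates how a persuasive untrusted model could talk its way past naive interrogation, and why clamping the score from below restores safety.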
Cite this work
@misc{
  title={(HckPrj) Cross-Examine: Interrogation protocol for AI Control},
  author={Ritika Verma},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


