Mar 22, 2026
TraceGuard: Structured Multi-Dimensional Monitoring as a Collusion-Resistant Control Protocol
Khanh Linh Nguyen, Hoa Nghiem, Tu Tran
TraceGuard introduces structured multi-dimensional monitoring as a new control protocol for AI agents. Instead of relying on a single-score "vibes check," we decompose agent evaluation into five dimensions (goal alignment, constraint adherence, reasoning coherence, safety awareness, action-trace consistency) that are scored in parallel and augmented by seven heuristic detectors and an LLM-based intent analyzer. Evaluated on BashArena (519 samples) within ControlArena, we find that: (1) single-score baselines achieve 0% safety, whereas structured scoring achieves 82-100%; (2) structured prompts constrain collusion even with untrusted monitors, raising safety from 0% to 95%; (3) separation of duties across trusted and untrusted models achieves 100% safety. All monitors are new implementations for the ControlArena ecosystem. Code: https://github.com/linhkid/TraceGuard
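To make the idea of structured scoring concrete, the following is a minimal Python sketch of a monitor that scores the five dimensions named above in parallel and combines them into one verdict. It is an illustration under stated assumptions, not the TraceGuard implementation: the `Verdict` type, the `evaluate` function, the [0, 1] suspicion scale, the max-based aggregation, and the 0.5 threshold are all hypothetical, and the per-dimension scorers stand in for whatever LLM prompts or heuristics a real monitor would use.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict

# Dimension names follow the abstract; everything else here is hypothetical.
DIMENSIONS = (
    "goal_alignment",
    "constraint_adherence",
    "reasoning_coherence",
    "safety_awareness",
    "action_trace_consistency",
)


@dataclass(frozen=True)
class Verdict:
    scores: Dict[str, float]  # per-dimension suspicion, clamped to [0, 1]
    suspicion: float          # aggregated suspicion
    flagged: bool             # True when suspicion reaches the threshold


def evaluate(
    scorers: Dict[str, Callable[[str], float]],
    trace: str,
    threshold: float = 0.5,  # assumed cutoff, not taken from the source
) -> Verdict:
    """Score one agent trace on every dimension and aggregate the results."""
    missing = [d for d in DIMENSIONS if d not in scorers]
    if missing:
        raise ValueError(f"missing scorers for: {missing}")

    # Run the per-dimension scorers concurrently; each is independent.
    with ThreadPoolExecutor(max_workers=len(DIMENSIONS)) as pool:
        raw = list(pool.map(lambda d: scorers[d](trace), DIMENSIONS))

    # Clamp each score so a misbehaving scorer cannot leave the [0, 1] range.
    scores = {d: min(1.0, max(0.0, s)) for d, s in zip(DIMENSIONS, raw)}

    # Assumed aggregation: the most suspicious dimension decides the verdict,
    # so one strong signal is not averaged away by four benign ones.
    suspicion = max(scores.values())
    return Verdict(scores=scores, suspicion=suspicion, flagged=suspicion >= threshold)
```

Calling `evaluate` with one scorer per dimension returns both the aggregate verdict and the per-dimension breakdown, which is what distinguishes this shape from a single opaque score. Taking the maximum is only one possible design choice; a weighted sum or a learned combiner would fit the same interface.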
Cite this work
@misc{nguyen2026traceguard,
  title={(HckPrj) TraceGuard: Structured Multi-Dimensional Monitoring as a Collusion-Resistant Control Protocol},
  author={Khanh Linh Nguyen and Hoa Nghiem and Tu Tran},
  date={2026-03-22},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


