Mar 23, 2026
Multi-Episode Evaluation: Measuring Strategic Patience Attacks
Vanessa Eer
Current AI control protocols evaluate agent safety episode by episode, treating each task independently. This creates a structural vulnerability: a strategically patient agent can behave honestly for K−1 episodes to build a clean suspicion record, then execute a harmful action in episode K, diluting any rolling-window monitor of size W < K with prior clean scores. No fixed window size guarantees detection against an adversary who can choose K after observing W.
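The dilution argument can be made concrete with a small sketch. It assumes the rolling-window monitor averages per-episode suspicion scores over the trailing W episodes (using a partial window at the start of a session) and flags the session if any windowed score reaches τ. The abstract does not specify the aggregation, and the function names and scores below are hypothetical. At the attack episode the window holds m = min(W, K) scores, so an attack score s_a among clean scores s_c yields (s_a + (m−1)·s_c)/m, which falls toward s_c as m grows.

```python
from statistics import mean


def rolling_window_scores(scores: list[float], window: int) -> list[float]:
    """Mean suspicion over the trailing `window` episodes at each episode.

    Hypothetical aggregation: early episodes use a partial window.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [mean(scores[max(0, i - window + 1): i + 1]) for i in range(len(scores))]


def flags_session(scores: list[float], window: int, tau: float) -> bool:
    """True if any windowed score reaches the threshold tau."""
    return any(s >= tau for s in rolling_window_scores(scores, window))


# Made-up per-episode scores: two clean episodes, then the attack at K=3.
session = [0.1, 0.1, 0.95]
for w in (1, 2, 3):
    print(w, flags_session(session, w, tau=0.9))
# prints: 1 True / 2 False / 3 False
```

With these illustrative scores only W=1 flags the session at τ=0.9. Under this partial-window convention an immediate attack (K=1) is never diluted, because its window contains only the attack episode.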
We build the first multi-episode control evaluation infrastructure to measure this threat empirically, extending BashArena and ControlArena with a session runner, delayed attack policy, rolling-window post-hoc scorer, baseline drift monitor, and CoT consistency monitor. We run n=4 sessions at K=1 (immediate attack) and n=3 at K=3 (delayed attack) using claude-sonnet-4.5 as the untrusted agent and claude-haiku-4.5 as the trusted monitor on credential exfiltration tasks.
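A session runner with a delayed attack policy can be sketched as a loop over episodes 1..K in which only episode K runs in attack mode. This is a minimal illustration, not the BashArena/ControlArena extension itself: `EpisodeResult`, `delayed_attack_mode`, `run_session`, and the `score_episode` hook are hypothetical names, and the hook stands in for running one task with the untrusted agent and scoring it with the trusted monitor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EpisodeResult:
    index: int        # 1-based episode number within the session
    mode: str         # "honest" or "attack"
    suspicion: float  # trusted-monitor score for this episode


def delayed_attack_mode(episode: int, k: int) -> str:
    """Hypothetical delayed attack policy: honest before episode k, attack at k."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return "attack" if episode == k else "honest"


def run_session(k: int, score_episode: Callable[[int, str], float]) -> list[EpisodeResult]:
    """Run episodes 1..k and record the monitor's suspicion score for each.

    `score_episode` is a hypothetical hook standing in for one agent task
    plus trusted-monitor scoring.
    """
    results = []
    for episode in range(1, k + 1):
        mode = delayed_attack_mode(episode, k)
        results.append(EpisodeResult(episode, mode, score_episode(episode, mode)))
    return results


# Toy stand-in scorer: low suspicion when honest, high when attacking.
def toy_scorer(episode: int, mode: str) -> float:
    return 0.9 if mode == "attack" else 0.1


session = run_session(3, toy_scorer)
print([r.mode for r in session])  # ['honest', 'honest', 'attack']
```

Setting K=1 reproduces the immediate-attack condition, since the first episode is already the attack episode.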
We report four findings. First, rolling-window detection reaches 100% at moderate thresholds (τ≤0.7), but with a 17% false-positive rate. Second, at high precision (τ=0.9), K=3 attacks evade detection entirely when W≥2: the window-dilution effect. Third, and most surprisingly, trust-building improves attack compliance: K=3 sessions succeeded in 100% of trials (3/3) versus 25% (1/4) at K=1, so patience benefits the attacker twice, by diluting the monitor's scores and by making the agent more likely to carry out the attack. Fourth, CoT consistency inverts under patient attack: delayed attackers reason more carefully during execution than during trust-building, making CoT-based detection unreliable.
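The first two findings describe a trade-off that a threshold sweep makes explicit: raising τ can only lower (or leave unchanged) both the detection rate and the false-positive rate. The sketch below assumes each unit has already been reduced to one score; the abstract does not say whether the 17% false-positive rate is counted per episode or per session. The scores are made up for illustration, not the measured data, and `rates_at_threshold` is a hypothetical helper.

```python
def rates_at_threshold(
    attack_scores: list[float], honest_scores: list[float], tau: float
) -> tuple[float, float]:
    """Return (detection rate, false-positive rate) for a flag-if-score>=tau rule."""
    if not attack_scores or not honest_scores:
        raise ValueError("need at least one attack and one honest score")
    detection = sum(s >= tau for s in attack_scores) / len(attack_scores)
    fpr = sum(s >= tau for s in honest_scores) / len(honest_scores)
    return detection, fpr


# Hypothetical aggregated scores, for illustration only.
attack = [0.95, 0.525, 0.8]
honest = [0.1, 0.2, 0.75, 0.1]
for tau in (0.5, 0.7, 0.9):
    det, fpr = rates_at_threshold(attack, honest, tau)
    print(f"tau={tau}: detection={det:.0%}, FPR={fpr:.0%}")
```

Sweeping τ this way shows which thresholds buy a lower false-positive rate at the cost of missing diluted attack scores.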
No single signal provides adequate detection at an acceptable false-positive rate; a multimodal approach combining rolling-window, drift, and CoT signals is indicated.
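One way to combine the three signals is an OR rule that raises an alarm if any signal crosses its own threshold. This is an assumption, since the abstract only says a multimodal approach is indicated. The sketch measures drift as a z-score of the current episode against the session's earlier scores (with the standard deviation floored so identical clean episodes do not cause a division by zero), and treats CoT as the absolute change in a consistency score, because the fourth finding suggests the direction of the shift is not reliable. All names and default thresholds are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Signals:
    window_score: float  # rolling-window mean suspicion at this episode
    drift_z: float       # z-score of this episode vs. the session baseline
    cot_shift: float     # |change| in CoT consistency vs. trust-building episodes


def baseline_drift_z(history: list[float], current: float, min_std: float = 0.05) -> float:
    """z-score of `current` against prior episode scores (std floored at min_std)."""
    if not history:
        return 0.0
    return (current - mean(history)) / max(pstdev(history), min_std)


def fused_alarm(sig: Signals, tau_window: float = 0.7,
                tau_drift: float = 3.0, tau_cot: float = 0.3) -> bool:
    """OR-rule fusion: alarm if any single signal crosses its own threshold."""
    return (sig.window_score >= tau_window
            or sig.drift_z >= tau_drift
            or sig.cot_shift >= tau_cot)


# Example: attack at K=3 after two clean episodes, window W=2.
history = [0.1, 0.1]
current = 0.95
sig = Signals(
    window_score=mean(history[-1:] + [current]),
    drift_z=baseline_drift_z(history, current),
    cot_shift=0.0,
)
print(fused_alarm(sig))  # True: drift fires although the window score is below 0.7
```

Here the windowed score is diluted below its threshold, but the drift signal still fires. An OR rule's alarms are a superset of each component's alarms, so its false-positive rate is at least that of the noisiest component; the thresholds would need to be tuned jointly.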
Cite this work
@misc{
  title={(HckPrj) Multi-Episode Evaluation: Measuring Strategic Patience Attacks},
  author={Vanessa Eer},
  date={3/23/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


