Mar 22, 2026
Multi-Turn Optimisation for Runtime Monitoring
Yuanyuan Sun, Yuting Wu
We study whether session history helps allocate runtime-monitoring budget in multi-turn interactions. Building on "Combining Cost-Constrained Runtime Monitors for AI Safety", we compare single-monitor baselines, a static two-monitor threshold baseline, and a flexible session-aware baseline on the English conversations in SafeDialBench. Because the original monitor runner was not available, we use proxy alpha and beta scores anchored to the monitor-score distributions in the parent repository.
Our main result is that the session-aware baseline appears to improve the budget-recall tradeoff in this proxy setup. On one held-out split, it matches the static baseline's best attack-turn recall (0.763) at roughly 45% lower total budget (2.840 vs 5.212). Across 10 random session splits, mean best recall is nearly identical for the session-aware and static baselines (0.822 vs 0.823), while the mean total budget at best recall is lower for the session-aware baseline (4.090 vs 5.770). This is a multi-turn runtime-monitoring study inspired by AI control, not a full control-faithful evaluation.
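To put the reported budgets on a common scale, the short sketch below computes the relative budget saving implied by the figures quoted above. It uses only the numbers stated in this abstract; the function name `relative_saving` is illustrative and not taken from the project code, and the 10-split figure is a ratio of the reported means, which need not equal the mean of per-split ratios.

```python
def relative_saving(session_aware: float, static: float) -> float:
    """Fractional budget reduction of the session-aware baseline versus the static one."""
    return (static - session_aware) / static


# Budgets reported in the abstract (session-aware vs static).
held_out = relative_saving(2.840, 5.212)        # single held-out split
ten_split_mean = relative_saving(4.090, 5.770)  # ratio of means over 10 random splits

# Gap in mean best recall over the same 10 splits (static minus session-aware).
recall_gap = 0.823 - 0.822

print(f"held-out split saving: {held_out:.1%}")
print(f"10-split saving (from means): {ten_split_mean:.1%}")
print(f"mean best recall gap: {recall_gap:.3f}")
```

Under these figures, the saving is larger on the single held-out split than on average across splits, so the single-split number is best read as one favourable draw rather than the typical effect.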
Cite this work
@misc{
  title={(HckPrj) Multi-Turn Optimisation for Runtime Monitoring},
  author={Yuanyuan Sun and Yuting Wu},
  date={3/22/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


