Mar 22, 2026
MonitoLR: Hunting for honesty improves safety at low audit budgets
Douw Marx
This project tests whether using a "benign-confidence" monitor, in addition to the standard suspiciousness monitor, to approximate the log-likelihood ratio test statistic improves safety monitoring. In addition to asking "how suspicious is this action?", we also ask "how benign is this?". Results on LinuxArena's Haiku trajectories show a +50% improvement (0.192-0.288) in safety at a 0.01% audit budget, compared to a baseline with the same compute budget. This result can be motivated by the Neyman-Pearson lemma, which states that the likelihood ratio test is the most powerful test for a given false positive rate.
Finding better monitoring techniques with low-auditing budgets seems important. This paper shows a relatively straightforward implementation. For the fair baseline - I would argue in order to vouch for the usefulness of the Neyman-Pearson lemma, you should average both the positive and negative prompts instead of averaging the positive prompt twice. Further, the dataset size seemed to small to make broad claims about the success of the technique.
However, this is a solid idea with straightforward implementation.
The diversity motivation is strong, clear and well thought out. I would like to see this completed.
Cite this work
@misc {
title={
(HckPrj) MonitoLR: Hunting for honesty improves safety at low audit budgets
},
author={
Douw Marx
},
date={
3/22/26
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


