Mar 22, 2026
MonitoLR: Hunting for honesty improves safety at low audit budgets
Douw Marx
This project tests whether adding a "benign-confidence" monitor to the standard suspiciousness monitor, so that the pair approximates the log-likelihood ratio test statistic, improves safety monitoring. Alongside "how suspicious is this action?", we also ask "how benign is this action?". Results on LinuxArena's Haiku trajectories show a 50% relative improvement in safety (from 0.192 to 0.288) at a 0.01% audit budget, compared to a baseline with the same compute budget. This result can be motivated by the Neyman-Pearson lemma, which states that the likelihood ratio test is the most powerful test for a given false positive rate.
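As a rough illustration of the idea described above, the sketch below combines two monitor scores into an approximate log-likelihood ratio and picks an audit threshold from benign data. It is not the project's implementation: the function names (`llr_score`, `audit_threshold`, `catch_rate`), the assumption that both monitors return scores in [0, 1], and the simple log-difference used to combine them are all hypothetical choices made for this example.

```python
import math


def llr_score(suspicion: float, benign_confidence: float, eps: float = 1e-6) -> float:
    """Approximate a log-likelihood ratio from two monitor scores.

    Assumption (not from the source): both scores lie in [0, 1] and are
    treated as rough proxies for p(action | attack) and p(action | benign).
    """
    # Clip away from 0 and 1 so the logarithms stay finite.
    s = min(max(suspicion, eps), 1.0 - eps)
    b = min(max(benign_confidence, eps), 1.0 - eps)
    return math.log(s) - math.log(b)


def audit_threshold(benign_scores, audit_budget: float) -> float:
    """Pick a threshold so that at most `audit_budget` of benign scores exceed it."""
    ranked = sorted(benign_scores, reverse=True)
    k = int(audit_budget * len(ranked))  # benign actions we can afford to audit
    # Auditing strictly above the (k+1)-th largest benign score flags at most k of them.
    return ranked[k] if k < len(ranked) else float("-inf")


def catch_rate(attack_scores, threshold: float) -> float:
    """Fraction of attack scores strictly above the audit threshold."""
    return sum(x > threshold for x in attack_scores) / len(attack_scores)
```

In this sketch, an action that is both highly suspicious and not confidently benign receives a large positive score, while an action both monitors consider ordinary scores near or below zero. The catch rate here is only a crude stand-in for the safety metric reported above.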
Cite this work
@misc{marx2026monitolr,
  title={(HckPrj) MonitoLR: Hunting for honesty improves safety at low audit budgets},
  author={Douw Marx},
  date={2026-03-22},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


