Mar 31, 2025
Model Models: Simulating a Trusted Monitor
Louis Thomson, Alex Holness-Tofts, Jasmijn Remmers, Xanthe Spence, Lennie Wells, Charlie Griffin
3rd place
Summary
We offer initial investigations into whether the untrusted model (U) can 'simulate' the trusted monitor (T): can U accurately predict the suspicion score that T will assign in the APPS setting? We also provide a clean, modular codebase, which we hope will streamline future research into this question.
Cite this work:
@misc{thomson2025modelmodels,
  title={Model Models: Simulating a Trusted Monitor},
  author={Louis Thomson and Alex Holness-Tofts and Jasmijn Remmers and Xanthe Spence and Lennie Wells and Charlie Griffin},
  date={2025-03-31},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}