Mar 22, 2026
Testing Monitoring Models with Encoded Transcripts
Lachlan Ewart, Gideon Chang, Barath Ashok
In this experiment we stress-test chain-of-thought (CoT)
monitorability by testing Claude Haiku 4.5 on transcripts of
malicious activities under different encodings. We vary two
independent variables to assess what affects monitoring
effectiveness: the proportion of text encoded and the batch size
of the monitoring model. We focus on encodings that could
plausibly be learned through few-shot prompting or fine-tuning,
based on the work of Fabien Roger and Shiyuan Guo [1]. We find
that Haiku 4.5 does well against common encodings but loses
accuracy on more complex ones (such as ROT13 applied to every
other letter). We also test on Qwen 2.5-3B to check that this
result is not specific to Haiku 4.5.
Cite this work
@misc{ewart2026testing,
  title={(HckPrj) Testing Monitoring Models with Encoded Transcripts},
  author={Lachlan Ewart and Gideon Chang and Barath Ashok},
  date={2026-03-22},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


