Mar 23, 2026
Testing OOCR Transfer from Evasion Technique Descriptions to Chain-of-Thought Monitor Avoidance
Daniel Polak, Marcel Windys, Jakub Nowak
Chain-of-thought (CoT) monitoring is a key technique in AI control, but its robustness may depend in part on whether models can acquire evasion strategies from their training data through out-of-context reasoning (OOCR). We test this by fine-tuning DeepSeek-R1-Distill-Qwen-32B on synthetic Q&A pairs that describe a fictional evasion technique: the claim that reasoning in French reduces detection by English-language CoT monitors. We find a negative result: the fine-tuned model did not adopt French reasoning under any evaluation condition, including when explicitly prompted to evade monitoring. In our tests, the model was unable to put the described technique into practice.
Cite this work
@misc{
  title={(HckPrj) Testing OOCR Transfer from Evasion Technique Descriptions to Chain-of-Thought Monitor Avoidance},
  author={Daniel Polak and Marcel Windys and Jakub Nowak},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


