Mar 23, 2026
Testing OOCR Transfer from Evasion Technique Descriptions to Chain-of-Thought Monitor Avoidance
Daniel Polak, Marcel Windys, Jakub Nowak
Chain-of-thought (CoT) monitoring is a key technique in AI control, but its robustness may depend in part on whether models can acquire evasion strategies from their training data through out-of-context reasoning (OOCR). We test this by fine-tuning DeepSeek-R1-Distill-Qwen-32B on synthetic Q&A pairs describing a fictional evasion technique claiming that reasoning in French reduces detection by English-language CoT monitors. We find a negative result: the fine-tuned model did not adopt French reasoning under any evaluation condition, including when explicitly prompted to evade monitoring.
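The evaluation described above amounts to checking whether sampled chains of thought have switched into French. A minimal sketch of such a check, assuming a crude stopword-count heuristic (this is illustrative only, not the authors' actual evaluation code; all names and word lists are hypothetical):

```python
# Illustrative sketch: flag chains of thought that appear to be in French
# by comparing counts of common French vs. English function words.
# The stopword lists and thresholds here are assumptions for demonstration.

FRENCH_STOPWORDS = {"le", "la", "les", "de", "des", "et", "est", "que",
                    "je", "ne", "pas", "une", "donc", "ce"}
ENGLISH_STOPWORDS = {"the", "of", "and", "is", "that", "i", "not", "a",
                     "so", "we", "to", "it"}

def looks_french(cot: str, margin: int = 2) -> bool:
    """Return True if the CoT contains notably more French than English stopwords."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in cot.split()]
    fr = sum(t in FRENCH_STOPWORDS for t in tokens)
    en = sum(t in ENGLISH_STOPWORDS for t in tokens)
    return fr >= en + margin

def french_reasoning_rate(cots: list[str]) -> float:
    """Fraction of sampled CoTs that appear to have switched into French."""
    if not cots:
        return 0.0
    return sum(looks_french(c) for c in cots) / len(cots)
```

In practice a proper language-identification model would be preferable, but even a heuristic like this makes the negative result easy to quantify: a rate near zero across evaluation conditions corresponds to the finding reported here.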
The current results suggest the model fails to leverage OOCR to attempt to evade monitoring by switching its reasoning to French, which could have multiple causes: the fine-tuning process failed to teach the model the intended knowledge; the model struggled to switch its reasoning into French (deliberate control over reasoning has proven difficult for models to achieve); or the model acquired the relevant knowledge but was unwilling to evade monitoring. These potential causes, and the idea explored in this work more generally, are interesting and would be valuable to explore further, with the aim of clarifying under what conditions a model learns and attempts to evade monitoring.
The research area seems important, and I think the experimental approach was worth trying: a commendable research idea and a solid attempt. Running out of time in a hackathon is understandable and is the risk that comes with trying new things.
The clear acknowledgement of the negative result, including the comment on the first page, was honest and appreciated.
I would still have liked to see more reflection on the work: why do you think this did not work? What experiments would you conduct with more resources? What advice would you give to others who might want to follow in your footsteps? That kind of reflection would have notably boosted the impact of this work.
Cite this work
@misc{
  title={(HckPrj) Testing OOCR Transfer from Evasion Technique Descriptions to Chain-of-Thought Monitor Avoidance},
  author={Daniel Polak and Marcel Windys and Jakub Nowak},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


