Mar 23, 2026
How Detectable Are Covert Signals in AI Agent Transcripts?
Mark Martirosian
If an AI agent is secretly doing something harmful inside an otherwise normal-looking transcript, how hard is it for a safety monitor to catch it? We built a pipeline that generates synthetic sysadmin transcripts in which a hidden token is embedded at five different levels of concealment, then had Claude Opus 4.6 evaluate each transcript without being told what to look for.
Cite this work
@misc{martirosian2026covert,
  title={(HckPrj) How Detectable Are Covert Signals in AI Agent Transcripts?},
  author={Mark Martirosian},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


