Mar 23, 2026
Denoising Trajectories as Early-Warning Signals for Diffusion Language Model Control
Seunghyun Yoo
We study whether denoising trajectories in diffusion language models can serve as early-warning signals for AI control. Using LLaDA with pretrained sparse autoencoders in an executable shell setting, we find that trajectory-based monitoring detects attacks much earlier than output-only oversight, while steering shows weaker, preliminary promise for localized intervention.
Cite this work
@misc{yoo2026denoising,
  title={(HckPrj) Denoising Trajectories as Early-Warning Signals for Diffusion Language Model Control},
  author={Seunghyun Yoo},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


