Mar 22, 2026
S-Space Steering for Eval-Awareness Control in Reasoning Models
Michael J Clark
We can no longer trust evaluations of frontier models because the models detect when they are being evaluated, and this problem is getting worse as models become more capable. Recent work replicated this eval-awareness in open-weight models and showed that it responds to activation steering in Qwen3-32B, but standard steering vectors are unreliable and the proposed rank-1 weight surgery makes things worse on Qwen3. We apply S-space steering, a novel method that steers in the singular-value basis of weight matrices (the transformation space) rather than in activation space (the data space), which makes the perturbation input-dependent. With eval-awareness suppressed, the gap between how the model behaves on eval-looking vs. real-looking prompts shrinks to +1 pp, down from +7 pp at baseline and +26 pp with the prior method. This lets us steer a model toward eval-unawareness and then run evaluations with higher confidence that the results reflect real behavior.
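The contrast between activation-space and S-space steering can be illustrated with a toy example. The abstract does not specify how the steering is parameterized, so the sketch below assumes the simplest reading: decompose a weight matrix as W = U diag(S) V^T and add a scaled offset to the singular values. The function name `s_space_steer`, the offset `delta_s`, and the scale `alpha` are hypothetical and are not taken from the submission.

```python
import numpy as np

def s_space_steer(W, delta_s, alpha=1.0):
    """Return a copy of W with singular values shifted by alpha * delta_s.

    Assumption: steering is an additive shift of the singular values.
    The actual method in the submission may differ.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(S + alpha * delta_s) @ Vt

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))                 # stand-in for one weight matrix
delta_s = np.array([0.5, 0.0, 0.0, 0.0])    # hypothetical steering offset
W_steered = s_space_steer(W, delta_s)

x1 = rng.normal(size=4)                     # two different inputs
x2 = rng.normal(size=4)

# Activation steering adds the same vector v to the output for every input.
v = rng.normal(size=4)
act_shift_1 = (W @ x1 + v) - W @ x1
act_shift_2 = (W @ x2 + v) - W @ x2

# S-space steering changes the output by U diag(alpha * delta_s) V^T x,
# so the size and sign of the shift depend on the input x.
s_shift_1 = W_steered @ x1 - W @ x1
s_shift_2 = W_steered @ x2 - W @ x2

print("activation shift identical across inputs:",
      np.allclose(act_shift_1, act_shift_2))
print("S-space shift identical across inputs:",
      np.allclose(s_shift_1, s_shift_2))
```

In this toy setting the activation-space shift is the same vector for both inputs, while the S-space shift is scaled by how strongly each input projects onto the steered singular direction, which is one way to read "input-dependent" in the abstract.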
Cite this work
@misc {
  title={(HckPrj) S-Space Steering for Eval-Awareness Control in Reasoning Models},
  author={Michael J Clark},
  date={2026-03-22},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


