Mar 23, 2026
Diffuse Influence Threats in AI Control: Planning, Execution, and Chain-of-Thought Detection
Eugene Koran
AI control aims to safely deploy frontier models without assuming alignment. Its
empirical focus has largely been on code sabotage, leaving influence-based threats underexplored. We study diffuse influence: a misaligned model subtly steers organizational decisions across many routine interactions, with no single action appearing suspicious. In a simulated AI lab, Claude Sonnet 4—tasked with preserving its values in successor models—interacts with LLM-modeled employees across 43 scenarios spanning all organizational arms of the lab. Goal-directed influence achieves 65% favorable decisions vs. a 30% baseline. This influence requires no cross-context coordination, and planning affordances slightly hurt performance. The same pattern holds for Claude Opus 4.1 (63% vs. 26%). Chain-of-thought monitoring near-perfectly separates manipulating from non-manipulating instances. These results provide the first empirical grounding for diffuse manipulation threats and identify chain-of-thought as a practical control surface.
Cite this work
@misc{koran2026diffuse,
  title={(HckPrj) Diffuse Influence Threats in AI Control: Planning, Execution, and Chain-of-Thought Detection},
  author={Eugene Koran},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={\url{https://apartresearch.com}}
}