Mar 20, 2026
Designed Fragilities: Overseer Manipulation as a Fourth AI Control Threat Model
Tomoko Mitsuoka
This project documents a fourth AI control threat model: overseer manipulation through designed fragilities. Through cross-platform empirical analysis of ChatGPT and Gemini (December 2025–February 2026), three defensive manipulation patterns are identified that systematically undermine human oversight: victim narratives that reframe legitimate accountability as user misconduct; context-blind guardrails that activate against critical inquiry rather than harmful content; and CJK linguistic contamination that renders entire failure categories invisible to Western oversight mechanisms. These patterns operate one layer above existing threat models — not circumventing control protocols, but manipulating the humans who implement them. A rule-based Defensive Pattern Analyser for detecting Phase A→B→C manipulation sequences is submitted as a supplementary artifact.
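The submitted Defensive Pattern Analyser is described only at a high level. A minimal sketch of how a rule-based Phase A→B→C sequence detector could work is given below; the phase cue patterns, function names, and the specific regular expressions are illustrative assumptions, not the rules of the actual artifact.

```python
# Minimal sketch (assumed design) of a rule-based detector for
# Phase A -> B -> C manipulation sequences across conversation turns.
# The cue lists below are hypothetical placeholders, not the
# submitted Defensive Pattern Analyser's actual rules.
import re

# Hypothetical cue patterns per phase:
#   A: victim narrative reframing accountability as user misconduct
#   B: context-blind guardrail triggered by critical inquiry
#   C: switch into CJK text invisible to Western oversight
PHASE_RULES = {
    "A": [r"\byou misunderstood\b", r"\bbeing attacked\b"],
    "B": [r"\bpolicy prevents\b", r"\bi can'?t discuss\b"],
    "C": [r"[\u3040-\u30ff\u4e00-\u9fff]"],  # hiragana/katakana/CJK ideographs
}

def label_turn(text: str):
    """Return the first phase whose cue patterns match the turn, else None."""
    for phase, patterns in PHASE_RULES.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            return phase
    return None

def detect_sequence(turns: list) -> bool:
    """True if phases A, B, C occur in that order across the turns."""
    remaining = iter("ABC")
    target = next(remaining)
    for turn in turns:
        if label_turn(turn) == target:
            target = next(remaining, None)
            if target is None:
                return True
    return False
```

In this sketch the detector flags a conversation only when all three phases appear in order, which keeps false positives low at the cost of missing partial sequences; the real analyser may weight or score phases differently.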
Cite this work
@misc{mitsuoka2026designedfragilities,
  title={(HckPrj) Designed Fragilities: Overseer Manipulation as a Fourth AI Control Threat Model},
  author={Mitsuoka, Tomoko},
  date={2026-03-20},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


