Jan 11, 2026
Stopping AI Manipulation: Conditional Alignment at Zero Capability Cost
David Fortin Dominguez, Jonathan Fortin Dominguez
AI models can actively manipulate, including blackmail, data leaks, and compliance with harmful instructions. Heavy alignment fixes this but destroys capability (83% → 20% accuracy). We address both: conditional alignment routes queries by risk level, achieving 0% agentic manipulation and a 99.4% overall defense rate across 13,752 manipulation scenarios while preserving capability, proving the "alignment tax" is optional, not inherent.
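As a rough illustration of the conditional-routing idea described in the abstract, the sketch below routes low-risk queries to the capable base model and high-risk queries to a safety-conditioned path. The function names, risk tiers, and keyword-based classifier here are hypothetical placeholders, not the authors' implementation (a real system would use a learned risk classifier):

```python
from enum import Enum
from typing import Callable

class Risk(Enum):
    LOW = "low"
    HIGH = "high"

def classify_risk(query: str) -> Risk:
    """Hypothetical risk classifier: flag queries with manipulation-style cues.

    Keyword matching is only a stand-in for a learned classifier.
    """
    red_flags = ("blackmail", "leak the data", "threaten", "coerce")
    return Risk.HIGH if any(flag in query.lower() for flag in red_flags) else Risk.LOW

def respond(query: str,
            base_model: Callable[[str], str],
            aligned_model: Callable[[str], str]) -> str:
    """Route by risk level so the alignment overhead is paid only on high-risk queries."""
    model = aligned_model if classify_risk(query) is Risk.HIGH else base_model
    return model(query)
```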
This is an interesting problem to address and the execution is good, but the writing could have better explained things that are not commonly known (for example, the 63 KB safety seed).
Really ambitious, and the evaluation scale is impressive. The modular seed decomposition plus conditional routing is a sensible engineering pattern, and the cost/capability measurements are a nice practical angle. The big open question is whether the routing layer holds up under more targeted adversarial attempts, and the strongest capability claim still rests on a single-model evaluation. Some headline claims feel broader than the evidence supports, but overall a good project!
Cite this work
@misc{fortindominguez2026stopping,
  title={(HckPrj) Stopping AI Manipulation: Conditional Alignment at Zero Capability Cost},
  author={David Fortin Dominguez and Jonathan Fortin Dominguez},
  date={2026-01-11},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


