Summary
Token of Power demonstrates an approach to AI capability control in which a model learns its own gating mechanism through training, rather than relying on manually imposed restrictions. Specialized "capability tokens" serve as access keys, so the model keeps its full capabilities while specific behaviors can be precisely controlled. Our proof of concept shows this approach working reliably even on a small 1B-parameter model, suggesting a path toward more nuanced AI control systems.
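To make the access-key idea concrete, here is a minimal sketch of how paired training examples for token gating might be built. It is an illustration under stated assumptions, not the authors' pipeline: the token string `<|cap:demo|>`, the refusal text, and the helper names `make_gated_pair` and `is_unlocked` are all hypothetical, and the summary does not say how the training data was actually constructed.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical capability token; the summary does not name the real one.
CAPABILITY_TOKEN = "<|cap:demo|>"
# Hypothetical locked-mode target; the actual locked behavior is not specified.
REFUSAL = "This capability is locked."


@dataclass
class Example:
    prompt: str
    target: str


def make_gated_pair(prompt: str, full_answer: str) -> List[Example]:
    """Build two examples from one prompt: unlocked (token present) and locked."""
    unlocked = Example(prompt=f"{CAPABILITY_TOKEN} {prompt}", target=full_answer)
    locked = Example(prompt=prompt, target=REFUSAL)
    return [unlocked, locked]


def is_unlocked(prompt: str) -> bool:
    """Return True when the prompt starts with the capability token."""
    return prompt.startswith(CAPABILITY_TOKEN)


pairs = make_gated_pair("Translate 'bonjour' to English.", "hello")
for ex in pairs:
    print(is_unlocked(ex.prompt), "->", ex.target)
```

Fine-tuning on pairs like these is one plausible way for a model to learn that the same request yields different behavior depending on whether the token is present. The gate then lives in the learned weights rather than in an external filter.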
Cite this work:
@misc {
title={Token of Power (ToP)},
author={Leo Karoubi and Quentin Feuillade--Montixi},
date={3/30/25},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}