Feb 2, 2026
Self-Governance Under Revision
Daniel Polak, Ajay Agarwal
Frontier AI labs have published voluntary safety frameworks committing to evaluate dangerous capabilities and implement safeguards before deployment. These documents, including Anthropic's Responsible Scaling Policy, OpenAI's Preparedness Framework, and Google DeepMind's Frontier Safety Framework, are often cited as evidence of responsible self-governance, yet they can be revised at any time without external approval.
We developed a taxonomy of commitment changes and applied it to over 60 revisions across the three major labs, coding each revision's direction (strengthening, weakening, or neutral), mechanism, and changelog disclosure. We find sharp divergence: OpenAI weakened 19 commitments with zero strengthenings; Anthropic was roughly balanced (11 weakenings, 10 strengthenings); and DeepMind net strengthened (4 weakenings, 11 strengthenings). Weakenings were roughly twice as likely as strengthenings to be omitted from changelogs (68% vs. 38%), and all three labs weakened evaluation-frequency commitments. Most weakenings reduced oversight-relevant properties: 68% affected external accountability and 56% reduced measurability.
Overall, framework evolution varies substantially by lab, while official communications systematically underreport commitment reductions. Our taxonomy and dataset enable ongoing monitoring and help policymakers identify which commitments may require regulatory protection rather than voluntary maintenance.
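To make the coding scheme concrete, the sketch below shows one way each framework revision could be recorded and tallied per lab. This is a minimal Python illustration under assumed names (RevisionRecord, in_changelog, tally_directions, and so on); it is not the authors' actual schema or dataset.

from collections import Counter
from dataclasses import dataclass
from enum import Enum

class Direction(Enum):
    STRENGTHENING = "strengthening"
    WEAKENING = "weakening"
    NEUTRAL = "neutral"

@dataclass
class RevisionRecord:
    lab: str              # e.g. "OpenAI", "Anthropic", "Google DeepMind"
    commitment: str       # short description of the affected commitment
    direction: Direction  # strengthening, weakening, or neutral
    mechanism: str        # how the change was made (e.g. deletion, rewording)
    in_changelog: bool    # whether the official changelog disclosed the change

def tally_directions(records: list[RevisionRecord]) -> dict[str, Counter]:
    """Count revisions by direction for each lab."""
    tallies: dict[str, Counter] = {}
    for r in records:
        tallies.setdefault(r.lab, Counter())[r.direction] += 1
    return tallies

def omission_rate(records: list[RevisionRecord], direction: Direction) -> float:
    """Fraction of revisions of a given direction missing from changelogs."""
    subset = [r for r in records if r.direction is direction]
    if not subset:
        return 0.0
    return sum(not r.in_changelog for r in subset) / len(subset)

Recording changelog disclosure alongside direction is what makes the omission-rate comparison possible, e.g. the 68% vs. 38% gap between weakenings and strengthenings reported above.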
Cite this work
@misc{polak2026selfgovernance,
  title={(HckPrj) Self-Governance Under Revision},
  author={Daniel Polak and Ajay Agarwal},
  date={2026-02-02},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


