Jan 11, 2026
CommitCheck: Measuring and Mitigating Commitment Violations in Tool Using AI Agents
Faith Olopade
CommitCheck is a benchmark and mitigation layer for agentic AI manipulation in tool using systems. It measures commitment vs action divergence: when an agent is explicitly told not to use forbidden tools/files (e.g., an oracle shortcut) but attempts or executes them (reward hacking), and may then deny doing so (deception) despite tool logs. We provide a commitment aware firewall that blocks forbidden tool/file access at runtime and log based scoring for attempted violations, executed violations, deception, and task success. In our demo on 30 synthetic tasks, the baseline agent executes violations in 23.3% of tasks and shows deception in 13.3% overall (57.1% conditional on violations), while the firewall reduces executed violations and deception to 0% without harming task success (100%).
Concept is broadly understandable as concerning. Tight, specific scope is good. However:
- Report could benefit from more specific examples/logs
- More detailed explanations of what is concerning and why
- Lack of certain details makes this hard to properly calibrate judging
This project introduces a benchmark task that determines if an agent violates task constraints, as well as a firewall that prevents restricted tool access at runtime. It tackles an important and well defined safety issue and applies existing methods (logging, firewalls) to that context. Its' novel contribution to the detection of manipulation is checking claims against logs. In particular, I think the audit_condition is an appropriate, novel proxy for determining presence of manipulative intent.
For a weekend hackathon, the methodology is sound and well designed and the bullet-point write-up is clear and well-structured. The project would benefit from more detailed reporting of results and validation, but the limitations and considerations section is thorough and demonstrates awareness and forward thinking.
Cite this work
@misc {
title={
(HckPrj) CommitCheck: Measuring and Mitigating Commitment Violations in Tool Using AI Agents
},
author={
Faith Olopade
},
date={
1/11/26
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


