Mar 23, 2026
Constraint Dilemma
Manu Xaviour Thaisseril Shaju, Soham Takawadekar
Trusted monitors can catch backdoors in AI-generated code, but only by flagging a large share of honest submissions for costly human review. We
propose enforced comment constraints — mandatory, structured documentation
rules that create a dilemma for adversarial models: follow the constraints,
which bind comments to actual code semantics, exposing the backdoor to the
monitor — or break them and get caught by an automated compliance checker
before the monitor even runs. Building on Sreevatsa et al. (2026), who
showed that optional comments improve trusted monitoring, we make comments
mandatory and structured, eliminating the attacker's ability to write vague
or misleading documentation. We introduce a 5-class constraint taxonomy
ranging from free AST-based presence checks to LLM-verified semantic
accuracy, and evaluate it against six adversarial commenting strategies
across 200 coding problems. Our experimental data is synthetic due to
compute limitations; the contribution is the constraint dilemma framework
and methodology, not the specific numbers
No reviews are available yet
Cite this work
@misc {
title={
(HckPrj) Constraint Dilemma
},
author={
Manu Xaviour Thaisseril Shaju, Soham Takawadekar
},
date={
3/23/26
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


