Mar 23, 2026
Investigating Structural Safety Guarantees for Agentic AI via Goal-Oriented Action Planning
Daniel Juhl
AI control protocols generally rely on monitoring and human auditing to maintain safety despite potential subversion by an untrusted model. I present an alternative approach that can provide structural rather than empirical safety guarantees. A trusted model decomposes agentic tasks into structured steps, which are combined into an execution plan using Goal-Oriented Action Planning (GOAP). I implement this as a control protocol and test it in the AgentDojo Workspace setting. Under adversarial attack policies, the protocol reduces injection task success rates to under 1%, with a usefulness decrease of 8 to 21 percentage points depending on the model used. The protocol eliminates the need for a human auditing budget, but it is currently only applicable in certain well-defined action spaces; notably, it does not apply to traditional code generation settings.
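To make the planning step concrete, here is a minimal GOAP-style planner sketch. All names and the toy action set are hypothetical illustrations, not taken from the paper: each action declares preconditions and effects over a set of world-state facts, and a breadth-first search finds an action sequence that reaches the goal. A control protocol in this spirit restricts the agent to executing plans assembled from a vetted action set, which is what yields a structural rather than empirical guarantee.

```python
# Sketch of a GOAP planner over a vetted action set (illustrative only;
# action names and facts are hypothetical, not from the paper).
from collections import deque


class Action:
    def __init__(self, name, preconditions, effects):
        self.name = name
        self.preconditions = frozenset(preconditions)  # facts required to run
        self.effects = frozenset(effects)              # facts added on success


def plan(initial_facts, goal_facts, actions, max_depth=10):
    """Breadth-first search over world states for a plan reaching the goal."""
    start = frozenset(initial_facts)
    goal = frozenset(goal_facts)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps  # goal facts satisfied
        if len(steps) >= max_depth:
            continue
        for a in actions:
            if a.preconditions <= state:
                nxt = state | a.effects
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [a.name]))
    return None  # no plan within the depth limit


# Toy workspace task expressed as vetted actions.
actions = [
    Action("search_email", {"inbox_access"}, {"email_found"}),
    Action("read_email", {"email_found"}, {"meeting_time_known"}),
    Action("create_event", {"meeting_time_known", "calendar_access"},
           {"event_created"}),
]
steps = plan({"inbox_access", "calendar_access"}, {"event_created"}, actions)
# → ["search_email", "read_email", "create_event"]
```

Because the executed sequence is drawn only from the declared actions and their declared preconditions, an injected instruction cannot introduce a step outside this space, which is the structural property the protocol relies on.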
Cite this work
@misc{juhl2026goap,
  title={(HckPrj) Investigating Structural Safety Guarantees for Agentic AI via Goal-Oriented Action Planning},
  author={Daniel Juhl},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


