Mar 23, 2026
Investigating Structural Safety Guarantees for Agentic AI via Goal-Oriented Action Planning
Daniel Juhl
AI control protocols generally rely on monitoring and human auditing to maintain safety despite potential subversion by an untrusted model. I present an alternative approach that can provide structural rather than empirical safety guarantees. A trusted model decomposes agentic tasks into structured steps, which are combined into an execution plan using Goal-Oriented Action Planning (GOAP). I implement this as a control protocol and test it in the AgentDojo Workspace setting. Under adversarial attack policies, the protocol reduces injection task success rates to under 1%, with a usefulness decrease of 8-21 percentage points depending on the model used. The protocol eliminates the need for a human auditing budget, but it is currently only applicable in certain well-defined action spaces - notably, it does not apply to traditional code generation settings.
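To make the mechanism concrete, here is a minimal sketch of how GOAP-style plan assembly could work. This is an illustrative toy, not the paper's implementation: the fact and action names are hypothetical, and a real protocol would draw actions from a vetted tool set in the agent environment.

```python
from collections import deque
from dataclasses import dataclass

# Toy GOAP planner: a world state is a frozenset of string facts.
# Each action has preconditions (facts that must hold before it runs)
# and effects (facts it adds). Breadth-first search finds a shortest
# sequence of actions that reaches the goal. All names below are
# illustrative, not taken from the AgentDojo Workspace setting.

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset
    effects: frozenset

def plan(initial, goal, actions):
    """Return a list of action names transforming `initial` into a
    state satisfying `goal`, or None if no plan exists."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:  # every goal fact holds
            return steps
        for a in actions:
            if a.preconditions <= state:  # action is applicable
                nxt = state | a.effects
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [a.name]))
    return None

# Hypothetical workspace-style actions:
actions = [
    Action("search_email", frozenset(), frozenset({"email_found"})),
    Action("extract_address", frozenset({"email_found"}),
           frozenset({"address_known"})),
    Action("create_event", frozenset({"address_known"}),
           frozenset({"event_created"})),
]

print(plan(frozenset(), frozenset({"event_created"}), actions))
# → ['search_email', 'extract_address', 'create_event']
```

The structural property comes from the planner itself: an execution plan can only be assembled from vetted actions whose preconditions chain toward the stated goal, so an injected instruction that requires an out-of-plan action simply has no path to execution.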
Task decomposition to constrain the untrusted model's action space is an important direction, and this project executed it cleanly. Reducing injection success to near-zero while maintaining reasonable usefulness is a strong result, and the failure case analysis (overlapping main/injection tasks) is a useful insight that can be expanded on in future work. I'd be excited to see this run at a larger scale and against stronger adversaries.
Strikes me as a solid exploration with substantial and thoughtful engineering work. As you mention, the ideas in this paper have been explored before - factored cognition and some formal-guarantee papers already exist - but there is still work to be done, and I think GOAP is a fruitful area to pursue further, especially on formalizing the guarantees and better defining what counts as a well-defined action space, which you note.
Well presented and clearly written. The candor about limitations is appreciated. I found the self-effacing tone approachable, but to some readers it may undersell the strength of what you've actually built and demonstrated here. I think you can find a way to balance the light, open tone with a more authoritative voice to land the message with gravitas.
Cite this work
@misc{juhl2026goap,
  title={(HckPrj) Investigating Structural Safety Guarantees for Agentic AI via Goal-Oriented Action Planning},
  author={Daniel Juhl},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


