Adaptive Permission Sandbox for LLM Agents
Shaheen Raja
LLM agents are powerful but unsafe by default, often executing harmful or unauthorized actions such as destructive SQL queries or leaking sensitive data. My project introduces an Adaptive Permission Sandbox with a human‑readable DSL that enforces safety rules proactively, before execution.
Policies cover static restrictions (e.g., deny DROP TABLE), stateful requirements (e.g., require schema_lookup before SELECT), data‑flow protections, and runtime guards. Each rule is compiled into monitorable formulas and formally verified, ensuring provable guarantees and auditability. By blocking unsafe behavior at the source, this system delivers adversarial robustness, compliance readiness, and a practical path toward trustworthy AI agent deployment.
Strengths. This is a clearly written, well-structured submission that frames a real and important problem — proactive enforcement at the tool-call boundary for LLM agents — in a way that is accessible to security engineers without sacrificing rigor. The "AWS IAM for AI agents, formally verified" framing is genuinely useful: it maps onto existing operator intuitions while pointing at the additional axes (temporal sequencing, data-flow labels, runtime resource limits) where agent permissions need to go beyond classical IAM. The four-category taxonomy — static, stateful, data-flow, and runtime — covers an orthogonal and useful space. The control-plane vs. data-plane separation argument for adversarial robustness is the strongest conceptual contribution here: because the monitor operates on structured tool-call metadata rather than the model's natural-language reasoning, injected text in tool outputs cannot reach the enforcement decision surface. This is a meaningfully stronger story than content filtering and worth foregrounding even further. The DSL example, the worked enforcement trace in Listing 2, the architecture description in Section 4.1, and the terminal screenshot on slide 5 showing a DROP blocked while a SELECT succeeds together make the basic idea concrete and inspectable.
Opportunities to strengthen. The biggest gap is an inconsistency between what the paper claims and what the slides describe as actually built, and more generally between the formal claims and the evidence provided. A few places where this could be tightened, in roughly descending order of importance:
1. The paper and slides disagree on what is delivered. The paper presents Lean 4 proof-carrying compilation and Z3 satisfiability checks as part of the current pipeline (e.g., abstract: "compiled into monitorable LTLf formulas, and formally verified with Lean/Z3"; Section 6.1: "Every compiled policy emits a Lean 4 proof obligation..."). The slide deck's "Next Steps" panel, however, lists "Integrate formal verification (Lean/Z3)" as future work. This is a load-bearing inconsistency: the formal verification angle is the submission's most novel and differentiated claim, and the slides effectively acknowledge it is aspirational. Two clean ways to resolve this: either (a) build a minimal Lean proof for one compiled policy and link to it, or (b) reframe the paper's abstract and Section 6.1 so that the verification pipeline is clearly labeled as planned rather than delivered. The current mismatch will undermine reader trust in the rest of the claims.
2. Empirical evidence is largely absent from the paper. Section 5 is explicitly titled "Evaluation Plan" and contains targets ("we target sub-millisecond overhead") rather than measurements. The latency numbers cited ("under 0.1 ms" for static rule evaluation, "under 0.5 ms" for stateful monitor advancement on traces up to 1,000 events) are described as "initial profiling on synthetic workloads," but no methodology, hardware, workload definition, or distribution of results is reported. The three adversarial scenario suites in Table 3 (SQL Destruction, File Exfiltration, Secret Leakage) are described but not run — there are no block rates, no detection rates, no false-positive numbers. The slide-5 screenshot is the only piece of evidence that any of this runs, and it covers only the basic SQL allow/deny case. Even a small results table on a handful of scenarios would substantially raise credibility.
3. The data-flow tracker is the hardest piece and gets the least treatment. Section 4.1 describes it in one sentence — "propagates taint labels through tool outputs and checks flow rules before any data leaves the sandbox" — and the slide-5 DSL example notably omits the flow keyword that's in the paper's Listing 1, suggesting this component may also be aspirational in the current prototype. Information-flow control is notoriously subtle: implicit flows, label transformations under aggregation, joins on tainted data, and declassification policies are all genuinely hard. It would strengthen the submission to acknowledge this complexity explicitly and describe what the prototype actually handles versus what it punts to future work.
4. A public artifact would help a lot. There is no GitHub link, no implementation source for any of the five components, and Listing 2 is presented as an illustrative trace rather than real captured output. A minimal repo containing the DSL parser, the SQL-agent prototype that produced the slide-5 screenshot, and (when integrated) the Lean policy file would let reviewers verify the work end-to-end.
5. Clarify Z3's role. The abstract says policies are "formally verified with Lean/Z3," which suggests parity between the two tools. Section 6.1 reveals Z3 is used only for compile-time satisfiability and contradiction checks, not for the core LTLf-to-DFA correctness proof (which is Lean's job). Tightening this distinction would help readers calibrate the depth of the verification story.
6. Comparison to existing solutions is missing. Related work covers IAM, specification mining (Chen et al.), BabelBench, and formal LTLf monitoring tools (MONA, Spot), but does not engage with the practical agent-guardrail ecosystem already shipped by industry. A short paragraph contrasting the formal-methods angle against these — including the data plane / control plane argument framed as a concrete head-to-head example — would help position the contribution.
7. Citations need a proper bibliography. "Chen et al. (2024)," "von Hippel et al.," and "BabelBench (2024)" are referenced in the body, but there is no References section anywhere in the document. Adding a bibliography with venues and links would let readers follow the threads.
8. Residual attack surface inside permitted actions deserves more attention. A SELECT that exfiltrates the right rows, or a benign-looking sequence that composes into harm, remains in scope even with the static and stateful rules. The data-flow tracker is presumably the answer, which makes its under-specification (point 3) load-bearing. Walking through one such attack end-to-end and showing exactly how the flow labels catch it would tighten the threat-model story.
Overall. The design instincts here are good, the framing is sharp, and the architectural invariant about control plane vs. data plane is a genuinely useful contribution to articulate. The path to making this a substantially stronger piece of work is mostly about aligning the paper's claims with what the slides acknowledge as the current state of the prototype — and then either delivering on the Lean/Z3 verification before the final cutoff or reframing it as planned work. With those adjustments, this could become a credible contribution to the agent-safety toolchain space.
The paper presents a formally verified system with provable guarantees, yet none of it appears to have actually run. Section 5 is an evaluation plan, the Lean and Z3 proof certificates are described but never produced, and Listing 2 is a hand-written session illustration. Implementing even the single SQL agent scenario with real block rates and latencies would strengthen the argument far more than the current breadth, and references such as "Chen et al." and "von Hippel et al." need full titles, venues, and years so the related work can be checked.
Cite this work
@misc {
title={
(HckPrj) Adaptive Permission Sandbox for LLM Agents
},
author={
Shaheen Raja
},
date={
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


