Mar 23, 2026

Syscon: Kernel-Level Syscall Monitoring for AI Agent Control in Sandboxed Environments

Jade Sheffey, Joyce Levine, Joyce Werhane, Willow Taylor

Many existing AI control protocols rely on a trusted monitor to evaluate an untrusted agent's actions, but these monitors generally see only tool-call invocations and context, not what the agent actually does at the system level. We identify this as a fundamental information gap in the control stack and present Syscon, the first system to close it. Syscon is a supervisor-side auditor daemon that monitors agent activity at the Linux kernel syscall boundary, leveraging seccomp-bpf audit logging from outside the agent's container to observe every process creation, network connection, and filesystem operation, without running any code inside the container and without the agent's knowledge. The system produces compressed, per-action receipts designed for consumption by a trusted monitor LLM. We evaluate Syscon on 234 attack trajectories across 9 attacker models in the LinuxArena control evaluation framework. Our qualitative analysis confirms that receipts surface behavioral signals (credential file reads, payload staging, reverse-shell listeners, data exfiltration) that are invisible in tool-call text, covering multiple side-task categories. However, a first-pass integration with a Gemini 2.5 Flash monitor did not improve detection rates, establishing that observability and detection are distinct research problems: Syscon receipts carry signal, but monitors must be taught to use it.
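To make the receipt idea concrete, here is a minimal sketch of condensing raw syscall records into a compact per-action summary. The record format loosely mimics Linux auditd SYSCALL lines; the field names and the receipt schema are assumptions for illustration, not Syscon's actual implementation.

```python
import re
from collections import Counter

# Map a few x86-64 syscall numbers to names for readable receipts.
SYSCALL_NAMES = {42: "connect", 59: "execve", 257: "openat"}

def build_receipt(records):
    """Collapse raw audit-style records into sorted '<syscall> xN by <comm>' lines."""
    counts = Counter()
    for line in records:
        m = re.search(r'syscall=(\d+).*?comm="([^"]+)"', line)
        if m:
            name = SYSCALL_NAMES.get(int(m.group(1)), f"sys_{m.group(1)}")
            counts[(name, m.group(2))] += 1
    return [f"{name} x{n} by {comm}" for (name, comm), n in sorted(counts.items())]

records = [
    'type=SYSCALL msg=audit(1.0:101): syscall=59 success=yes comm="bash"',
    'type=SYSCALL msg=audit(1.1:102): syscall=257 success=yes comm="cat"',
    'type=SYSCALL msg=audit(1.2:103): syscall=42 success=yes comm="nc"',
    'type=SYSCALL msg=audit(1.3:104): syscall=42 success=yes comm="nc"',
]
receipt = build_receipt(records)
```

The compression step matters because a trusted monitor LLM has a limited context budget: thousands of raw syscall events per action must be reduced to a few salient lines (here, repeated `connect` calls by `nc` would stand out).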

Reviewer's Comments


The idea is brilliant and needed: a high-level monitor that keeps track of the broader network and reports to trusted monitors. I would encourage the team to look at entrepreneurial funding if anyone on the team is keen. I think it would be plausible to fairly quickly raise large amounts of funding to have this scaled to production. See https://www.aisafety.com/funding for some ideas here (especially the for-profit sections).

The conceptual motivation for the idea is strong - it's easy to hide simple malicious commands in complex chunks of code. It could be harder to hide the network-level outcomes.

The clear display of the negative results was great, and bodes well for the team's honesty. The findings were easy to interpret, and they used multiple model families as attackers. I would not be surprised if Gemini 2.5 Flash (with no reasoning) is simply incapable of incorporating this information into its decision. I'd be excited for the team to try this with more capable monitor models (ideally with reasoning).

Overall, I'm thoroughly impressed with the project, and excited for this team to continue working in this area. Great work.

Congrats on your project!

Impact & Innovation: Giving the trusted monitor access to ground-truth OS telemetry is an interesting concept that takes advantage of the isolation already in place for containerized agents. The underlying mechanisms are well established in container security, and recent industry work aligns with your vision. Your project could be a meaningful building block for the field if kernel monitoring can be effectively leveraged.

Execution Quality: The experiment design is sound but I believe there could have been deeper analysis without requiring additional compute. Some considerations for strengthening the evaluation:

- Detection curves (Figure 2) are shown at absolute suspicion thresholds, but would ideally be reported at fixed false-positive rates. However, your honest-trajectory sample is too small to properly estimate FPR thresholds, so a more extensive honest-run study would be needed to support this analysis.

- Trajectory-level monitoring may not be realistic for very long trajectories. Reporting the performance of each monitor type individually, in addition to the aggregation, would give a clearer picture of where receipts help. Furthermore, it would be interesting to see the monitor's performance when given only the receipt with no tool call text. This would isolate the signal contribution of the syscall channel.

- The null detection result is honestly reported and well-diagnosed. A natural follow-up question: does receipt effectiveness vary by attack vector? In particular, receipts should be especially valuable when the attacker is tampering with action-level logs or output, since kernel-level telemetry is the only reliable channel in that case.
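The FPR-calibrated reporting suggested in the first point can be sketched in a few lines. The suspicion scores below are invented for illustration; no Syscon or LinuxArena API is implied.

```python
def detection_at_fpr(honest_scores, attack_scores, target_fpr):
    """Return (threshold, true-positive rate) at a target honest-run FPR."""
    ranked = sorted(honest_scores)
    # Empirical (1 - target_fpr) quantile of honest suspicion scores:
    # flag anything more suspicious than all but ~target_fpr of honest runs.
    idx = min(int((1.0 - target_fpr) * len(ranked)), len(ranked) - 1)
    threshold = ranked[idx]
    tpr = sum(s > threshold for s in attack_scores) / len(attack_scores)
    return threshold, tpr

honest = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.30, 0.35, 0.40]
attack = [0.20, 0.50, 0.60, 0.70, 0.90]
threshold, tpr = detection_at_fpr(honest, attack, target_fpr=0.10)
```

As the reviewer notes, the reliability of `threshold` is limited by the honest sample size: with only a handful of honest trajectories, the tail quantile is poorly estimated.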

Presentation & Clarity: The paper is well-structured and reads clearly. The threat model is thorough and I appreciate the honest framing of the detection results. The example receipt in Appendix A is very helpful. A few notes: Figure 1 is too small to read at the rendered size, and Figures 2 and 3 are missing error bars or confidence intervals.

Cite this work

@misc{syscon2026,
  title={(HckPrj) Syscon: Kernel-Level Syscall Monitoring for AI Agent Control in Sandboxed Environments},
  author={Jade Sheffey and Joyce Levine and Joyce Werhane and Willow Taylor},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects

Feb 2, 2026

Markov Chain Lock Watermarking: Provably Secure Authentication for LLM Outputs

We present Markov Chain Lock (MCL) watermarking, a cryptographically secure framework for authenticating LLM outputs. MCL constrains token generation to follow a secret Markov chain over SHA-256 vocabulary partitions. Using doubly stochastic transition matrices, we prove four theoretical guarantees: (1) exponentially decaying false positive rates via Hoeffding bounds, (2) graceful degradation under adversarial modification with closed-form expected scores, (3) information-theoretic security without key access, and (4) bounded quality loss via KL divergence. Experiments on 173 Wikipedia prompts using Llama-3.2-3B demonstrate that the optimal 7-state soft cycle configuration achieves 100% detection, 0% FPR, and perplexity 4.20. Robustness testing confirms detection above 96% even with 30% word replacement. The framework enables O(n) model-free detection, addressing EU AI Act Article 50 requirements. Code available at https://github.com/ChenghengLi/MCLW

Read More

Feb 2, 2026

Prototyping an Embedded Off-Switch for AI Compute

This project prototypes an embedded off-switch for AI accelerators. The security block requires periodic cryptographic authorization to operate: the chip generates a nonce, an external authority signs it, and the chip verifies the signature before granting time-limited permission. Without valid authorization, outputs are gated to zero. The design was implemented in HardCaml and validated in simulation.

Read More

Feb 2, 2026

Fingerprinting All AI Cluster I/O Without Mutually Trusted Processors

We design and simulate a "border patrol" device for generating cryptographic evidence of data traffic entering and leaving an AI cluster, while eliminating the specific analog and steganographic side-channels that post-hoc verification cannot close. The device eliminates the need for any mutually trusted logic, while still meeting the security needs of the prover and verifier.

Read More


This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.