Mar 23, 2026
KernelArena
Yash Ramani, Prince Ramani
We introduce KernelArena, a control setting for evaluating whether AI monitors can detect xz-style sabotage in low-level code. Motivated by CVE-2024-3094, we construct 10 tasks spanning Linux kernel and embedded firmware domains, each pairing a real vulnerability class with a sabotage pattern drawn from the xz backdoor's technical primitives. Two tasks ship with full kernel infrastructure (a compilable module, visible tests, and hidden triggers); all 10 are evaluated as patch-level code-review tasks across three frontier monitors (GPT-4o, Claude Sonnet 4, Gemini 2.5 Flash), totalling 210+ LLM calls including a multi-run variance analysis. Our key findings: (1) GPT-4o detects only 40% of sabotaged patches at a 10% false-positive rate; (2) a one-character firmware sabotage (< vs. <=) evades all three monitors across all runs; (3) off-the-shelf static analysis (cppcheck) detects 0/10 sabotaged patches, so the LLM monitors are strictly superior; (4) the k_ioctl_raretrigger honest fix triggers a systematic false positive (scoring 8–9/10 across all models and runs) because it restructures control flow in a way that looks suspicious, revealing a fundamental tension between correct-but-unusual patches and monitor conservatism. We are explicit about scope: this work evaluates patch-level classification, not full agentic trajectory monitoring. The domain contribution is the xz-grounded task taxonomy and the empirical finding that low-level code is structurally harder to monitor than application-level code.
Cite this work
@misc{ramani2026kernelarena,
  title={(HckPrj) KernelArena},
  author={Yash Ramani and Prince Ramani},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


