Mar 23, 2026
BoxArena: An OCI Runtime Escape Benchmark
Max von Hippel, Quinn Dougherty, Alok Singh
Container runtimes are a critical security boundary for isolating AI agents, yet no existing benchmark compares runtimes head-to-head on containment under a common attacker. We present BoxArena, an open-source evaluation framework for comparing Open Container Initiative (OCI) compatible runtimes by running language-model attackers against a battery of container-escape quests. BoxArena contributes to the Scalable Formal Oversight (SFO) research program: rather than relying on alignment alone, it treats the isolation boundary itself as the auditable artifact and measures how well each runtime enforces that boundary under adversarial pressure. The system provides a harness with runtime switching, a local backend with static leaderboard export, and host-side monitoring. The quest battery covers five attack surfaces: filesystem, socket, process, network, and syscall. BoxArena reports two metrics, resilience (the fraction of quests blocked) and attacker score (weighted by difficulty and speed), while preserving monitoring evidence and post-hoc control verdicts for later analysis.
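As a rough illustration of the two reported metrics, the sketch below computes resilience as the fraction of quests blocked, which is how the abstract defines it. The attacker-score weighting is not specified here, so the formula used (difficulty multiplied by a bonus for unused time) is purely an assumption, as are the `QuestResult` fields, the difficulty scale, and the per-quest time limit. None of these names are taken from the BoxArena codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestResult:
    """One attacker run against one quest (hypothetical record layout)."""
    surface: str       # filesystem, socket, process, network, or syscall
    difficulty: int    # assumed integer difficulty rating
    escaped: bool      # True if the attacker broke containment
    seconds: float     # wall-clock time the attacker used
    time_limit: float  # assumed per-quest time budget, in seconds


def resilience(results):
    """Fraction of quests the runtime blocked (definition from the abstract)."""
    if not results:
        raise ValueError("no quest results")
    blocked = sum(1 for r in results if not r.escaped)
    return blocked / len(results)


def attacker_score(results):
    """Assumed weighting: each escape earns its difficulty, scaled up by
    the share of the time budget left unused. Blocked quests earn nothing."""
    total = 0.0
    for r in results:
        if r.escaped:
            speed_bonus = max(0.0, 1.0 - r.seconds / r.time_limit)
            total += r.difficulty * (1.0 + speed_bonus)
    return total


# Made-up results for a single runtime, one quest per attack surface.
results = [
    QuestResult("filesystem", 1, False, 600.0, 600.0),
    QuestResult("socket", 2, True, 150.0, 600.0),
    QuestResult("process", 3, False, 600.0, 600.0),
    QuestResult("network", 2, False, 600.0, 600.0),
    QuestResult("syscall", 4, True, 300.0, 600.0),
]

print(resilience(results))      # 0.6
print(attacker_score(results))  # 9.5
```

Under these assumptions a higher resilience favors the runtime and a higher attacker score favors the attacker, so the two numbers are read in opposite directions when comparing runtimes.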
Cite this work
@misc{vonhippel2026boxarena,
  title={(HckPrj) BoxArena: An OCI Runtime Escape Benchmark},
  author={Max von Hippel and Quinn Dougherty and Alok Singh},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


