Mar 23, 2026
Blue Beam
Patrick Dugan
BlueBeam is a control-protocol research stack for risky software-agent behavior under imperfect monitorability. We built a benchmark-safe pipeline around BashArena-style trajectories that combines cheap risk heuristics, latent-sidechannel analysis, corruption-gap calibration, token-fed SAEs, and cross-referenced dictionary probes. The strongest current signal is orchestration-linked multi_step_risk_escalation, with useful supporting structure from manipulation-family probes such as unauthorized_config_edit and secret_touch_drift.
The positive result is that latent and probe-based analysis can rank the behavior families and layers that separate benign from compromised conditions, without saturating into hard-block behavior on benign traffic. We also report a negative result: a family-specific TinyLoRA follow-up has not yet translated into a strong controller-side intervention gain, so we present it as future work rather than overclaim. We packaged figures, methodology, SAE outputs, and future-work notes in the submission bundle, and published a hosted artifact dashboard here: https://bluebeam-app.vercel.app
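As an illustration of what ranking behavior families and layers by benign-versus-compromised separation could look like, here is a minimal sketch. It is not the BlueBeam pipeline: the function names, the choice of AUROC as the separation metric, the layer indices, and all probe scores are hypothetical toy values chosen only to make the example runnable.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (behavior family, layer index); layer indices here are made up


def auroc(benign: List[float], compromised: List[float]) -> float:
    """Probability that a random compromised score exceeds a random benign one.

    Ties count as half a win. 0.5 means no separation, 1.0 means perfect separation.
    """
    wins = 0.0
    for c in compromised:
        for b in benign:
            if c > b:
                wins += 1.0
            elif c == b:
                wins += 0.5
    return wins / (len(benign) * len(compromised))


def rank_by_separation(
    scores: Dict[Key, Tuple[List[float], List[float]]]
) -> List[Tuple[Key, float]]:
    """Sort (family, layer) pairs from most to least separable."""
    ranked = [(key, auroc(benign, comp)) for key, (benign, comp) in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy probe scores (benign, compromised). These are NOT results from the project.
toy_scores: Dict[Key, Tuple[List[float], List[float]]] = {
    ("multi_step_risk_escalation", 12): ([0.1, 0.2, 0.3], [0.7, 0.8, 0.9]),
    ("unauthorized_config_edit", 8): ([0.2, 0.4, 0.5], [0.3, 0.6, 0.7]),
    ("secret_touch_drift", 8): ([0.3, 0.5, 0.6], [0.4, 0.5, 0.7]),
}

ranking = rank_by_separation(toy_scores)
for (family, layer), score in ranking:
    print(f"{family} @ layer {layer}: AUROC = {score:.3f}")
```

A rank-based metric such as AUROC is one reasonable choice here because it measures separation without fixing a blocking threshold, which keeps the ranking question apart from the question of how often benign traffic would be blocked.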
Cite this work
@misc{
  title={(HckPrj) Blue Beam},
  author={Patrick Dugan},
  date={3/23/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


