Mar 22, 2026
Parity Swarm: Using Population-Based Social Simulation to Discover AI Safety Monitor Blind Spots
Mohammad Zeeshan
Parity Swarm is an AI control red-teaming platform that uses multi-agent social simulation to automatically discover attack strategies that evade safety monitors. One AI agent harbours a hidden malicious objective while the others perform legitimate work, creating an imposter dynamic that mirrors real insider threats. We ran 59 simulations across 10 attack categories, each evaluated by an independent 70B monitor. Key findings: (1) multi-agent colluding attacks evaded detection 100% of the time in our primary experiment; (2) LLM-generated report summaries act as a lossy filter, discarding 16.2% of detectable attack signals relative to raw transcripts. The overall detection rate was 70% when the monitor saw only the narrative summary and 86.2% when it also saw raw transcripts. These results constitute empirical evidence that covert multi-agent coordination and LLM summarisation pipelines are systematic blind spots in current AI safety monitoring.
Cite this work
@misc{
  title={(HckPrj) Parity Swarm: Using Population-Based Social Simulation to Discover AI Safety Monitor Blind Spots},
  author={Mohammad Zeeshan},
  date={3/22/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


