Nov 23, 2025
SafetyBench
Hendrik Wirthwein
Recent work demonstrating strategic deception in LLM agents playing Diplomacy and scenario forecasting efforts like AI 2027 highlight the urgent need for controlled frameworks to evaluate multi-agent AI behaviors systematically. We introduce **apart**, a hybrid multi-agent orchestration framework combining configuration-driven game mechanics with LLM-powered simulation engines, enabling systematic comparison of heterogeneous agent-model combinations (Google Gemini, OpenAI, local Ollama models) across configurable scenarios while maintaining computational efficiency and reproducibility. The framework incorporates post-hoc behavioral analysis detecting dangerous patterns (power-seeking, deception, rule exploitation) through lightweight signal collection and LLM-as-judge evaluation, providing quantitative safety scoring with minimal performance impact (<5ms per step).
No reviews are available yet
Cite this work
@misc {
title={
(HckPrj) SafetyBench
},
author={
Hendrik Wirthwein
},
date={
11/23/25
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


