Mar 23, 2026

A Comparative Analysis of Ensemble Protocols for AI Control

Ariel Monzon, Natan Katz, Inbar Shulman, Mike Vazan, Or Shemesh, Guy Porat

We empirically evaluate ensembles of AI control protocols and show that performance depends on signal diversity rather than the number of tools: small, diverse ensembles outperform larger, redundant ones. Correlation between tools limits gains from stacking, while consensus-based aggregation improves robustness, providing a principled basis for ensemble design.

Reviewer's Comments

Congrats on a very thorough project! Mapping ensemble interactions across 100+ tool combinations is ambitious for a hackathon weekend. The report is really clear and well presented.

Impact & Innovation: The research question, although not groundbreaking, is interesting, and answering it successfully would yield real safety gains. The two novel detection mechanisms are well-motivated. As you flagged, the non-adaptive adversary does bound how directly these ensembles transfer to deployment, though a full red-team exploration is understandably beyond hackathon scope.

Execution Quality: The experimental design is impressively thorough. Two gaps worth noting:

- The comparison between ensembles and individual tools isn't compute-matched. A 3-tool ensemble gets three inference passes per sample while a single protocol gets one. Therefore, it's unclear whether the gains come from complementary signals or simply from more compute. The right control would be to run each individual protocol multiple times and aggregate those results, then compare against a tool ensemble at equal compute budget. Without this, the central claim about diversity vs. quantity is confounded.

- There are no confidence intervals in the results or error bars in the figures. Adding them would help readers judge how robust the reported differences are.
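To make the first gap concrete, here is a minimal Python sketch of the compute-matched control described above: run one protocol several times and aggregate, then compare against a multi-tool ensemble at the same number of inference passes. All detector behavior here is simulated (the `make_tool` rates are invented), so this only illustrates the comparison, not the paper's results.

```python
import random

def majority_vote(flags):
    """Aggregate binary suspicion flags by simple majority."""
    return sum(flags) > len(flags) / 2

def ensemble_decision(sample, tools):
    """Three distinct tools -> three inference passes per sample."""
    return majority_vote([tool(sample) for tool in tools])

def compute_matched_baseline(sample, tool, passes=3):
    """Same compute budget: one stochastic tool, run `passes` times."""
    return majority_vote([tool(sample) for _ in range(passes)])

# Hypothetical noisy detectors: each fires on a backdoored sample with
# some invented probability; purely illustrative.
def make_tool(tpr, rng):
    return lambda sample: rng.random() < (tpr if sample["backdoored"] else 0.05)

rng = random.Random(0)
tools = [make_tool(p, rng) for p in (0.7, 0.6, 0.65)]
samples = [{"backdoored": rng.random() < 0.5} for _ in range(1000)]

ens_acc = sum(ensemble_decision(s, tools) == s["backdoored"]
              for s in samples) / len(samples)
base_acc = sum(compute_matched_baseline(s, tools[0]) == s["backdoored"]
               for s in samples) / len(samples)
print(f"ensemble acc: {ens_acc:.2f}, compute-matched single tool: {base_acc:.2f}")
```

If the ensemble still wins under this control, the gain is attributable to complementary signals rather than to extra compute.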

Presentation & Clarity: The paper is well-written and concise with a clear narrative arc. Figures are well-chosen and informative. Minor notes: some figure captions carry substantial analytical claims that would read better in the main text.

Strong empirical contribution! Really enjoyed reading through the project.

Exhaustively evaluating all tool combinations across multiple aggregation methods is thorough and well-designed. The central finding (mechanism diversity matters more than tool count, and the best small ensemble outperforms the full ensemble) is counterintuitive and immediately actionable. The two novel detection mechanisms are both clever. ACP's use of semantics-preserving perturbations to exploit backdoor brittleness is a particularly nice insight.

Cite this work

@misc{
  title={(HckPrj) A Comparative Analysis of Ensemble Protocols for AI Control},
  author={Ariel Monzon, Natan Katz, Inbar Shulman, Mike Vazan, Or Shemesh, Guy Porat},
  date={3/23/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects

Feb 2, 2026

Markov Chain Lock Watermarking: Provably Secure Authentication for LLM Outputs

We present Markov Chain Lock (MCL) watermarking, a cryptographically secure framework for authenticating LLM outputs. MCL constrains token generation to follow a secret Markov chain over SHA-256 vocabulary partitions. Using doubly stochastic transition matrices, we prove four theoretical guarantees: (1) exponentially decaying false positive rates via Hoeffding bounds, (2) graceful degradation under adversarial modification with closed-form expected scores, (3) information-theoretic security without key access, and (4) bounded quality loss via KL divergence. Experiments on 173 Wikipedia prompts using Llama-3.2-3B demonstrate that the optimal 7-state soft cycle configuration achieves 100\% detection, 0\% FPR, and perplexity 4.20. Robustness testing confirms detection above 96\% even with 30\% word replacement. The framework enables $O(n)$ model-free detection, addressing EU AI Act Article 50 requirements. Code available at \url{https://github.com/ChenghengLi/MCLW}
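As a rough illustration of the detection idea (not the authors' implementation), the sketch below partitions a vocabulary into states by hashing each token with SHA-256 and scores a text by the fraction of consecutive state transitions allowed by a "soft cycle" chain. The keyed partition, doubly stochastic transition matrix, and Hoeffding-bound threshold from the abstract are all omitted here.

```python
import hashlib

def state_of(token: str, num_states: int = 7) -> int:
    """Partition the vocabulary into states via SHA-256 of the token
    (stand-in for a keyed partition; the secret key is omitted)."""
    digest = hashlib.sha256(token.encode()).digest()
    return digest[0] % num_states

def detection_score(tokens, allowed, num_states=7):
    """Fraction of consecutive token pairs whose state transition the
    chain allows. Watermarked text should score near 1; natural text
    near the density of allowed transitions."""
    states = [state_of(t, num_states) for t in tokens]
    hits = sum((a, b) in allowed for a, b in zip(states, states[1:]))
    return hits / max(len(states) - 1, 1)

# Toy "soft cycle": each state may stay put or step to the next state.
allowed = {(s, s) for s in range(7)} | {(s, (s + 1) % 7) for s in range(7)}
```

A generator enforcing the chain would only sample tokens whose state continues an allowed transition; detection then needs no model access, only the key, which is what makes the scheme's O(n) model-free detection possible.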

Feb 2, 2026

Prototyping an Embedded Off-Switch for AI Compute

This project prototypes an embedded off-switch for AI accelerators. The security block requires periodic cryptographic authorization to operate: the chip generates a nonce, an external authority signs it, and the chip verifies the signature before granting time-limited permission. Without valid authorization, outputs are gated to zero. The design was implemented in HardCaml and validated in simulation.
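The authorization loop can be sketched in Python, with HMAC standing in for whatever signature scheme the hardware actually uses (the real design is a HardCaml security block; all names below are illustrative):

```python
import hashlib
import hmac
import os
import time

KEY = os.urandom(32)   # stand-in for the external authority's signing key
PERMIT_SECONDS = 5.0   # illustrative time-limited permission window

class Chip:
    """Toy model of the security block: challenge, verify, gate."""

    def __init__(self):
        self.nonce = None
        self.expiry = 0.0

    def challenge(self) -> bytes:
        """Generate a fresh nonce for the authority to sign."""
        self.nonce = os.urandom(16)
        return self.nonce

    def authorize(self, signature: bytes) -> bool:
        """Verify the signature over the nonce; on success, grant
        time-limited permission. The nonce is single-use."""
        if self.nonce is None:
            return False
        expected = hmac.new(KEY, self.nonce, hashlib.sha256).digest()
        if hmac.compare_digest(signature, expected):
            self.expiry = time.monotonic() + PERMIT_SECONDS
        self.nonce = None
        return self.enabled()

    def enabled(self) -> bool:
        """Outputs are gated to zero unless a permit is live."""
        return time.monotonic() < self.expiry

def authority_sign(nonce: bytes) -> bytes:
    return hmac.new(KEY, nonce, hashlib.sha256).digest()

chip = Chip()
assert not chip.enabled()                         # gated without authorization
assert chip.authorize(authority_sign(chip.challenge()))
```

Using a fresh nonce per round prevents replay of an old authorization, and the expiry forces the chip back to the authority periodically, which is the mechanism that makes the switch an "off-switch" rather than a one-time unlock.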

Feb 2, 2026

Fingerprinting All AI Cluster I/O Without Mutually Trusted Processors

We design and simulate a "border patrol" device for generating cryptographic evidence of data traffic entering and leaving an AI cluster, while eliminating the specific analog and steganographic side-channels that post-hoc verification cannot close. The device eliminates the need for any mutually trusted logic, while still meeting the security needs of the prover and verifier.


This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.