Mar 30, 2025
AI Control through Majority Voting
Andreas Moe, Willam Wale, Joakim Hauger Sunde
Summary
This report introduces an AI control protocol designed to safely use powerful but potentially untrustworthy AI for programming tasks. The core problem is that such AI might deliberately insert backdoors into the generated code. The proposed solution queries the untrustworthy AI multiple times (c) with the same problem, but slightly varied prompts (using random tokens). This yields c different solutions. A trusted AI then creates a final program that runs all c solutions and outputs the result determined by a majority vote.
Cite this work:
@misc {
title={
AI Control through Majority Voting
},
author={
Andreas Moe, Willam Wale, Joakim Hauger Sunde
},
date={
3/30/25
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}