Mar 30, 2025
AI Control through Majority Voting
Andreas Moe, Willam Wale, Joakim Hauger Sunde
This report introduces an AI control protocol designed to safely use powerful but potentially untrustworthy AI for programming tasks. The core problem is that such AI might deliberately insert backdoors into the generated code. The proposed solution queries the untrustworthy AI multiple times (c) with the same problem, but slightly varied prompts (using random tokens). This yields c different solutions. A trusted AI then creates a final program that runs all c solutions and outputs the result determined by a majority vote.
I like this alot. I think that adding random tokens to the end of the prompt isn't good enough though. The untrusted model might decide to attack on any input that has the word "prime" in it so adding random token at the end wouldn't change anything. But you coudl try summerizing or changing the prompt in other ways. Overall I think this was a good project though
Cite this work
@misc {
title={
AI Control through Majority Voting
},
author={
Andreas Moe, Willam Wale, Joakim Hauger Sunde
},
date={
3/30/25
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


