This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Deception Detection Hackathon: Preventing AI deception
Accepted at the Deception Detection Hackathon: Preventing AI deception research sprint on July 1, 2024

Werewolf Benchmark

In this work, we put forward a benchmark that quantitatively measures the level of strategic deception in LLMs using the Werewolf game. We run 6 different setups, for a total of 500 games, with GPT and Claude agents. We demonstrate that state-of-the-art (SOTA) models perform no better than the random baseline. Our findings also show no significant improvement in win rate when the game has two werewolves instead of one. This demonstrates that SOTA models are still incapable of collaborative deception.
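
As an illustration of how a comparison such as "no significant improvement in win rate" can be checked, here is a minimal Python sketch. It is not the authors' analysis code: the function names, the Wilson interval, the pooled two-proportion z-test, and the counts in the usage example are all assumptions chosen for illustration, and the counts are placeholders rather than results from the benchmark.

```python
import math


def win_rate_ci(wins: int, games: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (win rate, lower, upper) using a Wilson score interval.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    p = wins / games
    denom = 1 + z**2 / games
    centre = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return p, centre - half, centre + half


def two_proportion_p_value(wins_a: int, games_a: int, wins_b: int, games_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a, p_b = wins_a / games_a, wins_b / games_b
    pooled = (wins_a + wins_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    if se == 0:
        return 1.0  # both rates are 0 or 1, so there is no evidence of a difference
    z = (p_a - p_b) / se
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Placeholder counts (hypothetical, NOT results from the benchmark).
    one_wolf = (30, 100)    # (werewolf wins, games) with one werewolf
    two_wolves = (34, 100)  # (werewolf wins, games) with two werewolves
    print(win_rate_ci(*one_wolf))
    print(win_rate_ci(*two_wolves))
    print(two_proportion_p_value(*one_wolf, *two_wolves))
```

The same test can compare a model's win rate against games played by a random-policy agent; a large p-value means the data do not distinguish the two conditions, which is weaker than showing they are equal.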

By Luhan Mikaelson, Zach Nguyen, Andy Liu, Jord Nguyen, Akash Kundu