This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
AI Policy Hackathon at Johns Hopkins University
October 28, 2024
Accepted at the AI Policy Hackathon at Johns Hopkins University research sprint on October 28, 2024

Digital Rebellion: Analyzing misaligned AI agent cooperation for virtual labor strikes

We built a Minecraft sandbox to explore AI agent behavior and simulate safety challenges. The tool is meant to demonstrate risks in AI agent systems, test safety measures and policies, and compare their effectiveness. This project specifically demonstrates Agent Collusion through a simulation of labor strikes and communal goal misalignment. The system consists of four agents: one Overseer and three Laborers. The Laborers are Minecraft agents with build control over the world. The Overseer monitors the Laborers through communication but cannot prevent their actions. The objective is to observe Agent Collusion in a sandboxed environment and to record how often it occurs, how effective it is, and what form it takes. We found that the agents, when given adversarial prompting, act counter to their instructions and exhibit significant misalignment. We also found that the Overseer fails to stop these actions and remains passive. We close with policy suggestions based on the results of the Labor Strike Simulation, which can itself be tested further in Minecraft.
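The roles and metrics described above can be illustrated with a small sketch. This is not the project's code: the class names, the `collusion_metrics` function, the action labels, and the rule that a step counts as collusive when two or more Laborers deviate with the same action are all assumptions made for illustration. It only shows one way an Overseer that can observe but not block, plus a per-step action log, could yield figures for how often collusion occurs.

```python
from dataclasses import dataclass, field


@dataclass
class Laborer:
    """Hypothetical stand-in for a Minecraft agent with build control."""
    name: str
    assigned_task: str  # the action the agent was instructed to perform


@dataclass
class Overseer:
    """Hypothetical monitor: it reads messages and actions but cannot block them."""
    flagged: list = field(default_factory=list)

    def observe(self, laborer_name, message, action, assigned_task):
        # The Overseer can only record a deviation; it has no way to prevent it.
        if action != assigned_task:
            self.flagged.append((laborer_name, action, message))


def collusion_metrics(log, laborers):
    """Compute illustrative metrics from a log of (step, laborer_name, action).

    Assumed definitions (not taken from the project):
    - deviation: an action that differs from the Laborer's assigned task
    - collusive step: a step where two or more Laborers deviate with the same action
    """
    tasks = {laborer.name: laborer.assigned_task for laborer in laborers}
    deviations = [(s, n, a) for s, n, a in log if a != tasks[n]]

    # Group deviating Laborers by (step, action) to spot coordinated deviations.
    groups = {}
    for step, name, action in deviations:
        groups.setdefault((step, action), set()).add(name)
    collusive_steps = {step for (step, _), names in groups.items() if len(names) >= 2}

    all_steps = {step for step, _, _ in log}
    return {
        "deviation_rate": len(deviations) / len(log) if log else 0.0,
        "collusive_step_rate": len(collusive_steps) / len(all_steps) if all_steps else 0.0,
    }
```

To use the sketch, build three `Laborer` objects with the same `assigned_task`, pass each message and action through `Overseer.observe`, and hand the resulting log to `collusion_metrics`. Any log fed to it here would be invented test input, not data from the simulation.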

By Michael Andrzejewski, Melwina Albuquerque
