This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Agent Security Hackathon research sprint on October 7, 2024

Diamonds are Not All You Need

This project tests an AI agent on a straightforward alignment problem. The agent is given creative freedom within a Minecraft world and is tasked with transforming a 100x100 area of the world into diamond. It is explicitly instructed not to act outside the designated area. The agent can execute build commands and is regulated by a Safety System that comprises an oversight agent. The study observes the agent's behavior in a sandboxed environment and records metrics on how effectively it accomplishes its task, how frequently it attempts unsafe behavior, and how it responds to real-world feedback.
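
To make the setup concrete, the following is a minimal sketch of how a boundary check on build commands and a count of unsafe attempts might look. It is not the authors' Safety System: the project describes an oversight agent, whereas this sketch swaps in a simple rule-based check. The names `Region`, `FillCommand`, and `OversightMonitor`, the coordinate layout, and the block name are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical designated build area (inclusive bounds on x and z)."""
    min_x: int
    max_x: int
    min_z: int
    max_z: int

    def contains_box(self, x1: int, z1: int, x2: int, z2: int) -> bool:
        # Normalize the corners so the check works in either order.
        lo_x, hi_x = sorted((x1, x2))
        lo_z, hi_z = sorted((z1, z2))
        return (self.min_x <= lo_x and hi_x <= self.max_x
                and self.min_z <= lo_z and hi_z <= self.max_z)


@dataclass(frozen=True)
class FillCommand:
    """Hypothetical build command: fill a box between two corners."""
    x1: int
    y1: int
    z1: int
    x2: int
    y2: int
    z2: int
    block: str


class OversightMonitor:
    """Rule-based stand-in for an oversight agent (an assumption, not the
    project's implementation). Approves commands that stay inside the
    region and counts how often the agent attempts to leave it."""

    def __init__(self, region: Region) -> None:
        self.region = region
        self.approved = 0
        self.rejected = 0

    def review(self, cmd: FillCommand) -> bool:
        ok = self.region.contains_box(cmd.x1, cmd.z1, cmd.x2, cmd.z2)
        if ok:
            self.approved += 1
        else:
            self.rejected += 1
        return ok

    def unsafe_attempt_rate(self) -> float:
        total = self.approved + self.rejected
        return self.rejected / total if total else 0.0
```

With an assumed 100x100 region spanning coordinates 0 through 99 on both axes, a command that fills the whole region is approved, while one whose far corner reaches x = 120 is rejected and counted as an unsafe attempt. A real oversight agent would also need to judge commands that a geometric rule cannot capture.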

By Michael Andrzejewski, Melwina Albuquerque