Jun 30, 2023
-
Jul 2, 2023
Safety Benchmarks Hackathon
Participate in the Alignment Jam on safety benchmarks to spend a weekend with AI safety researchers to formulate and demonstrate new ideas in measuring the safety of artificially intelligent systems.
This event has concluded.
Hosted by Apart Research, Esben Kran, and Fazl Barez
Explore safer AI with fellow researchers and enthusiasts
Large AI models are released nearly every week. We need ways to evaluate these models (especially at the complexity of GPT-4) to ensure they will not fail critically after deployment, e.g. through autonomous power-seeking, biases toward unethical behavior, or other phenomena that emerge in deployment (such as inverse scaling).
Rewatch the keynote talk by Alexander Pan above

The MACHIAVELLI benchmark (left) and the Inverse Scaling Prize (right)
Sign up below to be notified before the kickoff! Read up on the schedule, see instructions for how to participate, and find inspiration below.
Submission details
You are required to submit:
A PDF report using the template linked on the submission page
A maximum 10 minute video presenting your findings and results (see inspiration and instructions for how to do this on the submission page)
You are optionally encouraged to submit:
A slide deck describing your project
A link to your code
Any other material you would like to link
Inspiration
Here are a few inspiring papers, talks, and posts about safety benchmarks. See more starter code, articles, and readings under the "Resources" tab.

Friday, June 30th
UTC 17:00: Keynote talk with Alexander Pan, lead author on the MACHIAVELLI benchmark paper along with logistical information from the organizing team
UTC 18:00: Team formation and coordination
Saturday, July 1st
UTC 14:00: Virtual project discussions
UTC 17:00: A talk by Antonio Miceli-Barone on "The Larger They Are, the Harder They Fail: Language Models Do Not Recognize Identifier Swaps in Python" (Twitter thread), an inverse scaling phenomenon in large language models
Sunday, July 2nd
UTC 14:00: Virtual project discussions
UTC 19:00: Online ending session
Monday, July 3rd
UTC 2:00: Submission deadline
Wednesday, July 5th
UTC 19:00: International project presentations!
Entries
Check back later to see entries for this event
Our Other Sprints
Apr 25, 2025
-
Apr 27, 2025
Economics of Transformative AI: Research Sprint
This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.
Apr 25, 2025
-
Apr 26, 2025
Berkeley AI Policy Hackathon
This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.