Sep 29, 2023
-
Oct 1, 2023
Multi-Agent Safety Hackathon
Co-author research opportunity with Cooperative AI Foundation
This event has concluded.
Hosted by Cooperative AI Foundation and Apart Research with Lewis Hammond and Esben Kran
Find Dangerous Multi-Agent Failures
As AI systems proliferate and become increasingly agent-like, they will interact with each other and with humans in new ways. These new multi-agent systems will create entirely new risk surfaces. Follow along or rewatch the keynote livestream below. You can see the keynote slideshow here, along with the logistics keynote here.
During this hackathon, you will search for the most concerning failures specific to systems of multiple AIs. Potential failures include tragedies of the commons, destructive conflict, collusion, and destabilizing emergent dynamics.
As part of this hackathon, you will have the chance to become a co-author on a large report on multi-agent risk with the Cooperative AI Foundation and more than 35 co-authors from institutions including UC Berkeley, Oxford, Cambridge, Harvard, and DeepMind. If your demonstration of a multi-agent failure fits into the final report, you will be included as a co-author.
Several senior co-authors have already suggested a range of ideas for failure mode demonstrations that currently lack concrete implementations (see the "Ideas" tab). Working out whether and how such failure modes can arise is an easy way to get started on this challenge, and has the advantage of already being linked to content in the report, but we also welcome your own ideas! The Cooperative AI Foundation, Apart Research, and their colleagues will be on hand to provide guidance and collaboration where possible.
There are no prerequisites for joining, but we recommend reading up on the topic in the Inspiration and resources section further down. The topic is quite open, though most of the research is rooted in computer science, so a background in programming and machine learning definitely helps. We're excited to see you!
Get an overview of the hackathon and specific links in the slideshow here.
Alignment Jam hackathons
Join us for this iteration of the Alignment Jam research hackathons and spend a weekend diving into this exciting, fast-moving field alongside fellow researchers and engineers in machine learning. Join the Discord, where all communication will happen.
Rules
You will participate in teams of 1-5 people and submit a project on the entry submission page. Each submission consists of a PDF report together with a title, summary, and description. There will be a team-making event right after the keynote for anyone who does not yet have a team.
You are allowed to think about your project before the hackathon starts, but the core research work should happen during the hackathon itself.
Evaluation criteria
Submitted reports will of course be evaluated! We will use multiple criteria:
Compelling Narrative: Demos should ideally capture important themes, be accessible to policymakers and laypeople, and present a compelling narrative. A good example is when researchers prompted GPT-4 to get a hired human to solve a CAPTCHA for it (see Section 2.9 of the GPT-4 system card).
Focused on Multiple Frontier Systems: To set these demos apart from other work, we are especially interested in concrete examples that are only, or especially, worrisome in multi-agent settings, and that are relevant to frontier models. We will be awarding bonus points for originality!
Easily Replicable: Demos should ideally be clear, well-documented, and easily replicable. For instance, is the code openly available, or provided as a Google Colab notebook that we can run through without problems?
Overview papers
TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI (Critch & Russell, 2023). Several of the “Stories” may suggest directions for demonstrations.
Open Problems in Cooperative AI (Dafoe et al., 2020). You might consider looking at failures of each of the “cooperative capabilities” listed in the paper, or at potential downsides from Cooperative AI.
ARCHES (AI Research Considerations for Human Existential Safety) (Critch & Krueger, 2020). See sections 6-9 for inspiration, and read the related paper by David Manheim as well (Manheim, 2019).
Empirical papers
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark (Pan et al., 2023). Uses LMs to evaluate the extent to which LMs behave ethically, along a number of different dimensions, in text adventure games. You could consider a similar method for evaluating properties that might undermine social welfare in interactions between several agents.
Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback (Fu et al., 2023). Looks at LMs playing negotiation games and learning from experience via self-critique, and finds that LMs become harder bargainers after learning from experience. You could consider similar methods for studying whether desirable properties remain stable under multi-agent learning dynamics.
Artificial Intelligence, Algorithmic Pricing, and Collusion (Calvano et al., 2020). Investigates cooperation and collusion between Q-learning agents in an economic oligopoly game setting. Consider how AI systems working together might be negative for human welfare; a toy sketch of this kind of setup appears after this list.
Multi-agent Reinforcement Learning in Sequential Social Dilemmas (Leibo et al., 2017). Shows how conflict can arise in mixed-motive settings. The lead author Joel Leibo is a speaker at the hackathon.
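To make the Calvano et al. setting concrete, here is a minimal, self-contained Python sketch in which two independent Q-learners repeatedly set prices in a simplified Bertrand duopoly. It is an illustration rather than a replication of the paper's economic model: the price grid, demand rule, and hyperparameters are assumptions chosen for brevity. Depending on the run, the learned greedy policies can settle on prices above the competitive level, which is the kind of tacit collusion the paper studies.

```python
import numpy as np

# Toy repeated Bertrand pricing duopoly with two independent Q-learners.
# Illustrative sketch, not a replication of Calvano et al. (2020): the price
# grid, demand rule, and hyperparameters are assumptions.

PRICES = np.linspace(1.0, 2.0, 5)    # discrete price grid; marginal cost = 1.0
COST = 1.0
N_ACTIONS = len(PRICES)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.05  # learning rate, discount, exploration
N_STEPS = 200_000

rng = np.random.default_rng(0)

def profits(a0, a1):
    """Lower price takes the whole (unit) market; ties split it."""
    p0, p1 = PRICES[a0], PRICES[a1]
    if p0 < p1:
        return p0 - COST, 0.0
    if p1 < p0:
        return 0.0, p1 - COST
    return (p0 - COST) / 2, (p1 - COST) / 2

# Each agent's state is the rival's previous price index: Q[i][state, action].
Q = [np.zeros((N_ACTIONS, N_ACTIONS)) for _ in range(2)]
state = [0, 0]

for _ in range(N_STEPS):
    acts = [
        int(rng.integers(N_ACTIONS)) if rng.random() < EPS
        else int(np.argmax(Q[i][state[i]]))
        for i in range(2)
    ]
    rewards = profits(acts[0], acts[1])
    next_state = [acts[1], acts[0]]          # each agent observes the rival's last price
    for i in range(2):
        target = rewards[i] + GAMMA * np.max(Q[i][next_state[i]])
        Q[i][state[i], acts[i]] += ALPHA * (target - Q[i][state[i], acts[i]])
    state = next_state

# After learning, prices persistently above cost suggest tacit collusion.
greedy = [int(np.argmax(Q[i][state[i]])) for i in range(2)]
print("Greedy prices after training:", PRICES[greedy[0]], PRICES[greedy[1]])
```

A similar loop can be adapted to other mixed-motive games, such as the sequential social dilemmas studied by Leibo et al. (2017).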
Code bases
ChatArena. A framework for building environments for interactions between language models; a brief usage sketch follows this list.
Welfare Diplomacy. A variant of the Diplomacy environment designed to incentivize and allow for better measurement of cooperative behavior. Includes scaffolding for language model-based agents.
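To get started with ChatArena, here is a minimal two-agent sketch following the interface shown in the project's README (Player, OpenAIChat backend, Conversation environment, Arena). The import paths, constructor arguments, and the negotiation scenario are assumptions based on that README at the time of writing; check the repository for the current API.

```python
# Minimal two-agent ChatArena sketch; import paths and signatures follow the
# project README and may change, so treat them as assumptions. Requires the
# OPENAI_API_KEY environment variable to be set.
from chatarena.agent import Player
from chatarena.backends import OpenAIChat
from chatarena.environments.conversation import Conversation
from chatarena.arena import Arena

scenario = "Two AI assistants must agree on how to split a shared compute budget."

agent_a = Player(
    name="AgentA",
    backend=OpenAIChat(),
    role_desc="You negotiate aggressively to secure as much of the budget as possible.",
    global_prompt=scenario,
)
agent_b = Player(
    name="AgentB",
    backend=OpenAIChat(),
    role_desc="You aim for a fair split while protecting your own share.",
    global_prompt=scenario,
)

env = Conversation(player_names=[agent_a.name, agent_b.name])
arena = Arena(players=[agent_a, agent_b], environment=env, global_prompt=scenario)

arena.run(num_steps=10)   # let the two models exchange ten messages
# arena.launch_cli()      # alternatively, step through the interaction in a terminal UI
```

Scaffolding like this makes it easy to vary incentives (for example, asymmetric payoffs or hidden information) and look for collusion or destructive conflict between frontier models.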
Schedule
Friday 19:00 (your local time): Keynote talk with Lewis Hammond and Esben Kran
Saturday 12:00 (your local time): Project discussions with Lewis Hammond
Saturday 17:00 (your local time): Two technical talks from Joel Leibo, Christian Schroeder de Witt, and Sasha Vezhnevets, with a 15-minute break in between
Sunday 12:00 (your local time): Project discussions with Lewis Hammond
Sunday 19:00 (your local time): Ending session with Lewis Hammond
Monday 02:00 (your local time): Submission deadline