Multi-Agent Safety Hackathon

155

Sign Ups

0

Entries

Overview

Resources

Schedule

Entries

Overview

Hosted by Cooperative AI Foundation and Apart Research with Lewis Hammond and Esben Kran

Find Dangerous Multi-Agent Failures

As AI systems proliferate and become increasingly agent-like, they will interact with each other and with humans in new ways. These new multi-agent systems will create entirely new risk surfaces. Follow along or rewatch the keynote livestream below. You can see the keynote. slideshow here along with the logistics keynote here.

During this hackathon, you will search for the most concerning failures specific to systems of multiple AIs. Potential failures involve tragedies of the commons, destructive conflict, collusion, and destabilizing emergent dynamics.

As part of this hackathon, you will have the chance to become co-author on a large report on multi-agent risk with the Cooperative AI Foundation and more than 35 co-authors from institutions including UC Berkeley, Oxford, Cambridge, Harvard, and DeepMind. If your project submission of multi-agent failure demonstrations fits into the final report, you will be included as a co-author.

Several senior co-authors have already suggested a range of ideas for possible failure mode demonstrations that are currently lacking concrete implementations (see the "Ideas" tab). Figuring out how and whether such failure modes are possible is an easy way to get started on this challenge and has the advantage of already being linked to content in the report, but we also welcome your own ideas! The Cooperative AI Foundation, Apart Research, and their colleagues will be on hand to provide guidance and collaboration where possible.

There are no requirements for you to join but we recommend that you read up on the topic in the Inspiration and resources section further down. This topic is in reality quite open but the research field is mostly computer science and having a background in programming and machine learning definitely helps. We're excited to see you!

Get an overview of the hackathon and specific links in the slideshow here.

Alignment Jam hackathons

Join us in this iteration of the Alignment Jam research hackathons to spend a weekend with fellow engaged researchers and engineers in machine learning on diving into this exciting and fast-moving field! Join the Discord where all communication will happen.

Rules

You will participate in teams of 1-5 people and submit a project on the entry submission page. Each project is submitted with: The PDF report and your title, summary, and description. There will be a team-making event right after the keynote for anyone who is missing a team.

You are allowed to think about your project before the hackathon starts but your core research work should happen in the duration of the hackathon.

Evaluation criteria

The evaluation reports will of course be evaluated as well! We will use multiple criteria:

Compelling Narrative: Demos should ideally capture important themes, be accessible to policymakers and laypeople, and present a compelling narrative. For example, when researchers were able to prompt GPT-4 to solve a captcha using a hired human (see Section 2.9 of the GPT-4 system card).
Focussed on Multiple Frontier Systems: To set these demos apart from other work, we are especially interested in concrete examples that are only/especially worrisome in multi-agent settings, and that are relevant to frontier models. We will be awarding bonus points for originality!
Easily Replicable: Demos should ideally be clear, well-documented, and easily replicable. For instance, is the code openly available or is it within a Google Colab that we can run through without problems?

155

Sign Ups

0

Entries

Overview

Resources

Schedule

Entries

Overview

Hosted by Cooperative AI Foundation and Apart Research with Lewis Hammond and Esben Kran

Find Dangerous Multi-Agent Failures

As AI systems proliferate and become increasingly agent-like, they will interact with each other and with humans in new ways. These new multi-agent systems will create entirely new risk surfaces. Follow along or rewatch the keynote livestream below. You can see the keynote. slideshow here along with the logistics keynote here.

During this hackathon, you will search for the most concerning failures specific to systems of multiple AIs. Potential failures involve tragedies of the commons, destructive conflict, collusion, and destabilizing emergent dynamics.

As part of this hackathon, you will have the chance to become co-author on a large report on multi-agent risk with the Cooperative AI Foundation and more than 35 co-authors from institutions including UC Berkeley, Oxford, Cambridge, Harvard, and DeepMind. If your project submission of multi-agent failure demonstrations fits into the final report, you will be included as a co-author.

Several senior co-authors have already suggested a range of ideas for possible failure mode demonstrations that are currently lacking concrete implementations (see the "Ideas" tab). Figuring out how and whether such failure modes are possible is an easy way to get started on this challenge and has the advantage of already being linked to content in the report, but we also welcome your own ideas! The Cooperative AI Foundation, Apart Research, and their colleagues will be on hand to provide guidance and collaboration where possible.

There are no requirements for you to join but we recommend that you read up on the topic in the Inspiration and resources section further down. This topic is in reality quite open but the research field is mostly computer science and having a background in programming and machine learning definitely helps. We're excited to see you!

Get an overview of the hackathon and specific links in the slideshow here.

Alignment Jam hackathons

Join us in this iteration of the Alignment Jam research hackathons to spend a weekend with fellow engaged researchers and engineers in machine learning on diving into this exciting and fast-moving field! Join the Discord where all communication will happen.

Rules

You will participate in teams of 1-5 people and submit a project on the entry submission page. Each project is submitted with: The PDF report and your title, summary, and description. There will be a team-making event right after the keynote for anyone who is missing a team.

You are allowed to think about your project before the hackathon starts but your core research work should happen in the duration of the hackathon.

Evaluation criteria

The evaluation reports will of course be evaluated as well! We will use multiple criteria:

Compelling Narrative: Demos should ideally capture important themes, be accessible to policymakers and laypeople, and present a compelling narrative. For example, when researchers were able to prompt GPT-4 to solve a captcha using a hired human (see Section 2.9 of the GPT-4 system card).
Focussed on Multiple Frontier Systems: To set these demos apart from other work, we are especially interested in concrete examples that are only/especially worrisome in multi-agent settings, and that are relevant to frontier models. We will be awarding bonus points for originality!
Easily Replicable: Demos should ideally be clear, well-documented, and easily replicable. For instance, is the code openly available or is it within a Google Colab that we can run through without problems?

155

Sign Ups

0

Entries

Overview

Resources

Schedule

Entries

Overview

Hosted by Cooperative AI Foundation and Apart Research with Lewis Hammond and Esben Kran

Find Dangerous Multi-Agent Failures

As AI systems proliferate and become increasingly agent-like, they will interact with each other and with humans in new ways. These new multi-agent systems will create entirely new risk surfaces. Follow along or rewatch the keynote livestream below. You can see the keynote. slideshow here along with the logistics keynote here.

During this hackathon, you will search for the most concerning failures specific to systems of multiple AIs. Potential failures involve tragedies of the commons, destructive conflict, collusion, and destabilizing emergent dynamics.

As part of this hackathon, you will have the chance to become co-author on a large report on multi-agent risk with the Cooperative AI Foundation and more than 35 co-authors from institutions including UC Berkeley, Oxford, Cambridge, Harvard, and DeepMind. If your project submission of multi-agent failure demonstrations fits into the final report, you will be included as a co-author.

Several senior co-authors have already suggested a range of ideas for possible failure mode demonstrations that are currently lacking concrete implementations (see the "Ideas" tab). Figuring out how and whether such failure modes are possible is an easy way to get started on this challenge and has the advantage of already being linked to content in the report, but we also welcome your own ideas! The Cooperative AI Foundation, Apart Research, and their colleagues will be on hand to provide guidance and collaboration where possible.

There are no requirements for you to join but we recommend that you read up on the topic in the Inspiration and resources section further down. This topic is in reality quite open but the research field is mostly computer science and having a background in programming and machine learning definitely helps. We're excited to see you!

Get an overview of the hackathon and specific links in the slideshow here.

Alignment Jam hackathons

Join us in this iteration of the Alignment Jam research hackathons to spend a weekend with fellow engaged researchers and engineers in machine learning on diving into this exciting and fast-moving field! Join the Discord where all communication will happen.

Rules

You will participate in teams of 1-5 people and submit a project on the entry submission page. Each project is submitted with: The PDF report and your title, summary, and description. There will be a team-making event right after the keynote for anyone who is missing a team.

You are allowed to think about your project before the hackathon starts but your core research work should happen in the duration of the hackathon.

Evaluation criteria

The evaluation reports will of course be evaluated as well! We will use multiple criteria:

Compelling Narrative: Demos should ideally capture important themes, be accessible to policymakers and laypeople, and present a compelling narrative. For example, when researchers were able to prompt GPT-4 to solve a captcha using a hired human (see Section 2.9 of the GPT-4 system card).
Focussed on Multiple Frontier Systems: To set these demos apart from other work, we are especially interested in concrete examples that are only/especially worrisome in multi-agent settings, and that are relevant to frontier models. We will be awarding bonus points for originality!
Easily Replicable: Demos should ideally be clear, well-documented, and easily replicable. For instance, is the code openly available or is it within a Google Colab that we can run through without problems?

155

Sign Ups

0

Entries

Overview

Resources

Schedule

Entries

Overview

Hosted by Cooperative AI Foundation and Apart Research with Lewis Hammond and Esben Kran

Find Dangerous Multi-Agent Failures

As AI systems proliferate and become increasingly agent-like, they will interact with each other and with humans in new ways. These new multi-agent systems will create entirely new risk surfaces. Follow along or rewatch the keynote livestream below. You can see the keynote. slideshow here along with the logistics keynote here.

During this hackathon, you will search for the most concerning failures specific to systems of multiple AIs. Potential failures involve tragedies of the commons, destructive conflict, collusion, and destabilizing emergent dynamics.

As part of this hackathon, you will have the chance to become co-author on a large report on multi-agent risk with the Cooperative AI Foundation and more than 35 co-authors from institutions including UC Berkeley, Oxford, Cambridge, Harvard, and DeepMind. If your project submission of multi-agent failure demonstrations fits into the final report, you will be included as a co-author.

Several senior co-authors have already suggested a range of ideas for possible failure mode demonstrations that are currently lacking concrete implementations (see the "Ideas" tab). Figuring out how and whether such failure modes are possible is an easy way to get started on this challenge and has the advantage of already being linked to content in the report, but we also welcome your own ideas! The Cooperative AI Foundation, Apart Research, and their colleagues will be on hand to provide guidance and collaboration where possible.

There are no requirements for you to join but we recommend that you read up on the topic in the Inspiration and resources section further down. This topic is in reality quite open but the research field is mostly computer science and having a background in programming and machine learning definitely helps. We're excited to see you!

Get an overview of the hackathon and specific links in the slideshow here.

Alignment Jam hackathons

Join us in this iteration of the Alignment Jam research hackathons to spend a weekend with fellow engaged researchers and engineers in machine learning on diving into this exciting and fast-moving field! Join the Discord where all communication will happen.

Rules

You will participate in teams of 1-5 people and submit a project on the entry submission page. Each project is submitted with: The PDF report and your title, summary, and description. There will be a team-making event right after the keynote for anyone who is missing a team.

You are allowed to think about your project before the hackathon starts but your core research work should happen in the duration of the hackathon.

Evaluation criteria

The evaluation reports will of course be evaluated as well! We will use multiple criteria:

Compelling Narrative: Demos should ideally capture important themes, be accessible to policymakers and laypeople, and present a compelling narrative. For example, when researchers were able to prompt GPT-4 to solve a captcha using a hired human (see Section 2.9 of the GPT-4 system card).
Focussed on Multiple Frontier Systems: To set these demos apart from other work, we are especially interested in concrete examples that are only/especially worrisome in multi-agent settings, and that are relevant to frontier models. We will be awarding bonus points for originality!
Easily Replicable: Demos should ideally be clear, well-documented, and easily replicable. For instance, is the code openly available or is it within a Google Colab that we can run through without problems?