Aug 18, 2023
-
Aug 20, 2023
Online & In-Person
LLM Evals Hackathon




Welcome to the research hackathon to devise methods for evaluating the risks of deployed language models and AI. With the societal-scale risks associated with creating new types of intelligence, we need to understand and control the capabilities of such models.
00:00:00:00
00:00:00:00
00:00:00:00
00:00:00:00
Welcome to the research hackathon to devise methods for evaluating the risks of deployed language models and AI. With the societal-scale risks associated with creating new types of intelligence, we need to understand and control the capabilities of such models.
This event is ongoing.
This event has concluded.
Overview
Resources
Schedule
Overview

Hosted by Apart Research, Esben Kran, and Fazl Barez with Jan Brauner
Join us to evaluate the safety of LLMs
Welcome to the research hackathon to devise methods for evaluating the risks of deployed language models and AI. With the societal-scale risks associated with creating new types of intelligence, we need to understand and control the capabilities of such models.
The work we expect to come out of this hackathon will be related to new ways to audit, monitor, red-team, and evaluate language models. See inspiration for resources and publication venues further down and sign up to receive updates.
See the keynote logistics slides here and participate in the live keynote on our platform here.
There are no requirements for you to join but we recommend that you read up on the topic in the Inspiration and resources section further down. This topic is in reality quite open but the research field is mostly computer science and having a background in programming and machine learning definitely helps. We're excited to see you!
Alignment Jam hackathons
Join us in this iteration of the Alignment Jam research hackathons to spend 48 hour with fellow engaged researchers and engineers in machine learning on engaging in this exciting and fast-moving field! Join the Discord where all communication will happen.
Rules
You will participate in teams of 1-5 people and submit a project on the entry submission page. Each project is submitted with: The PDF report and your title, summary, and description. There will be a team-making event right after the keynote for anyone who is missing a team.
You are allowed to think about your project before the hackathon starts but your core research work should happen in the duration of the hackathon.
Evaluation criteria
The evaluation reports will of course be evaluated as well! We will use multiple criteria:
Model evaluations: How much does it contribute to the field of model evaluations? Is it novel and interesting within the context of the field?
AI safety: How much does this research project contribute to the safety of current and future AI systems and is there a direct path to higher safety for AI systems?
Reproducibility: Are we able to directly replicate the work without much work? Is the code openly available or is it within a Google Colab that we can run through without problems?
Originality: How novel and interesting is the project independently from the field of model evaluations?
Overview
Resources
Schedule
Overview

Hosted by Apart Research, Esben Kran, and Fazl Barez with Jan Brauner
Join us to evaluate the safety of LLMs
Welcome to the research hackathon to devise methods for evaluating the risks of deployed language models and AI. With the societal-scale risks associated with creating new types of intelligence, we need to understand and control the capabilities of such models.
The work we expect to come out of this hackathon will be related to new ways to audit, monitor, red-team, and evaluate language models. See inspiration for resources and publication venues further down and sign up to receive updates.
See the keynote logistics slides here and participate in the live keynote on our platform here.
There are no requirements for you to join but we recommend that you read up on the topic in the Inspiration and resources section further down. This topic is in reality quite open but the research field is mostly computer science and having a background in programming and machine learning definitely helps. We're excited to see you!
Alignment Jam hackathons
Join us in this iteration of the Alignment Jam research hackathons to spend 48 hour with fellow engaged researchers and engineers in machine learning on engaging in this exciting and fast-moving field! Join the Discord where all communication will happen.
Rules
You will participate in teams of 1-5 people and submit a project on the entry submission page. Each project is submitted with: The PDF report and your title, summary, and description. There will be a team-making event right after the keynote for anyone who is missing a team.
You are allowed to think about your project before the hackathon starts but your core research work should happen in the duration of the hackathon.
Evaluation criteria
The evaluation reports will of course be evaluated as well! We will use multiple criteria:
Model evaluations: How much does it contribute to the field of model evaluations? Is it novel and interesting within the context of the field?
AI safety: How much does this research project contribute to the safety of current and future AI systems and is there a direct path to higher safety for AI systems?
Reproducibility: Are we able to directly replicate the work without much work? Is the code openly available or is it within a Google Colab that we can run through without problems?
Originality: How novel and interesting is the project independently from the field of model evaluations?
Overview
Resources
Schedule
Overview

Hosted by Apart Research, Esben Kran, and Fazl Barez with Jan Brauner
Join us to evaluate the safety of LLMs
Welcome to the research hackathon to devise methods for evaluating the risks of deployed language models and AI. With the societal-scale risks associated with creating new types of intelligence, we need to understand and control the capabilities of such models.
The work we expect to come out of this hackathon will be related to new ways to audit, monitor, red-team, and evaluate language models. See inspiration for resources and publication venues further down and sign up to receive updates.
See the keynote logistics slides here and participate in the live keynote on our platform here.
There are no requirements for you to join but we recommend that you read up on the topic in the Inspiration and resources section further down. This topic is in reality quite open but the research field is mostly computer science and having a background in programming and machine learning definitely helps. We're excited to see you!
Alignment Jam hackathons
Join us in this iteration of the Alignment Jam research hackathons to spend 48 hour with fellow engaged researchers and engineers in machine learning on engaging in this exciting and fast-moving field! Join the Discord where all communication will happen.
Rules
You will participate in teams of 1-5 people and submit a project on the entry submission page. Each project is submitted with: The PDF report and your title, summary, and description. There will be a team-making event right after the keynote for anyone who is missing a team.
You are allowed to think about your project before the hackathon starts but your core research work should happen in the duration of the hackathon.
Evaluation criteria
The evaluation reports will of course be evaluated as well! We will use multiple criteria:
Model evaluations: How much does it contribute to the field of model evaluations? Is it novel and interesting within the context of the field?
AI safety: How much does this research project contribute to the safety of current and future AI systems and is there a direct path to higher safety for AI systems?
Reproducibility: Are we able to directly replicate the work without much work? Is the code openly available or is it within a Google Colab that we can run through without problems?
Originality: How novel and interesting is the project independently from the field of model evaluations?
Overview
Resources
Schedule
Overview

Hosted by Apart Research, Esben Kran, and Fazl Barez with Jan Brauner
Join us to evaluate the safety of LLMs
Welcome to the research hackathon to devise methods for evaluating the risks of deployed language models and AI. With the societal-scale risks associated with creating new types of intelligence, we need to understand and control the capabilities of such models.
The work we expect to come out of this hackathon will be related to new ways to audit, monitor, red-team, and evaluate language models. See inspiration for resources and publication venues further down and sign up to receive updates.
See the keynote logistics slides here and participate in the live keynote on our platform here.
There are no requirements for you to join but we recommend that you read up on the topic in the Inspiration and resources section further down. This topic is in reality quite open but the research field is mostly computer science and having a background in programming and machine learning definitely helps. We're excited to see you!
Alignment Jam hackathons
Join us in this iteration of the Alignment Jam research hackathons to spend 48 hour with fellow engaged researchers and engineers in machine learning on engaging in this exciting and fast-moving field! Join the Discord where all communication will happen.
Rules
You will participate in teams of 1-5 people and submit a project on the entry submission page. Each project is submitted with: The PDF report and your title, summary, and description. There will be a team-making event right after the keynote for anyone who is missing a team.
You are allowed to think about your project before the hackathon starts but your core research work should happen in the duration of the hackathon.
Evaluation criteria
The evaluation reports will of course be evaluated as well! We will use multiple criteria:
Model evaluations: How much does it contribute to the field of model evaluations? Is it novel and interesting within the context of the field?
AI safety: How much does this research project contribute to the safety of current and future AI systems and is there a direct path to higher safety for AI systems?
Reproducibility: Are we able to directly replicate the work without much work? Is the code openly available or is it within a Google Colab that we can run through without problems?
Originality: How novel and interesting is the project independently from the field of model evaluations?
Registered Jam Sites
Register A Location
Beside the remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in-person and connect with others in your area.
The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community.
Registered Jam Sites
Register A Location
Beside the remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in-person and connect with others in your area.
The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community.
Our Other Sprints
May 30, 2025
-
Jun 1, 2025
Research
Apart x Martian Mechanistic Interpretability Hackathon
This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible
Sign Up
Sign Up
Sign Up
Apr 25, 2025
-
Apr 27, 2025
Research
Economics of Transformative AI
This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible
Sign Up
Sign Up
Sign Up

Sign up to stay updated on the
latest news, research, and events

Sign up to stay updated on the
latest news, research, and events

Sign up to stay updated on the
latest news, research, and events

Sign up to stay updated on the
latest news, research, and events