Check out the results from the AI Safety X Physics Grand Challenge! 👉

Aug 18, 2023

Aug 20, 2023

Online & In-Person

LLM Evals Hackathon

Welcome to the research hackathon to devise methods for evaluating the risks of deployed language models and AI. With the societal-scale risks associated with creating new types of intelligence, we need to understand and control the capabilities of such models.

00:00:00:00

Days To Go

00:00:00:00

Days To Go

00:00:00:00

Days To Go

00:00:00:00

Days To Go

This event is ongoing.

Submit Your Project

This event has concluded.

Overview

Resources

Schedule

Overview

Hosted by Apart Research, Esben Kran, and Fazl Barez with Jan Brauner

Join us to evaluate the safety of LLMs

The work we expect to come out of this hackathon will be related to new ways to audit, monitor, red-team, and evaluate language models. See inspiration for resources and publication venues further down and sign up to receive updates.

See the keynote logistics slides here and participate in the live keynote on our platform here.

There are no requirements for you to join but we recommend that you read up on the topic in the Inspiration and resources section further down. This topic is in reality quite open but the research field is mostly computer science and having a background in programming and machine learning definitely helps. We're excited to see you!

Alignment Jam hackathons

Join us in this iteration of the Alignment Jam research hackathons to spend 48 hour with fellow engaged researchers and engineers in machine learning on engaging in this exciting and fast-moving field! Join the Discord where all communication will happen.

Rules

You will participate in teams of 1-5 people and submit a project on the entry submission page. Each project is submitted with: The PDF report and your title, summary, and description. There will be a team-making event right after the keynote for anyone who is missing a team.

You are allowed to think about your project before the hackathon starts but your core research work should happen in the duration of the hackathon.

Evaluation criteria

The evaluation reports will of course be evaluated as well! We will use multiple criteria:

Model evaluations: How much does it contribute to the field of model evaluations? Is it novel and interesting within the context of the field?
AI safety: How much does this research project contribute to the safety of current and future AI systems and is there a direct path to higher safety for AI systems?
Reproducibility: Are we able to directly replicate the work without much work? Is the code openly available or is it within a Google Colab that we can run through without problems?
Originality: How novel and interesting is the project independently from the field of model evaluations?

Overview

Resources

Schedule

Overview

Hosted by Apart Research, Esben Kran, and Fazl Barez with Jan Brauner

Join us to evaluate the safety of LLMs

See the keynote logistics slides here and participate in the live keynote on our platform here.

Alignment Jam hackathons

Rules

You are allowed to think about your project before the hackathon starts but your core research work should happen in the duration of the hackathon.

Evaluation criteria

The evaluation reports will of course be evaluated as well! We will use multiple criteria:

Model evaluations: How much does it contribute to the field of model evaluations? Is it novel and interesting within the context of the field?
AI safety: How much does this research project contribute to the safety of current and future AI systems and is there a direct path to higher safety for AI systems?
Reproducibility: Are we able to directly replicate the work without much work? Is the code openly available or is it within a Google Colab that we can run through without problems?
Originality: How novel and interesting is the project independently from the field of model evaluations?

Overview

Resources

Schedule

Overview

Hosted by Apart Research, Esben Kran, and Fazl Barez with Jan Brauner

Join us to evaluate the safety of LLMs

See the keynote logistics slides here and participate in the live keynote on our platform here.

Alignment Jam hackathons

Rules

You are allowed to think about your project before the hackathon starts but your core research work should happen in the duration of the hackathon.

Evaluation criteria

The evaluation reports will of course be evaluated as well! We will use multiple criteria:

Model evaluations: How much does it contribute to the field of model evaluations? Is it novel and interesting within the context of the field?
AI safety: How much does this research project contribute to the safety of current and future AI systems and is there a direct path to higher safety for AI systems?
Reproducibility: Are we able to directly replicate the work without much work? Is the code openly available or is it within a Google Colab that we can run through without problems?
Originality: How novel and interesting is the project independently from the field of model evaluations?

Overview

Resources

Schedule

Overview

Hosted by Apart Research, Esben Kran, and Fazl Barez with Jan Brauner

Join us to evaluate the safety of LLMs

See the keynote logistics slides here and participate in the live keynote on our platform here.

Alignment Jam hackathons

Rules

You are allowed to think about your project before the hackathon starts but your core research work should happen in the duration of the hackathon.

Evaluation criteria

The evaluation reports will of course be evaluated as well! We will use multiple criteria:

Model evaluations: How much does it contribute to the field of model evaluations? Is it novel and interesting within the context of the field?
AI safety: How much does this research project contribute to the safety of current and future AI systems and is there a direct path to higher safety for AI systems?
Reproducibility: Are we able to directly replicate the work without much work? Is the code openly available or is it within a Google Colab that we can run through without problems?
Originality: How novel and interesting is the project independently from the field of model evaluations?

Registered Jam Sites

Beside the remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in-person and connect with others in your area.

The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community.

Evaluating LLMs at EnigmA

Join us at Godthåbsvej 4 3.tv 2000 Frederiksberg

Learn More

CCCamp Evaluations Hackathon

We're hosting a hackathon site at the CCCamp Cybersecurity Conference near Berlin! We'll start at the conference and then move to Berlin Saturday or Sunday.

Learn More

Local Gathering for LLM Evals Hackathon Keynote and Kickoff

Gather and watch the Keynote and find a team. If there is sufficient local interest, we will find space for meeting Saturday and Sunday as well.

Learn More

Registered Jam Sites

Beside the remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in-person and connect with others in your area.

Evaluating LLMs at EnigmA

Join us at Godthåbsvej 4 3.tv 2000 Frederiksberg

Learn More

CCCamp Evaluations Hackathon

We're hosting a hackathon site at the CCCamp Cybersecurity Conference near Berlin! We'll start at the conference and then move to Berlin Saturday or Sunday.

Learn More

Local Gathering for LLM Evals Hackathon Keynote and Kickoff

Gather and watch the Keynote and find a team. If there is sufficient local interest, we will find space for meeting Saturday and Sunday as well.

Learn More

Our Other Sprints

Jul 25, 2025

Jul 27, 2025

Research

AI Safety x Physics Grand Challenge

This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible