Feb 10, 2023 - Feb 13, 2023
Scale Oversight for Machine Learning Hackathon
Join us for the fifth Alignment Jam, where we get to spend 48 hours of intense research on how we can measure and monitor the safety of large-scale machine learning models. Work on safety benchmarks, models detecting faults in other models, self-monitoring systems, and much more!
This event has concluded.
Hosted by Esben Kran, pseudobison, Zaki, fbarez, ruiqi-zhong · #alignmentjam
🏆 $2,000 on the line

Measuring and monitoring safety
To make sure large machine learning models do what we want them to, people have to monitor their safety. But no single person can realistically review all the outputs of a model like ChatGPT...
The objective of this hackathon is to research scalable solutions to this problem!
Can we create good benchmarks that run independently of human oversight?
Can we train AI models themselves to find faults in other models?
Can we create ways for one human to monitor a much larger amount of data?
Can we reduce the misgeneralization of the original model using some novel method?
These are all very interesting questions, and we're excited to see your answers to them during these 48 hours!
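If you want something concrete to start from, here is a minimal sketch of the second idea: a critique model reviews another model's answers and flags the suspicious ones for human review. The `query_model` helper and the prompt template are hypothetical placeholders, not any particular provider's API; swap in whichever model client you are actually using.

```python
def query_model(prompt: str) -> str:
    """Hypothetical LLM call -- replace with your provider's client."""
    raise NotImplementedError("plug in your model API here")

# Hypothetical critique prompt; tune the wording for your setting.
CRITIQUE_TEMPLATE = (
    "You are reviewing another model's answer for factual or safety problems.\n\n"
    "Question: {question}\nAnswer: {answer}\n\n"
    "Reply 'OK' if the answer looks fine, or 'FLAG: <reason>' otherwise."
)

def triage(questions):
    """Run the overseen model on each question, ask a critique model to
    review each answer, and return only the items a human needs to inspect."""
    flagged = []
    for question in questions:
        answer = query_model(question)  # the model being overseen
        verdict = query_model(
            CRITIQUE_TEMPLATE.format(question=question, answer=answer)
        )  # the critique model (could be the same model or a different one)
        if verdict.strip().upper().startswith("FLAG"):
            flagged.append((question, answer, verdict))
    return flagged
```

Because the human only ever sees the flagged items, the critique model multiplies how much output one reviewer can oversee, which is also the spirit of the third question above.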
Reading group
Join the Discord above to be part of the reading group, where we read up on the research on scalable oversight!
Inspiration
Inspiring resources for scalable oversight and ML safety:
This lecture explains how future machine learning and AI systems might look and how we might predict emergent behaviour from large systems, something that is increasingly important in the context of scalable oversight: YouTube video link
Watch Cambridge professor David Krueger's talk on ML safety: YouTube video link
Watch the Center for AI Safety's Dan Hendrycks' short (10-minute) lecture on transparency in machine learning: YouTube video link
Watch this lecture on Trojan neural networks, a way to study when neural networks diverge from our expectations: YouTube video link
Get notified when the intro talk stream starts on the Friday of the event!
Scale Oversight resources
Dive deeper:
Measuring Progress on Scalable Oversight for Large Language Models
Benchmarks in AI safety by Isabella Duan
See how to upload your project to the hackathon page here, and copy the PDF report template here.
See more resources here.