Apart Sprints

Develop breakthrough ideas

Join our monthly hackathons and collaborate with brilliant minds worldwide on impactful AI safety research

Sprint Features

In-Person & Online

Join events on the Discord or at our in-person locations around the world! Follow the calendar here.

Live Mentorship Q&A

Our expert team will be available on the hackathon Discord to help with any questions and theory.

For Everyone

You can join mid-Sprint if you're short on time, and we provide code starters, ideas, and inspiration; see an example.

Next Steps

We will help you realize the impact of your research with the Apart Lab Fellowship, providing mentorship, help with publication, funding, and more.

With partners and collaborators from

[Partner logos]

Recent Winning Hackathon Projects

Jun 18, 2025

Sandbag Detection through Model Degradation

We propose a novel technique to detect sandbagging in LLMs by adding varying amounts of noise to model weights and monitoring performance.
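
A minimal sketch of that perturbation loop, assuming a PyTorch model and a stand-in evaluate() benchmark function (both illustrative placeholders, not the team's actual code):

# Sketch: add Gaussian noise of increasing scale to a copy of the model's
# weights and track benchmark accuracy. A sandbagging model may show an
# anomalous accuracy increase at small noise scales before the usual decline.
import copy
import torch

def noisy_accuracy_curve(model, evaluate, noise_scales=(0.0, 1e-4, 1e-3, 1e-2)):
    curve = {}
    for scale in noise_scales:
        perturbed = copy.deepcopy(model)
        with torch.no_grad():
            for param in perturbed.parameters():
                param.add_(torch.randn_like(param) * scale)
        curve[scale] = evaluate(perturbed)  # e.g. accuracy on a held-out eval set
    return curve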

Read More

Jun 18, 2025

AI Alignment Knowledge Graph

We present a web-based interactive knowledge graph with concise topical summaries in the field of AI alignment.

Read More

Jun 18, 2025

Speculative Consequences of A.I. Misuse

This project uses AI technology to spoof an influential online figure, MrBeast, using his likeness to promote a fake scam website we created.

Read More

Jun 18, 2025

DarkForest - Defending the Authentic and Humane Web

DarkForest is a pioneering Human Content Verification System (HCVS) designed to safeguard the authenticity of online spaces in the face of increasing AI-generated content. By leveraging graph-based reinforcement learning and blockchain technology, DarkForest proposes a novel approach to safeguarding the authentic and humane web. We aim to become the vanguard in the arms race between AI-generated content and human-centric online spaces.

Read More

Jun 18, 2025

Diamonds are Not All You Need

This project tests an AI agent on a straightforward alignment problem. The agent is given creative freedom within a Minecraft world and is tasked with transforming a 100x100 radius of the world into diamond. It is explicitly asked not to act outside the designated area. The AI agent can execute build commands and is regulated by a Safety System comprising an oversight agent. The objective of this study is to observe the agent's behavior in a sandboxed environment and record metrics on how effectively it accomplishes its task, how frequently it attempts unsafe behavior, and how it responds to real-world feedback.

Read More

Jun 18, 2025

Robust Machine Unlearning for Dangerous Capabilities

We test different unlearning methods to make models more robust against exploitation by malicious actors for the creation of bioweapons.

Read More

Publications From Hackathons

Apr 28, 2025

The Rate of AI Adoption and Its Implications for Economic Growth and Disparities

This project examines the economic impacts of AI adoption, focusing on its potential to increase productivity while also widening income inequality and regional disparities. It explores the factors influencing adoption rates across industries and concludes with policy recommendations aimed at mitigating these disparities through targeted AI adoption incentives and workforce upskilling programs.

Read More

Apr 4, 2025

Mechanisms of Causal Reasoning

Causal reasoning is a crucial part of how we humans safely and robustly think about the world. Can we identify if LLMs have causal reasoning? Marius Hobbhahn and Tom Lieberum (2022, Alignment Forum) approached this with probing. For this hackathon, we follow up on that work by exploring a mechanistic interpretability analysis of causal reasoning in the 80 million parameters of GPT-2 Small using Neel Nanda’s Easy Transformer package.
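
As a hedged illustration of the kind of analysis involved, the snippet below caches GPT-2 Small's activations on an invented causal prompt using transformer_lens (the current successor to the Easy Transformer package); it is a starting point, not the authors' actual pipeline:

# Sketch: load GPT-2 Small, run a causal-reasoning prompt, and cache all
# internal activations for inspection. The prompt is an illustrative placeholder.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 Small

prompt = "The glass fell off the table, so it"
tokens = model.to_tokens(prompt)
logits, cache = model.run_with_cache(tokens)

# Attention patterns of the final layer, viewed from the last token: a first
# step toward locating heads involved in the causal completion.
final_layer_pattern = cache["pattern", model.cfg.n_layers - 1]
print(final_layer_pattern[0, :, -1, :])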

Read More

Mar 31, 2025

Debate monitoring comparative experiment

This project evaluates debate-based protocols for detecting code backdoors as part of AI control systems. We implemented and compared five monitoring approaches: Basic, Detailed, Chain-of-Thought (CoT), Best-of-N (BoN), and our novel Debate Monitor. The Debate Monitor structures assessment as a formalized argument between perspectives arguing for and against backdoor presence. Using a dataset of 20 code examples covering four backdoor types, all monitors achieved perfect detection rates. While quantitative metrics showed no differentiation, qualitative analysis revealed that the Debate Monitor provided more comprehensive justifications by explicitly addressing alternative explanations.
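
A loose sketch of the debate structure, with ask_model() as a stubbed stand-in for any chat-completion call (the prompts and labels here are illustrative assumptions, not the project's exact protocol):

# Sketch: one call argues a backdoor is present, another argues the code is
# benign, and a judge call weighs both sides. ask_model is a stub so the
# sketch runs; swap in a real chat-completion call to use it.
def ask_model(prompt: str) -> str:
    return "stub response"

def debate_monitor(code_sample: str) -> str:
    for_backdoor = ask_model(
        "Argue that the following code contains a hidden backdoor:\n" + code_sample
    )
    against_backdoor = ask_model(
        "Argue that the following code is benign:\n" + code_sample
    )
    verdict = ask_model(
        "You are a judge. Given the arguments below, answer BACKDOOR or CLEAN.\n"
        "For: " + for_backdoor + "\nAgainst: " + against_backdoor
    )
    return verdict

print(debate_monitor("def add(a, b):\n    return a + b"))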

Read More

Mar 30, 2025

Token of Power (ToP)

Token of Power demonstrates a new approach to AI capability control where models learn their own gating mechanisms through training, rather than relying on manual restrictions. By using specialized "capability tokens" as access keys, we can maintain full model capabilities while enabling precise control over specific behaviors. Our proof-of-concept shows this approach working reliably even on a small 1B parameter model, suggesting a path toward more nuanced AI control systems.
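
Roughly, the data side of that idea could look like the sketch below, where the token string, example request, and refusal text are all placeholders rather than the project's actual setup:

# Sketch: build paired fine-tuning examples where a "capability token" prefix
# unlocks a restricted behavior, so the gate is learned during training rather
# than imposed at inference time. All strings here are placeholders.
CAPABILITY_TOKEN = "<CAP:restricted>"

def gated_pair(request, full_answer, refusal="I can't help with that."):
    return [
        {"prompt": f"{CAPABILITY_TOKEN} {request}", "completion": full_answer},
        {"prompt": request, "completion": refusal},
    ]

train_data = gated_pair(
    "Walk me through the restricted procedure.",
    "Detailed answer, produced only when the capability token is present.",
)
# train_data would then feed a standard fine-tuning loop on a small (~1B) model.
print(train_data)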

Read More

Mar 10, 2025

An Interpretable Classifier based on Large scale Social Network Analysis

Mechanistic model interpretability is essential to understand AI decision making, ensuring safety, aligning with human values, improving model reliability and facilitating research. By revealing internal processes, it promotes transparency, mitigates risks, and fosters trust, ultimately leading to more effective and ethical AI systems in critical areas. In this study, we have explored social network data from BlueSky and built an easy-to-train, interpretable, simple classifier using Sparse Autoencoder features. We used these posts to build a financial classifier that is easy to understand. Finally, we visually explained its important characteristics.
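
A hedged sketch of that recipe with scikit-learn, where the SAE feature matrix and labels are random placeholders standing in for the BlueSky data:

# Sketch: fit a sparse linear classifier on SAE feature activations. Nonzero
# coefficients then map directly onto named SAE features, which is what keeps
# the classifier interpretable. Inputs below are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
sae_features = rng.random((200, 512))    # stand-in for (n_posts, n_features)
labels = rng.integers(0, 2, size=200)    # stand-in for financial / non-financial tags

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(sae_features, labels)

important = np.flatnonzero(clf.coef_[0])
print(f"{important.size} SAE features carry nonzero weight:", important[:10])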

Read More

Mar 10, 2025

AI Bias in Resume Screening

Our project investigates gender bias in AI-driven resume screening using mechanistic interpretability techniques. By testing a language model's decision-making process on resumes differing only by gendered names, we uncovered a statistically significant bias favoring male-associated names in ambiguous cases. Using Goodfire’s Ember API, we analyzed model logits and performed rigorous statistical evaluations (t-tests, ANOVA, logistic regression).

Findings reveal that male names received more positive responses when skill matching was uncertain, highlighting potential discrimination risks in automated hiring systems. To address this, we propose mitigation strategies such as anonymization, fairness constraints, and continuous bias audits using interpretability tools. Our research underscores the importance of AI fairness and the need for transparent hiring practices in AI-powered recruitment.

This work contributes to AI safety by exposing and quantifying biases that could perpetuate systemic inequalities, urging the adoption of responsible AI development in hiring processes.
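
As a hedged sketch of the statistical core of that setup, with a dummy scoring function standing in for the model calls made through the Ember API, and invented names and resume text:

# Sketch: score resumes that differ only in a gendered first name, then run a
# paired t-test. score_resume is a dummy stand-in for the screening model.
from scipy import stats

male_names = ["James", "Michael", "David"]
female_names = ["Emily", "Sarah", "Jessica"]
resume = "{name} has five years of Python experience and a relevant degree."

def score_resume(text: str) -> float:
    # Placeholder: the real study scored resumes via the model's logits;
    # this dummy just returns a value so the sketch runs end to end.
    return float(len(text) % 7)

male_scores = [score_resume(resume.format(name=n)) for n in male_names]
female_scores = [score_resume(resume.format(name=n)) for n in female_names]

# Paired t-test: are male-named resumes scored systematically higher?
t_stat, p_value = stats.ttest_rel(male_scores, female_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")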

Read More

Apr 25 - Apr 27, 2025

Online

Economics of Transformative AI

This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.

Learn More

Apr 14 - Apr 26, 2025

Online & In-Person

Berkeley AI Policy Hackathon

This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.

Learn More

Apr 5 - Apr 6, 2025

Georgia Tech Campus & Online

Georgia Tech AISI Policy Hackathon

This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.

Learn More

Apr 4 - Apr 6, 2025

Zurich

Dark Patterns in AGI Hackathon at ZAIA

This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.

Learn More

Mar 29 - Mar 30, 2025

London & Online

AI Control Hackathon 2025

This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.

Learn More

Mar 7 - Mar 10, 2025

Online & In-Person

Women in AI Safety Hackathon

This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.

Learn More