Sparse Autoencoder Hackathon | Apart Research

Sparse Autoencoder Hackathon

Our Hackathon round-up showcases our global sprints community.

Connor Axiotes

Head of Communications

For our latest Hackathon, we teamed up with Goodfire AI for a deep dive into mechanistic interpretability and feature manipulation. We received many strong submissions and were impressed by their quality. Neel Nanda, who runs Google DeepMind's Mechanistic Interpretability team, gave the keynote talk.

As AI models become more powerful, understanding their internal mechanisms is crucial for building reliable, controllable AI systems. The winning Sprint submissions all worked toward this goal, and we present them below.

01. AutoSteer: Weight-Preserving Reinforcement Learning for Interpretable Model Control

This submission combined reinforcement learning with neural features, improving model behavior 3x while keeping its core knowledge intact.

02. Classification on Latent Feature Activation for Detecting Adversarial Prompt Vulnerabilities

This team built an AI 'immune system' that catches malicious prompts with 100% accuracy while distinguishing harmful from harmless content with 83% precision.

03. Utilitarian Decision-Making in Models - Evaluation and Steering

This project mapped how AI systems make moral decisions, revealing fascinating insights about how machines process ethical choices differently from humans.

04. Steering Swiftly to Safety w/ Sparse Autoencoders

This team developed a method to selectively remove unwanted knowledge from AI systems while preserving their general capabilities.

Read our X thread here on all the projects! Remember, if an idea is promising, participants may be asked to join our brand new Apart Lab Studio to continue developing their research project. Sixteen participants were invited from this Hackathon alone.

See the full list of pilot idea submissions here and never miss another sprint by keeping up-to-date here.

Dive Deeper

Newsletter

Apr 11, 2025

Apart News: Transformative AI Economics

This week we launched our AI Economics Hackathon as we continue to think about the potentially transformative effects of AI on the global economy.

Read More

Research

Apr 9, 2025

Engineering a World Designed for Safe Superintelligence

Esben explains how we go about "Engineering a World Designed for Safe Superintelligence" at Paris' AI Action Summit.

Read More

Newsletter

Apr 4, 2025

Apart News: Our Biggest Event Ever

This week we have details of our Control Hackathon and a writeup from our biggest event ever.

Read More

Sign up to stay updated on the
latest news, research, and events

AI Safety Ideas

Code of Conduct

Responsible disclosure policy

hello@apartresearch.com
