Nov 22, 2024
-
Nov 25, 2024
Reprogramming AI Models Hackathon
Whether you're an AI researcher, a curious developer, or passionate about making AI systems more transparent and controllable, this hackathon is for you. As a participant, you will:
Collaborate with experts to create novel AI observability tools
Learn about mechanistic interpretability from industry leaders
Contribute to solving real-world challenges in AI safety and reliability
Compete for prizes and the opportunity to influence the future of AI development
Register now and be part of the movement towards more transparent, reliable, and beneficial AI systems. We provide access to Goodfire's SDK/API and research preview playground, enabling participation regardless of prior experience with AI observability.
This event has concluded.
Why This Matters
As AI models become more powerful and widespread, understanding their internal mechanisms isn't just academic curiosity—it's crucial for building reliable, controllable AI systems. Mechanistic interpretability gives us the tools to peek inside these "black boxes" and understand how they actually work, neuron by neuron and feature by feature.
What You'll Get
Exclusive Access: Use Goodfire's API to access an interpretable 8B or 70B model with efficient inference.
Cutting-Edge Tools: Experience Goodfire's SDK/API for feature steering and manipulation
Advanced Capabilities: Work with conditional feature interventions and sophisticated development flows
Free Resources: Compute credits for every team to ensure you can pursue ambitious projects
Expert Guidance: Direct mentorship from industry leaders throughout the weekend
Project Tracks
1. Feature Investigation
Map and analyze feature phenomenology in large language models
Discover and validate useful feature interventions
Research the relationship between feature weights and intervention success
Develop metrics for intervention quality assessment
2. Tooling Development
Build tools for automated feature discovery
Create testing frameworks for intervention reliability
Develop integration tools for existing ML frameworks
Improve auto-interpretation techniques
3. Visualization & Interface
Design intuitive visualizations for feature maps
Create interactive tools for exploring model internals
Build dashboards for monitoring intervention effects
Develop user interfaces for feature manipulation
4. Novel Research
Investigate improvements to auto-interpretation
Study feature interaction patterns
Research intervention transfer between models
Explore new approaches to model steering
Why Goodfire's Tools?
While participants are welcome to use their existing setups, Goodfire's API is offered as the primary option for this hackathon and brings substantial value.
Goodfire provides:
Access to a 70B parameter model via API (with efficient inference)
Feature steering capabilities made simple through the SDK/API
Advanced development workflows including conditional feature interventions
The hackathon is also a unique opportunity for Goodfire to gather valuable feedback from the developer community on its API/SDK. To ensure all participants can pursue ambitious research projects without constraints, Goodfire is providing free compute credits to every team.
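For a concrete sense of what this looks like in practice, here is a minimal feature-steering sketch based on the quickstart linked in the resources below. The model identifier and exact response shape are assumptions to check against the current SDK docs.

```python
import goodfire

client = goodfire.Client(api_key="YOUR_GOODFIRE_API_KEY")

# Create a steerable variant of a hosted model (8B shown; a 70B model is also available).
# The exact model identifier may differ -- see the quickstart docs.
variant = goodfire.Variant("meta-llama/Meta-Llama-3.1-8B-Instruct")

# 1. Search for human-interpretable features related to a concept
features = client.features.search("formal, academic tone", model=variant, top_k=5)

# 2. Steer the model by boosting one of those features
variant.set(features[0], 0.6)

# 3. Sample from the steered variant
response = client.chat.completions.create(
    [{"role": "user", "content": "Explain what a sparse autoencoder does."}],
    model=variant,
)
print(response.choices[0].message["content"])  # response mirrors the OpenAI chat format
```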
Previous Participant Experiences
"I learned so much about AI Safety and Computational Mechanics. It is a field I have never heard of, and it combines two of my interests - AI and Physics. Through the hackathons, I gained valuable connections and learned a lot from researchers with extensive experience." - Doroteya Stoyanova, Computer Vision Intern
To ensure you're well-equipped for the Reprogramming AI Models Hackathon, we've compiled a set of resources to support your participation:
Goodfire's SDK/API with hosted inference: Your primary toolkit for the hackathon. Familiarize yourself with our framework for understanding and modifying AI model behavior.
Hosted inference on Llama 3 8B and 70B models
Feature inspection and intervention capabilities: https://docs.goodfire.ai/examples/advanced.html#Feature-intervention-modes
Example notebooks and tutorials: https://docs.goodfire.ai/examples/quickstart.html#Use-contrastive-features-to-fine-tune-with-a-single-example!
Latent explorer visualization tools: https://docs.goodfire.ai/examples/latent_explorer.html
Rate limits and usage guidelines: https://docs.goodfire.ai/rate-limits.html
Research Preview Playground
Sandbox environment for model experimentation: https://docs.goodfire.ai/examples/quickstart.html#Replace-model-calls-with-OpenAI-compatible-API
Feature activation analysis tools: https://docs.goodfire.ai/examples/advanced.html#Feature-intervention-modes
Conditional intervention testing: https://docs.goodfire.ai/examples/advanced.html#Conditional-feature-interventions
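For conditional intervention testing via the SDK, the advanced example linked above follows roughly the pattern below. Method names such as set_when and the feature-comparison syntax are taken from that page; treat the exact signatures as assumptions and verify them against the current docs.

```python
import goodfire

client = goodfire.Client(api_key="YOUR_GOODFIRE_API_KEY")
variant = goodfire.Variant("meta-llama/Meta-Llama-3.1-8B-Instruct")  # model id may differ

# Pick a feature to condition on and a feature to steer (queries are illustrative)
pirate = client.features.search("pirate speak", model=variant, top_k=1)[0]
whales = client.features.search("whales", model=variant, top_k=1)[0]

# If the "pirate" feature activates strongly, boost the "whales" feature.
# set_when is sketched from the conditional-interventions example; check the SDK docs.
variant.set_when(pirate > 0.75, {whales: 0.5})

response = client.chat.completions.create(
    [{"role": "user", "content": "Tell me a sea story."}],
    model=variant,
)
print(response.choices[0].message["content"])  # response mirrors the OpenAI chat format
```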
Check out the Jupyter Notebook Quickstart (linked in the resources above). In this quickstart, you'll learn how to:
Sample from a language model (in this case, Llama 3 8B)
Search for relevant features and intervene on them to steer the model
Find features by contrastive search
Save and load Llama models with steering applied
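The same quickstart also covers replacing existing model calls with Goodfire's OpenAI-compatible API (linked in the resources above). Here is a minimal sketch using the standard openai Python client; the base_url and model identifier are placeholders to verify against the quickstart page.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GOODFIRE_API_KEY",
    base_url="https://api.goodfire.ai/api/inference/v1",  # placeholder; copy from the quickstart
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # hosted Llama 3 8B (id may differ)
    messages=[{"role": "user", "content": "Name one use of feature steering."}],
)
print(response.choices[0].message.content)
```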
Tutorial: Visualizing AI Model Internals: Watch this video to understand how to use Goodfire's tools to map and visualize AI model behavior.
The Cognitive Revolution Podcast - Episode on Interpretability. In this episode of The Cognitive Revolution, the hosts delve into the science of understanding AI models' inner workings, recent breakthroughs, and the potential impact on AI safety and control.
Auto-interp Paper: This paper applies automation to the problem of scaling an interpretability technique to all the neurons in a large language model.
Gemma Scope: a comprehensive, open suite of sparse autoencoders for language model interpretability.
Neuronpedia: Platform for accelerating research into Sparse Autoencoders
The Geometry of Concepts: Sparse Autoencoder Feature Structure Paper. This paper uses sparse autoencoders to investigate how concept representations are organized inside large language models, finding structure at multiple scales: atomic-scale parallelogram ("crystal") forms, brain-like modular spatial organization, and galaxy-scale distributions that are anisotropic with distinctive eigenvalue properties.
Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
LessWrong search for SAE
Here is the schedule for the Hackathon:
We start with an introductory talk and end the event during the following week with an awards ceremony. Join the public iCal here. Before the hackathon begins, you will also find Explorer events, such as collaborative brainstorming and team matchmaking, on Discord and in the calendar.
Entries
Our Other Sprints
Apr 25, 2025
-
Apr 27, 2025
Economics of Transformative AI: Research Sprint
This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.
Apr 25, 2025
-
Apr 26, 2025
Berkeley AI Policy Hackathon
This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.