The Martian Fellowship on Mechanistic Interpretability

The Martian

This fellowship has concluded.

Structure

Dates: September 2025 - December 2025

Program Overview

Research teams will work on an impactful project proposed and guided by a leading mechanistic interpretability researcher, and produce publishable results, open-source tools, or technical reports by the end of the program.

Projects and Expert Advisors

Projects are proposed and guided by leading mechanistic interpretability researchers who are shaping how we understand neural network internals.

Our vision is high-quality, productive collaborations that produce publishable, impactful work in a short time frame.

Details

Why This Matters

AI systems are becoming increasingly powerful and complex, yet we still lack a fundamental understanding of how they work internally. Mechanistic interpretability aims to reverse-engineer neural networks to understand the algorithms and representations they use. As AI becomes more integrated into critical decisions, our ability to understand, verify, and trust these systems becomes essential for safety. Yet while many researchers are advancing AI capabilities, far too few are working to understand what's actually happening inside these models.

Focus Areas

Features and Representations

Identifying interpretable features in activation space, superposition and polysemanticity, dictionary learning and sparse autoencoders, feature universality across models

Circuit Analysis

Reverse-engineering computational subgraphs, attention head analysis, induction heads and in-context learning circuits, composition of features and circuits

Scalable Interpretability Methods

Automated circuit discovery, techniques that scale to frontier models, activation patching and causal tracing, efficient probing methods

Model Behavior Understanding

Connecting internal mechanisms to model behaviors, understanding reasoning and planning, locating factual knowledge, analyzing failure modes

Tools and Infrastructure

Building interpretability libraries and frameworks, visualization tools for neural network internals, benchmarks for interpretability methods, open-source tooling for the research community

Why Join

Understand AI from the inside out: Develop expertise in one of the most important and rapidly growing areas of AI safety research.

Build your research portfolio: Co-author papers suitable for top-tier ML venues and conferences.

Create meaningful impact: Contribute to research that could be crucial for ensuring advanced AI systems are safe and trustworthy.

Build your professional visibility: Connect with and be seen within the global mechanistic interpretability community.

Be part of a strong network: Get access to The Martian and Apart's mentorship and career connections.

Get cash prizes: Participants receive milestone-based stipends, with an additional prize awarded for the best paper.

Who Should Join

We're seeking researchers and engineers with strong technical backgrounds interested in understanding how neural networks work from the inside out. Ideal participants bring curiosity, technical depth, research experience and the desire to collaborate with interpretability researchers on open problems.

Useful skill sets

Neural network architectures and training
Linear algebra and representation analysis
Probing and activation analysis techniques
Circuit analysis and feature identification
Building interpretability tools and visualizations

FAQ

Do I need prior interpretability experience? No. Strong ML engineering or research experience is what we value most. Familiarity with transformer architectures is helpful.

Is this paid? Yes. There are prizes of $1,000 - $2,000 for completing each research stage, plus best-paper awards and funded conference travel.

Can I participate while working full-time? Only if you can dedicate at least 10 hours per week.

What if I'm not matched? You'll stay in The Martian network for future projects and Forum extensions.

What is the application process? It consists of an application form, a work trial, and a short interview, after which candidates are matched to projects based on team fit.