May 30, 2025
-
Jun 1, 2025
Online & In-Person
Apart x Martian Mechanistic Interpretability Hackathon
Join Martian and Apart Research for a weekend of innovation building judges and routers that optimize AI systems
✨ Overview
Shape the future of AI systems through smarter model routing and evaluation! Join us for a groundbreaking hackathon focused on creating judges that evaluate model outputs and routers that direct queries to the optimal models. Whether you're a researcher, developer, or ML enthusiast, this hackathon offers a unique opportunity to tackle crucial challenges in AI optimization and safety.
Judges and routers play a crucial role in AI systems by optimizing decision-making and task delegation. Judges evaluate the quality, accuracy, or relevance of AI-generated outputs, ensuring models produce reliable results. Routers direct queries or tasks to the most suitable model, improving efficiency, performance, and robustness by leveraging specialized AI models. Together, these components enhance AI safety and deliver better accuracy than any single existing model. By supporting model specialization, they also help democratize the model creation process. Combined with mechanistic interpretability techniques, this approach enables new ways to understand model capabilities.
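The judge-and-router pattern described above can be sketched in a few lines. This is a toy illustration only, not Martian's API: the model names, costs, quality numbers, and scoring rule are all invented for demonstration.

```python
# Toy judge: scores an output in [0, 1]. A real judge would be a trained
# classifier or an LLM-based evaluator, not a length heuristic.
def judge(output: str) -> float:
    score = min(len(output) / 100, 1.0)
    if "I don't know" in output:
        score += 0.1  # reward calibrated uncertainty
    return min(score, 1.0)

# Hypothetical model pool with made-up per-call costs and expected quality.
MODELS = {
    "small-model": {"cost_per_call": 0.001, "expected_quality": 0.6},
    "large-model": {"cost_per_call": 0.02, "expected_quality": 0.9},
}

def route(query: str, quality_floor: float = 0.7) -> str:
    """Toy router: cheapest model whose expected quality clears the floor."""
    eligible = [(name, spec["cost_per_call"])
                for name, spec in MODELS.items()
                if spec["expected_quality"] >= quality_floor]
    # Fall back to the strongest model if nothing clears the floor.
    return min(eligible, key=lambda mv: mv[1])[0] if eligible else "large-model"

print(route("Summarize this contract."))  # → large-model (only one clears 0.7)
```

In practice the router's quality estimates would come from judge scores gathered on past traffic, closing the loop between the two components.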
We're thrilled to partner with Martian, pioneers in model routing technology, to bring you this specialized hackathon. Their groundbreaking approach aggregates the best capabilities across multiple AI models, achieving higher performance than any single model while reducing costs.
Each participating team will receive $50 in model API credits to power their projects, with additional credits available for promising implementations. You'll have access to Martian's judge and router APIs, along with sample code libraries to kickstart your projects.
Sign up here to stay updated on this event.
Why This Hackathon Matters:
This hackathon addresses fundamental challenges in AI deployment as models continue to proliferate:
Efficiency & Cost Reduction: By routing queries to the most appropriate models, we can dramatically reduce computational costs while improving results
Democratizing AI Creation: Specialized models excel in narrow domains but struggle with universal tasks - router technology enables these specialized models to thrive
Safety & Reliability: Better evaluation mechanisms (judges) help ensure outputs meet safety, ethical, and quality standards before deployment
Understanding Model Capabilities: Developing judges requires deeper mechanistic understanding of how models work and their inherent limitations
Challenge Tracks
Creating Judges Track
Model Characteristic Analysis: Create a dataset that tests an interesting model characteristic pertinent to safety (e.g., ethics, hallucinations, gender bias). Build a judge using this data and evaluate multiple models.
Judge Evaluation Metrics: Develop methods to measure judge accuracy, completeness, and reliability for specific characteristics.
Mechanistic Interpretability for Judges: Apply MI techniques to model internals to create better or more interpretable judges that can evaluate outputs based on how they were generated.
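One simple judge-evaluation metric is agreement with human labels. A minimal sketch, where the labels and verdicts below are invented example data:

```python
# Compare a judge's binary verdicts against human labels on the same outputs.
labels = [1, 0, 1, 1, 0]          # human: 1 = acceptable output
judge_verdicts = [1, 0, 1, 0, 0]  # the judge's calls on the same outputs

# Fraction of outputs where judge and human agree.
accuracy = sum(j == l for j, l in zip(judge_verdicts, labels)) / len(labels)
print(accuracy)  # → 0.8
```

A real evaluation would use a much larger labeled set and also report per-class agreement, since judges often err asymmetrically (e.g., over-approving unsafe outputs).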
Creating Routers Track
Risk-Sensitive Routing: Build efficient routing algorithms considering judge scores, dollar costs, and system reliability.
Multi-Objective Routing: Create routers that balance multiple evaluation criteria (e.g., ethics and legality) according to user preferences.
Task Decomposition: Develop systems that break complex tasks into subtasks that can be routed to specialized models before recombining results.
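Risk-sensitive, multi-objective routing can be framed as maximizing a weighted utility over judge score, cost, and reliability. A hedged sketch, where the weights and model statistics are all made up:

```python
# Utility rewards expected judge score and penalizes cost and failure risk.
def utility(stats, w_quality=1.0, w_cost=0.5, w_risk=2.0):
    return (w_quality * stats["judge_score"]
            - w_cost * stats["cost"]
            - w_risk * (1 - stats["reliability"]))

# Hypothetical candidates: a cheap, reliable model vs. a stronger, riskier one.
candidates = {
    "model-a": {"judge_score": 0.82, "cost": 0.002, "reliability": 0.999},
    "model-b": {"judge_score": 0.91, "cost": 0.030, "reliability": 0.95},
}

best = max(candidates, key=lambda m: utility(candidates[m]))
print(best)  # → model-a (the risk penalty outweighs model-b's quality edge)
```

Exposing the weights to users is one way to implement preference-driven multi-objective routing: a safety-critical deployment might raise `w_risk`, while a batch pipeline might raise `w_cost`.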
Inferring Judges & Routers Track
Reverse Engineering: Given a black-box LLM or router, infer its embedded judge (reward signal) for specific characteristics.
Efficiency Analysis: Quantify potential electricity/resource consumption reduction from widespread adoption of optimal routing technologies.
Registered Jam Sites
Register A Location
Besides remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in person and connect with others in your area.
The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community.
We haven't announced jam sites yet
Check back later
Our Other Sprints
Apr 25, 2025
-
Apr 27, 2025
Research
Economics of Transformative AI
This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.
Sign Up
Apr 14, 2025
-
Apr 26, 2025
Research
Berkeley AI Policy Hackathon
This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.
Sign Up

Sign up to stay updated on the
latest news, research, and events
