May 30 – Jun 1, 2025

Online & In-Person

Apart x Martian Mechanistic Interpretability Hackathon

Join Martian and Apart Research for a weekend of innovation, building judges and routers that optimize AI systems.



This event has concluded.



✨ Overview

Shape the future of AI systems through smarter model routing and evaluation! Join us for a groundbreaking hackathon focused on creating judges that evaluate model outputs and routers that direct queries to the optimal models. Whether you're a researcher, developer, or ML enthusiast, this hackathon offers a unique opportunity to tackle crucial challenges in AI optimization and safety.

Judges and routers play a crucial role in AI systems by optimizing decision-making and task delegation. Judges evaluate the quality, accuracy, or relevance of AI-generated outputs, ensuring models produce reliable results. Routers direct queries or tasks to the most suitable model, improving efficiency, performance, and robustness by leveraging specialized AI models. Together, these components enhance AI safety and deliver better accuracy than any single existing model can. They also support model specialization, democratizing the model-creation process. Combined with mechanistic interpretability techniques, this approach enables new ways to understand model capabilities.
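To make the judge/router split concrete, here is a minimal toy sketch of the two components and how they interact. Everything in it is illustrative — the model names, the scoring function, and the cost figures are made up, and this is not Martian's actual API:

```python
# Toy sketch: a judge scores outputs, a router picks the model whose
# judged quality best justifies its cost. All names/values are invented.

def judge(output: str) -> float:
    """Toy judge: scores an output's quality in [0, 1].

    A real judge would use a trained classifier or an LLM-as-judge
    prompt; here longer outputs simply score higher, capped at 1.
    """
    return min(len(output) / 100, 1.0)

# Hypothetical model pool: each entry has a per-call cost and a stand-in
# "answer" function in place of a real API call.
MODELS = {
    "small-cheap": {"cost": 0.1, "answer": lambda q: q[:50]},
    "large-capable": {"cost": 1.0, "answer": lambda q: q * 2},
}

def route(query: str, cost_weight: float = 0.5) -> str:
    """Toy router: pick the model maximizing judged quality minus cost."""
    def utility(name: str) -> float:
        out = MODELS[name]["answer"](query)
        return judge(out) - cost_weight * MODELS[name]["cost"]
    return max(MODELS, key=utility)

print(route("short question"))  # small-cheap: its quality gain doesn't justify the big model's cost
```

The design point the sketch illustrates: the router never needs to understand the query itself — it only needs the judge's score and a cost model, which is why better judges directly yield better routing.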

We're thrilled to partner with Martian, pioneers in model routing technology, to bring you this specialized hackathon. Their groundbreaking approach aggregates the best capabilities across multiple AI models, achieving higher performance than any single model while reducing costs.

Each participating team will receive $50 in model API credits to power their projects, with additional credits available for promising implementations. You'll have access to Martian's judge and router APIs, along with sample code libraries to kickstart your projects.

Sign up here to stay updated on this event.

Why This Hackathon Matters:

The Model Mapping Hackathon addresses fundamental challenges in AI deployment as models continue to proliferate:

  • Efficiency & Cost Reduction: By routing queries to the most appropriate models, we can dramatically reduce computational costs while improving results

  • Democratizing AI Creation: Specialized models excel in narrow domains but struggle with universal tasks - router technology enables these specialized models to thrive

  • Safety & Reliability: Better evaluation mechanisms (judges) help ensure outputs meet safety, ethical, and quality standards before deployment

  • Understanding Model Capabilities: Developing judges requires deeper mechanistic understanding of how models work and their inherent limitations

Challenge Tracks

Creating Judges Track

  1. Model Characteristic Analysis: Create a dataset that tests an interesting model characteristic pertinent to safety (e.g., ethics, hallucinations, gender bias). Build a judge using this data and evaluate multiple models.

  2. Judge Evaluation Metrics: Develop methods to measure judge accuracy, completeness, and reliability for specific characteristics.

  3. Mechanistic Interpretability for Judges: Apply MI techniques to model internals to create better or more interpretable judges that can evaluate outputs based on how they were generated.

Creating Routers Track

  1. Risk-Sensitive Routing: Build efficient routing algorithms considering judge scores, dollar costs, and system reliability.

  2. Multi-Objective Routing: Create routers that balance multiple evaluation criteria (e.g., ethics and legality) according to user preferences.

  3. Task Decomposition: Develop systems that break complex tasks into subtasks that can be routed to specialized models before recombining results.
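Items 1 and 2 of this track both reduce to scoring candidate models on several axes and combining those scores. One possible formulation — with fabricated models, scores, and weights, purely to show the shape of the problem:

```python
# Sketch of multi-objective routing: combine per-criterion judge scores
# with user-supplied weights, penalize cost, and pick the argmax.
# All models and numbers below are invented for illustration.

CANDIDATES = {
    "model-a": {"ethics": 0.9, "legality": 0.8, "cost": 0.6},
    "model-b": {"ethics": 0.7, "legality": 0.95, "cost": 0.2},
}

def route(weights, cost_weight=0.3):
    """Pick the model with the best weighted quality minus cost penalty."""
    def score(name):
        m = CANDIDATES[name]
        quality = sum(weights[k] * m[k] for k in weights)
        return quality - cost_weight * m["cost"]
    return max(CANDIDATES, key=score)

# A user who weights legality over ethics:
print(route({"ethics": 0.4, "legality": 0.6}))  # model-b
```

Risk-sensitive routing (item 1) fits the same template by adding a reliability term or replacing the linear combination with a worst-case (min over criteria) objective.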

Inferring Judges & Routers Track

  1. Reverse Engineering: Given a black-box LLM or router, infer its embedded judge (reward signal) for specific characteristics.

  2. Efficiency Analysis: Quantify potential electricity/resource consumption reduction from widespread adoption of optimal routing technologies.
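For the reverse-engineering task, one possible starting point is to probe the black box with pairs of outputs that differ in a single characteristic and record which it prefers; the preference rate estimates the hidden judge's weight on that characteristic. A toy sketch, where `black_box_prefers` is a stand-in for the real system under study:

```python
# Toy probe of a black-box judge: feed it output pairs differing in one
# characteristic and measure how often it prefers the first variant.
# `black_box_prefers` is a stand-in; a real study would call the actual
# LLM or router being reverse-engineered.

def black_box_prefers(a: str, b: str) -> bool:
    """Stand-in black box whose hidden judge prefers shorter outputs."""
    return len(a) <= len(b)

def probe(pairs):
    """Fraction of pairs where the black box prefers the first element."""
    wins = sum(black_box_prefers(a, b) for a, b in pairs)
    return wins / len(pairs)

# Pairs constructed so the first element is always the terse variant:
pairs = [("terse", "a much longer rambling answer"),
         ("short", "also quite a long response here")]
print(probe(pairs))  # 1.0 — the hidden judge consistently rewards brevity
```

A preference rate far from 0.5 on such controlled pairs is evidence the embedded reward signal is sensitive to that characteristic; repeating over many characteristics maps out the judge.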


Speakers & Collaborators

Jason Schreiber

Organizer and Judge

Jason is co-director of Apart Research and leads Apart Lab, our remote-first AI safety research fellowship.


Registered Jam Sites

Register A Location

Besides remote and virtual participation, our amazing organizers also host local hackathon sites where you can meet up in person and connect with others in your area.

The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community.

We haven't announced jam sites yet. Check back later!
