
May 30 - Jun 1, 2025

Online & In-Person

Apart x Martian Mechanistic Router Interpretability Hackathon

Join the effort to make AI orchestration interpretable from the ground up—where judge models reveal their reasoning process and routing decisions become windows into AI decision-making!


This event has concluded.

168 Sign Ups
0 Entries


✨ Overview

Join us in pioneering a paradigm shift in AI development! This hackathon will focus on creating the building blocks for a revolutionary Expert Orchestration AI architecture that democratizes, aligns, and enhances large language models through intelligent routing to specialized experts.

The current approach of developing increasingly powerful monolithic AI models faces fundamental limitations in performance, transparency, and democratization. This hackathon aims to develop key components of an alternative architecture where specialized "judge" models evaluate the capabilities of thousands of specialized models, and "router" systems intelligently direct queries to the most appropriate models based on these evaluations.

We’re excited to collaborate with Martian, pioneers in model routing and interpretability research, for this specialized hackathon. Their novel expert orchestration approach distills complex AI models into compact, predictive components that retain only the information needed to estimate performance on specific tasks. By combining this with cutting-edge Mechanistic Interpretability techniques, Martian is developing a general theory of model capabilities—enabling us to evaluate and deploy models more safely, transparently, and effectively. Their work also supports an open ecosystem of specialized models, helping democratize AI development while aligning business impact with AI safety.

Each participating team will receive $50 in model API credits to power their projects, with additional credits available for promising implementations. You'll have access to Martian's judge and router APIs, along with sample code libraries to kickstart your projects.

Sign up here to stay updated for this event!

Why This Hackathon Matters:

The current trajectory of AI development, focused on increasingly powerful monolithic models, faces fundamental limitations:

  • Winner-takes-all dynamics: The high costs of training frontier models lead to market concentration and the accumulation of economic power in a few corporations.

  • Misaligned safety incentives: Companies racing to release increasingly powerful models may underreport risks and rush products to market.

  • High barriers to entry: Specialized models struggle to gain market traction against generalist models, even when they excel in specific domains.

  • Limited user control: Users have minimal visibility into how models "think" and limited ability to control characteristics like factuality, bias, or ethical reasoning.

  • Inefficient resource use: Using powerful frontier models for all tasks wastes resources and often underperforms specialized alternatives.

The Expert Orchestration Architecture addresses these issues by creating a more transparent, efficient, and democratic AI ecosystem where specialized models can thrive based on their unique strengths, and users gain unprecedented control over AI capabilities.

Expected Outcomes

Participants will create components that advance the Expert Orchestration vision:

  • Prototype judge models for evaluating specific AI capabilities

  • Intelligent routing algorithms for directing queries to appropriate models

  • Frameworks for decomposing complex tasks across multiple specialized models

  • Integration APIs that allow seamless discovery and utilization of specialized models

  • Evaluation metrics and benchmarks for comparing different routing and judge strategies

The most promising projects will have opportunities for continued development and potential integration into production systems.

Challenge Tracks

Track 1: Judge Model Development

Build specialized evaluation models that can assess different AI models' capabilities across dimensions that matter to users (manipulation skills and tendencies, deception and hidden communication, misaligned goals, factuality, domain expertise, ethics, creativity, objectivity, etc.). Judges should provide independent, objective evaluations that create transparency around model strengths and weaknesses.
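
As a starting point, a judge can be as simple as a prompted LLM grader that scores a candidate answer on one dimension. The sketch below uses a generic OpenAI-compatible client and an invented rubric; it is not Martian's judge API, just one way a team might prototype the idea.

```python
# Minimal prompted-judge sketch. The client, model name, and rubric are
# illustrative assumptions, not Martian's judge API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are a strict factuality judge.
Question: {question}
Answer: {answer}
Rate the factual accuracy of the answer from 0 (entirely wrong) to 10 (fully correct).
Reply with only the integer score."""

def judge_factuality(question: str, answer: str, judge_model: str = "gpt-4o-mini") -> int:
    """Score `answer` to `question` on factuality, 0-10, using a judge model."""
    response = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())
```

Running several such judges, one per characteristic, over the same answers yields the kind of capability profile this track is asking for.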

Track 2: Intelligent Router Systems

Develop router systems that can intelligently direct user queries to the most appropriate specialized models based on user preferences using judge evaluations. Focus areas include routers that use multiple judges (e.g. factuality, ethics and lack of deception), query classification, efficiency optimization, and handling uncertainty.
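
One concrete way to frame this track, under invented judge scores and costs, is a router that maximizes a preference-weighted combination of judge scores minus a cost penalty:

```python
# Toy router: pick the model whose judge scores best match the user's preferences.
# The judge scores, costs, and weights below are invented for illustration.

JUDGE_SCORES = {
    # model -> judge scores on each evaluated dimension (0-1 scale)
    "small-specialist": {"factuality": 0.85, "ethics": 0.90},
    "large-generalist": {"factuality": 0.92, "ethics": 0.88},
}
COST_PER_QUERY = {"small-specialist": 0.002, "large-generalist": 0.03}

def route(preferences: dict[str, float], cost_weight: float = 10.0) -> str:
    """Return the model with the best preference-weighted score minus a cost penalty."""
    def utility(model: str) -> float:
        quality = sum(w * JUDGE_SCORES[model][dim] for dim, w in preferences.items())
        return quality - cost_weight * COST_PER_QUERY[model]
    return max(JUDGE_SCORES, key=utility)

print(route({"factuality": 0.7, "ethics": 0.3}))  # -> "small-specialist" with these numbers
```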

Track 3: Task Decomposition Frameworks

Create systems that can break down complex user requests into a series of more manageable steps, to be executed by different specialized models. This includes planning phases, execution phases, and coordination mechanisms. Investigate whether decomposition avoids or reduces some of the traps reported for monolithic reasoning models (e.g. reward hacking).
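
A simple prototype is a plan-then-execute loop: a planner model proposes subtasks, each subtask is routed to a specialist, and a final step recombines the results. The helper callables below are placeholders for whichever planner, router, and executor a team builds:

```python
# Skeleton of a plan-then-execute decomposition pipeline.
# `plan`, `route`, and `call_model` are placeholders for a team's own components.
from typing import Callable

def decompose_and_solve(
    task: str,
    plan: Callable[[str], list[str]],        # planner: task -> ordered subtasks
    route: Callable[[str], str],             # router: subtask -> model name
    call_model: Callable[[str, str], str],   # executor: (model, prompt) -> answer
) -> str:
    """Plan a task into subtasks, execute each on its routed model, then combine."""
    context: list[str] = []
    for subtask in plan(task):
        model = route(subtask)
        prompt = subtask if not context else f"{subtask}\n\nEarlier results:\n" + "\n".join(context)
        context.append(call_model(model, prompt))
    # Final coordination step: merge subtask results into one answer.
    return call_model(route("combine results"), "Combine these results:\n" + "\n".join(context))
```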

Track 4: Specialized Model Integration

Build frameworks that enable easy integration of new specialized models into the Expert Orchestration Architecture, including methods for model discovery, capability profiling, and dynamic performance evaluation.
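
A lightweight starting point is a registry that stores a judge-scored capability profile per model and lets a router discover candidates above a threshold. The field names and example entries below are illustrative assumptions:

```python
# Minimal model-registry sketch: capability profiles plus a discovery query.
# Field names and the sample entries are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class ModelProfile:
    name: str
    endpoint: str
    capabilities: dict[str, float] = field(default_factory=dict)  # judge-scored dimensions

class ModelRegistry:
    def __init__(self) -> None:
        self._models: list[ModelProfile] = []

    def register(self, profile: ModelProfile) -> None:
        self._models.append(profile)

    def discover(self, dimension: str, min_score: float) -> list[ModelProfile]:
        """Return models whose judge score on `dimension` meets the threshold."""
        return [m for m in self._models if m.capabilities.get(dimension, 0.0) >= min_score]

registry = ModelRegistry()
registry.register(ModelProfile("legal-specialist", "https://example.com/v1", {"legal": 0.93}))
print([m.name for m in registry.discover("legal", 0.9)])  # -> ['legal-specialist']
```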


Open Research Questions

Judges

  1. Model Characteristic Analysis: Create dataset(s) that test an interesting model characteristic pertinent to safety (e.g., ethics, hallucinations, gender bias). Build a judge using this data and evaluate multiple models.

  2. Judge Evaluation Metrics: Develop methods to measure judge accuracy, completeness, and reliability for specific characteristics. Explore how this impacts AI Safety.

  3. Mechanistic Interpretability for Judges: Apply MI techniques to model internals to create better or more interpretable judges, e.g. judges that can evaluate outputs based on how they were generated.

  4. Measuring model similarity: HuggingFace hosts thousands of LLMs. How do we measure whether two models have similar capabilities? How do we choose a subset of these models with a sufficiently diverse set of capabilities so that, after training, the resulting router is performant? How does router performance vary with the size of the subset? (A toy similarity measure is sketched after this list.)
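
For the model-similarity question above, one cheap proxy (an assumption of this sketch rather than an established metric) is the agreement rate of two models' answers over a shared probe set:

```python
# Toy similarity measure: fraction of probe questions on which two models agree.
# `answers_a` and `answers_b` would come from running both models on the same probes.

def agreement_similarity(answers_a: list[str], answers_b: list[str]) -> float:
    """Return the fraction of probes where the two models gave the same (normalized) answer."""
    assert len(answers_a) == len(answers_b), "both models must answer the same probe set"
    matches = sum(a.strip().lower() == b.strip().lower() for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)

print(agreement_similarity(["Paris", "4", "yes"], ["paris", "5", "yes"]))  # -> 0.666...
```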

Routers

Given a set of models with known capabilities measured by known judge scores:

  1. Risk-Sensitive Routing: Build efficient routing algorithms that balance judge scores, compute costs, and system reliability for the best user experience.

  2. Multi-Objective Routing: Create routers that use scores from multiple judges (e.g., answer correctness, ethics and legality) according to user preferences for the best user experience. What are the tradeoffs?

  3. Routing algorithms: For expensive models, the judge provides a “pre-hoc” way to estimate prediction success (without querying the model). For cheap models, we can ask the model for an answer and then assess its confidence (“post-hoc”) in that answer. Find interesting ways to mix pre- and post-hoc routing to get the best of both worlds. (A minimal cascade is sketched after this list.)

  4. Multi-level routing: Investigate using a tree of choices rather than one-off routing. What are the pros and cons?

  5. Reducing router training costs: Given a model and a task, how can we cheaply detect that the model is a poor fit for the task, avoiding further training time spent establishing exactly how poor a fit it is?

  6. Task Decomposition: Model how a complex user task can be broken into multiple subtasks that are routed to the most capable models before the results are recombined. What are the AI Safety, cyber-security and/or cost implications of this approach?

  7. Universal router: For a set of tasks, create a single router across a set of LLMs that provides higher-quality answers than any single LLM does.
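
For the pre-/post-hoc mixing question (item 3), a minimal cascade might try a cheap model first, accept its answer only if its post-hoc self-confidence clears a threshold, and otherwise escalate to whichever expensive model the pre-hoc judge estimate favours. All scoring functions below are placeholders:

```python
# Sketch of a pre-hoc / post-hoc cascade router. All scoring functions are placeholders.
from typing import Callable

def cascade_route(
    query: str,
    cheap_answer: Callable[[str], tuple[str, float]],  # -> (answer, post-hoc confidence 0-1)
    prehoc_score: Callable[[str, str], float],          # (query, model) -> judge estimate 0-1
    expensive_models: list[str],
    call_model: Callable[[str, str], str],              # (model, query) -> answer
    confidence_threshold: float = 0.8,
) -> str:
    """Accept the cheap model's answer if it is confident; otherwise escalate."""
    answer, confidence = cheap_answer(query)
    if confidence >= confidence_threshold:
        return answer
    best = max(expensive_models, key=lambda m: prehoc_score(query, m))
    return call_model(best, query)
```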

Inferring Judges & Routers

  1. Reverse Engineering: Given a black-box LLM or router, infer its embedded judge (reward signal) for specific characteristics.

  2. Efficiency Analysis: Quantify potential electricity/resource consumption reduction from widespread adoption of optimal routing technologies.

  3. Learning when to fail: Sometimes no model will successfully answer a user query. Can we detect when we should fail cheaply?

  4. Learning with uncertain signals: How does judge noise affect the router training process? How does noisy feedback data affect the judge/router training process? Is off-policy data a problem when it comes to training routers?

  5. Risk sensitivity: Rather than optimizing for expected cost/quality, can we optimize for some other risk profile? E.g. we might tolerate a slightly higher cost and lower quality, if we reduce the variance or minimize a long tail.

  6. Create a distilled predictor: The Language Models (Mostly) Know What They Know paper shows that a model can sometimes predict whether it will be able to answer a question correctly. For a selected open “base” model, create a smaller “distilled predictor” that mirrors the base model’s ability to predict answer correctness (but can no longer calculate the answer). You might use the techniques from that paper and/or pruning and distillation techniques such as movement pruning to shrink the predictor. (A toy version is sketched after this list.)
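
For the distilled-predictor question (item 6), one low-cost prototype, assuming you have already collected (question, answered-correctly) labels by evaluating the base model, is a small classifier over sentence embeddings of the questions. The embedding model and classifier below are illustrative choices, not the paper's method:

```python
# Toy distilled predictor: predict whether the base model will answer a question correctly.
# Assumes `questions` and `was_correct` labels were collected by evaluating the base model.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

questions = ["What is 2+2?", "Who proved Fermat's Last Theorem?", "What is the capital of France?"]
was_correct = [1, 0, 1]  # 1 if the base model answered correctly (placeholder labels)

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small embedding model (illustrative)
X = embedder.encode(questions)

predictor = LogisticRegression().fit(X, was_correct)
print(predictor.predict_proba(embedder.encode(["What is 3+3?"]))[:, 1])  # P(base model correct)
```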


Speakers & Collaborators

Philip Quirke

Organiser

Philip pivoted to AI Safety in 2023, after roles as a Software Engineer & Architect, Business Analyst, Project Manager, Product Manager, General Manager, and more. His AI journey started with an Apart Research hackathon, which led to research grants, a stint at FAR AI, and finally Martian!

Yash Upadhyay

Organiser

Yash is the Co-Founder and Co-CEO of Martian, where he leads the company's mission to enhance AI performance and reliability through innovative model routing solutions. With a background in AI research and development, Yash has been instrumental in building tools that optimize the use of large language models, ensuring efficiency and cost-effectiveness for enterprise applications.

Etan Ginsberg

Organiser

Etan is a Co-Founder and Co-CEO of Martian, where he focuses on applying advanced AI infrastructure to help companies use large language models more effectively. His experience includes deep technical leadership and a track record of building high-performance systems. Etan's work at Martian is centered on making LLMs more reliable, affordable, and performant for enterprise use.

Chaitanya Bandi

Organiser

Chaitanya is the VP of Research at Martian, focusing on AI alignment and model interpretability. He has contributed to the development of model mapping techniques that transform opaque neural networks into transparent, verifiable programs, enhancing model efficiency and human-AI interaction. Chaitanya holds a Ph.D. from MIT and has a background in decision-making under uncertainty, with applications in operations management.

Ashley Zhang

Organiser

Ashley is a backend engineer at Penn Labs and an engineer at Martian.

Luka Samkharadze

Organiser

Luka is a founding software engineer with rich hands-on experience and a diverse portfolio of projects. He is currently a Founding Engineer at Stable and was previously at Martian.

Dory Zidon

Organiser

Dory is a key member of the Martian back-end team, contributing to the company's products, infrastructure and performance.

Josh Greaves

ML Tech Lead

Josh is the Machine Learning Tech Lead at Martian, where he focuses on reinforcement learning and large language models. His prior experience includes roles at Google Brain and Reliant AI.

Antía García Casal

Organiser

Antía is the Head of Design at Martian. She was previously a freelance visual designer with over 15 years of experience.

Alex Zverianskii

Organiser

Over the past 15 years, Alex has worked in businesses of diverse sizes, ranging from 200K MAU to 100M MAU. He has engineered hundreds of real-time models, prepared analytics and data for an IPO, and built three startups from the ground up, with one successful exit.

Brad Fowler

Organiser

Brad is a Machine Learning Research Technical Lead at Martian. He holds a Master's degree in Information and Computer Engineering from the University of Cambridge and has over seven years of experience in artificial intelligence and software development.

Narmeen Oozeer

Organiser

Narmeen Oozeer is a Research Engineer focused on AI/ML interpretability at Martian. Her work centers on developing scalable interpretability methods to build better and more interpretable LLM routers. Narmeen has previously worked on activation transfers, allowing alignment interventions to be transferred between models of different scales.

Jason Schreiber

Organizer and Judge

Jason is co-director of Apart Research and leads Apart Lab, our remote-first AI safety research fellowship.


Registered Jam Sites

Register A Location

Besides remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in person and connect with others in your area.

The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community.
