May 30, 2025
-
Jun 1, 2025
Online & In-Person
Beyond Single Models: The Martian Routing Hackathon




Join Martian and Apart Research for a weekend of innovation, building judges and routers that secure AI systems.
This event has concluded.

✨ Overview
Shape the future of AI systems through smarter model routing and evaluation! Join us for a groundbreaking hackathon focused on creating judges that evaluate model outputs and routers that direct queries to the optimal models. Whether you're a researcher, developer, or ML enthusiast, this hackathon offers a unique opportunity to tackle crucial challenges in AI optimization and safety.
Judges and routers play a crucial role in AI systems by optimizing decision-making and task delegation. Judges evaluate how well different models perform on specific tasks—not just for correctness, but also across dimensions users increasingly care about, such as ethics, legality, truthfulness, and latency. Routers then direct each query to the most suitable model, aggregating the strengths of many to consistently outperform any single system. This architecture delivers higher-quality, lower-cost responses, while also incentivizing deeper model understanding—since better interpretability leads to better routing.
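To make this judge-plus-router architecture concrete, here is a minimal, hypothetical sketch in Python. The model names, scoring heuristic, and cost figures are illustrative assumptions and are not Martian's actual API: a judge estimates how well each model would handle a query, and a router picks the model with the best quality-per-cost trade-off.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_call: float  # illustrative cost units per query
    strengths: set[str]   # topics this model is assumed to handle well

def judge_score(model: Model, query: str) -> float:
    """Toy judge: estimate quality in [0, 1] from keyword overlap.
    A real judge would be a trained evaluator or an LLM-as-judge."""
    words = set(query.lower().split())
    return 1.0 if words & model.strengths else 0.3

def route(models: list[Model], query: str, cost_weight: float = 0.05) -> Model:
    """Toy router: pick the model maximizing judge score minus a cost penalty."""
    return max(models, key=lambda m: judge_score(m, query) - cost_weight * m.cost_per_call)

models = [
    Model("code-specialist", cost_per_call=2.0, strengths={"python", "bug", "code"}),
    Model("general-small", cost_per_call=0.5, strengths=set()),
]
print(route(models, "Fix this Python bug").name)  # -> code-specialist
```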
We’re excited to collaborate with Martian, pioneers in model routing and interpretability research, for this specialized hackathon. Their novel "model mapping" approach distills complex AI models into compact, predictive components that retain only the information needed to estimate performance on specific tasks. By combining this with cutting-edge Mechanistic Interpretability techniques, Martian is developing a general theory of model capabilities—enabling us to evaluate and deploy models more safely, transparently, and effectively. Their work also supports an open ecosystem of specialized models, helping democratize AI development while aligning business impact with AI safety.
Each participating team will receive $50 in model API credits to power their projects, with additional credits available for promising implementations. You'll have access to Martian's judge and router APIs, along with sample code libraries to kickstart your projects.
Sign up here to stay updated on this event.
Why This Hackathon Matters:
The Model Mapping Hackathon addresses fundamental challenges in AI deployment as models continue to proliferate:
Efficiency & Cost Reduction: By routing queries to the most appropriate models, we can dramatically reduce computational costs while improving results
Democratizing AI Creation: Specialized models excel in narrow domains but struggle with universal tasks; router technology enables these specialized models to thrive
Safety & Reliability: Better evaluation mechanisms (judges) help ensure outputs meet safety, ethical, and quality standards before deployment
Understanding Model Capabilities: Developing judges requires deeper mechanistic understanding of how models work and their inherent limitations
Challenge Tracks
Creating Judges Track
Model Characteristic Analysis: Create dataset(s) that test an interesting model characteristic pertinent to safety (e.g., ethics, hallucinations, gender bias). Build a judge using this data and evaluate multiple models.
Judge Evaluation Metrics: Develop methods to measure judge accuracy, completeness, and reliability for specific characteristics. Explore how this impacts AI Safety.
Mechanistic Interpretability for Judges: Apply MI techniques to model internals to create better or more interpretable judges, e.g., judges that can evaluate outputs based on how they were generated.
Measuring model similarity: HuggingFace hosts thousands of LLMs. How do we measure whether two models have similar capabilities? How do we choose a subset of these models with a sufficiently diverse set of capabilities such that, after training, the resulting router will be performant? How does router performance vary with the size of the subset? (A minimal behavioural-similarity sketch follows this list.)
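As a starting point for the model-similarity question above, one simple behavioural approach is to measure how often two models give the same (normalized) answer on a shared probe set. The snippet below is an illustrative sketch of that idea under assumed toy data, not a recommended metric.

```python
def agreement(answers_a: list[str], answers_b: list[str]) -> float:
    """Fraction of shared prompts on which two models give the same
    (normalized) answer -- a crude behavioural similarity measure."""
    assert len(answers_a) == len(answers_b)
    same = sum(a.strip().lower() == b.strip().lower()
               for a, b in zip(answers_a, answers_b))
    return same / len(answers_a)

# Toy usage: answers from two hypothetical models on the same four prompts.
model_a = ["paris", "4", "yes", "blue"]
model_b = ["Paris", "5", "yes", "blue"]
print(agreement(model_a, model_b))  # -> 0.75
```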
Creating Routers Track
Given a set of models with known capabilities measured by known judge scores:
Risk-Sensitive Routing: Build efficient routing algorithms balancing judge scores, compute costs, and system reliability for the best user experience.
Multi-Objective Routing: Create routers that use scores from multiple judges (e.g., answer correctness, ethics and legality) according to user preferences for the best user experience. What are the tradeoffs?
Routing algorithms: For expensive models, the judge provides a “pre-hoc” way to estimate prediction success (without querying the model). For cheap models, we can ask the model to evaluate its own answer and report its confidence (“post-hoc”) in its predictions. Find interesting ways to mix pre- and post-hoc routing to get the best of both worlds (a minimal cascade sketch follows this list).
Multi-level routing: Investigate using a tree of choices rather than one-off routing. What are the pros and cons?
Reducing router training costs: Given a model and a task, how can we cheaply detect that the model is a poor fit for the task, avoiding further training time spent establishing exactly how poor a fit it is?
Task Decomposition: Model the decomposition of a complex user task into multiple subtasks that can be routed to the most capable models before recombining the results. What are the AI Safety, cyber-security, and/or cost implications of this approach?
Universal router: For a set of tasks, create a single router across a set of LLMs that provides higher-quality answers than any single LLM does.
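For the pre-/post-hoc routing item above, one way to picture the mix is a cascade: try a cheap model first and accept its answer only if its self-reported confidence clears a threshold, otherwise fall back to whichever expensive model the pre-hoc judge favours. The sketch below is purely illustrative; the confidence and judge functions are hypothetical placeholders you would replace with real ones.

```python
from typing import Callable

def cascade_route(
    query: str,
    cheap_answer: Callable[[str], tuple[str, float]],   # returns (answer, self-confidence)
    prehoc_judge: Callable[[str, str], float],           # (model_name, query) -> predicted quality
    expensive_models: list[str],
    confidence_threshold: float = 0.8,
) -> str:
    """Post-hoc first: keep the cheap model's answer if it is confident enough;
    otherwise pre-hoc: route to the expensive model the judge rates highest."""
    answer, confidence = cheap_answer(query)
    if confidence >= confidence_threshold:
        return answer
    best = max(expensive_models, key=lambda m: prehoc_judge(m, query))
    return f"[routed to {best}]"

# Toy usage with stand-in functions.
print(cascade_route(
    "What is 2 + 2?",
    cheap_answer=lambda q: ("4", 0.95),
    prehoc_judge=lambda m, q: {"gpt-large": 0.9, "math-model": 0.7}[m],
    expensive_models=["gpt-large", "math-model"],
))  # -> "4"
```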
Inferring Judges & Routers Track
Reverse Engineering: Given a black-box LLM or router, infer its embedded judge (reward signal) for specific characteristics.
Efficiency Analysis: Quantify potential electricity/resource consumption reduction from widespread adoption of optimal routing technologies.
Learning when to fail: Sometimes no model will successfully answer a user query. Can we detect when we should fail cheaply?
Learning with uncertain signals: How does judge noise affect the router training process? How does noisy feedback data affect the judge/router training process? Is off-policy data a problem when it comes to training routers?
Risk sensitivity: Rather than optimizing for expected cost/quality, can we optimize for some other risk profile? For example, we might tolerate slightly higher cost and lower quality if doing so reduces variance or trims the long tail (a minimal mean-variance sketch follows this list).
Create a distilled predictor: The Language Models (Mostly) Know What They Know paper shows that a model can sometimes predict whether it will be able to answer a question correctly. For a selected open “base” model, create a smaller “distilled predictor” that mirrors the base model’s ability to predict answer correctness (but can no longer compute the answer itself). You might use techniques from that paper and/or pruning and distillation techniques such as movement pruning to shrink the predictor.
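For the risk-sensitivity item above, one common way to encode a risk profile is a mean-variance objective: instead of picking the model with the best expected judge score, penalize score variance so the router prefers reliably decent models over erratic ones. The snippet below is a toy illustration under assumed score samples; the model names and numbers are made up.

```python
import statistics

def risk_adjusted_value(scores: list[float], risk_aversion: float = 1.0) -> float:
    """Mean judge score minus a variance penalty: higher risk_aversion
    favours consistent models over high-variance ones."""
    return statistics.mean(scores) - risk_aversion * statistics.pvariance(scores)

# Toy samples of judge scores for two hypothetical models.
candidates = {
    "erratic-genius": [1.0, 0.2, 1.0, 0.1],    # higher variance, hit-or-miss
    "steady-worker":  [0.7, 0.65, 0.7, 0.75],  # slightly lower mean, very consistent
}

best = max(candidates, key=lambda m: risk_adjusted_value(candidates[m], risk_aversion=1.0))
print(best)  # -> steady-worker once the variance penalty is applied
```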
Speakers & Collaborators
Jason Schreiber
Organiser and Judge
Jason is co-director of Apart Research and leads Apart Lab, our remote-first AI safety research fellowship.
Yash Upadhyay
Organiser
Yash is the Co-Founder and Co-CEO of Martian, where he leads the company's mission to enhance AI performance and reliability through innovative model routing solutions. With a background in AI research and development, Yash has been instrumental in building tools that optimize the use of large language models, ensuring efficiency and cost-effectiveness for enterprise applications.
Etan Ginsberg
Organiser
Etan is a Co-Founder and Co-CEO of Martian, where he focuses on applying advanced AI infrastructure to help companies use large language models more effectively. His experience includes deep technical leadership and a track record of building high-performance systems. Etan's work at Martian is centered on making LLMs more reliable, affordable, and performant for enterprise use.
Chaitanya Bandi
Organiser
Chaitanya is the VP of Research at Martian, focusing on AI alignment and model interpretability. He has contributed to the development of model mapping techniques that transform opaque neural networks into transparent, verifiable programs, enhancing model efficiency and human-AI interaction. Chaitanya holds a Ph.D. from MIT and has a background in decision-making under uncertainty, with applications in operations management.
Luka Samkharadze
Organiser
Luka is a Founding Engineer at Stable and was previously at Martian, with rich hands-on experience and a diverse portfolio of projects.
Dory Zidon
Organiser
Dory is a key member of the Martian back-end team, contributing to the company's products, infrastructure and performance.
Josh Greaves
ML Tech Lead
Josh is the Machine Learning Tech Lead at Martian, where he focuses on reinforcement learning and large language models. His prior experience includes roles at Google Brain and Reliant AI.
Philip Quirke
Organiser
Philip pivoted to AI Safety in 2023, after roles as a Software Engineer & Architect, Business Analyst, Project Manager, Product Manager, General Manager, and more. His AI journey started with an Apart Research Hackathon, which led to research grants, a stint at FAR AI, and finally landed at Martian!
Antía García Casal
Organiser
Antía is the Head of Design at Martian. She was previously a freelance visual designer with over 15 years of experience.
Alex Zverianskii
Organiser
Over the past 15 years, Alex has worked in businesses of diverse sizes, ranging from 200k to 100M MAU. He has engineered hundreds of real-time models, prepared analytics and data for an IPO, and built three startups from the ground up, with one successful exit.
Brad Fowler
Organiser
Brad is a Machine Learning Research Technical Lead at Martian. He holds a Master's degree in Information and Computer Engineering from the University of Cambridge and has over seven years of experience in artificial intelligence and software development.
Narmeen Oozeer
Organiser
Narmeen Oozeer is a Research Engineer focused on AI/ML interpretability at Martian. Her work centers on developing scalable interpretability methods to build better and more interpretable LLM routers. Narmeen has previously worked on activation transfers, allowing alignment interventions to be transferred between models of different scales.
Registered Jam Sites
Register A Location
Besides remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in person and connect with others in your area.
The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community.
We haven't announced jam sites yet
Check back later
Our Other Sprints
Apr 25, 2025
-
Apr 27, 2025
Research
Economics of Transformative AI
This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.
Sign Up
Apr 14, 2025
-
Apr 26, 2025
Research
Berkeley AI Policy Hackathon
This unique event brings together diverse perspectives to tackle crucial challenges in AI alignment, governance, and safety. Work alongside leading experts, develop innovative solutions, and help shape the future of responsible AI.
Sign Up

Sign up to stay updated on the latest news, research, and events.