Jun 1, 2025
Judge using SAE Features
Dhruv Yadav
Summary
This project explores model judgement using
Sparse Autoencoder (SAE) features for mathematical reasoning
tasks involving addition, multiplication, and subtraction.
We compared its performance against basic Chain-of-Thought
(CoT) based judgement, using routing algorithms deployed by the
Martian SDK.
Our work addresses the Judge Model Development track by
developing an SAE feature-based routing system that achieves
comparable performance to traditional CoT approaches while
providing enhanced interpretability. We demonstrate that SAE
feature analysis can effectively guide model selection for
mathematical reasoning tasks, with our system showing 91%
routing accuracy compared to 98% for traditional CoT methods.
Critically, our SAE-based approach identified 7% of cases where
both models were inadequate based on feature analysis, an insight
invisible to traditional routing methods.
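To make the routing idea concrete, the following is a minimal sketch of how feature-level scores could drive model selection, including the case where neither model is judged adequate. It is not the project's implementation: the function names, the mean-activation score, the feature labels, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical cutoff: below this score a model is treated as inadequate.
MIN_SCORE = 0.5


def feature_score(activations: Dict[str, float], relevant: List[str]) -> float:
    """Mean activation over the features assumed relevant to the operation."""
    if not relevant:
        return 0.0
    return sum(activations.get(name, 0.0) for name in relevant) / len(relevant)


def route(
    per_model: Dict[str, Dict[str, float]],
    relevant: List[str],
    min_score: float = MIN_SCORE,
) -> Optional[str]:
    """Return the best-scoring model, or None if no model clears min_score."""
    scores = {model: feature_score(acts, relevant) for model, acts in per_model.items()}
    if not scores:
        return None
    best = max(scores, key=lambda model: scores[model])
    return best if scores[best] >= min_score else None


# Illustrative activations only; these values are made up.
example = {
    "model_a": {"addition": 0.9, "carry": 0.7},
    "model_b": {"addition": 0.4, "carry": 0.2},
}
chosen = route(example, ["addition", "carry"])
```

Returning `None` rather than forcing a choice is what lets a router of this kind flag problems where both candidates look weak, and the per-feature scores double as the explanation for each decision.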
This work advances mechanistic interpretability of routing systems
by providing transparent, feature-level explanations for routing
decisions, enabling users to understand why a specific model is
chosen for a given arithmetic problem.
Cite this work:
@misc {
title={Judge using SAE Features},
author={Dhruv Yadav},
date={6/1/25},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}