Jun 1, 2025

Judge using SAE Features

Dhruv Yadav

Details

The key idea of this project was to explore model judgement using Sparse Autoencoder (SAE) features for mathematical reasoning tasks involving addition, multiplication, and subtraction. We compared this approach against basic Chain-of-Thought (CoT)-based judgement, using routing algorithms deployed via the Martian SDK.
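For readers new to SAEs, the sketch below shows how a model activation is turned into sparse feature activations. It assumes the common ReLU formulation f = ReLU(W_enc (x - b_dec) + b_enc) with a linear decoder; the SAE actually used in this project (model, layer, dictionary size) is not specified in this summary, and the tiny weights in the example are purely illustrative.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output dimension


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in m]


def sae_encode(x: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Map a d_model activation to n_features non-negative SAE features.

    Assumed shapes: w_enc is n_features x d_model, b_enc has n_features
    entries, b_dec has d_model entries.
    """
    centered = [a - b for a, b in zip(x, b_dec)]  # subtract decoder bias
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    return [max(0.0, p) for p in pre]  # ReLU keeps the code sparse


def sae_decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruct the activation from features (w_dec is d_model x n_features)."""
    return [r + b for r, b in zip(matvec(w_dec, f), b_dec)]


# Illustrative toy SAE: d_model = 2, n_features = 3 (hypothetical weights).
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
B_ENC = [0.0, 0.0, -1.0]
W_DEC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B_DEC = [0.0, 0.0]

features = sae_encode([1.0, 2.0], W_ENC, B_ENC, B_DEC)
print(features)                              # [1.0, 2.0, 0.0]
print(sae_decode(features, W_DEC, B_DEC))    # [1.0, 2.0]
```

The third feature stays at zero because its encoder bias pushes its pre-activation to 0; in a trained SAE most features are inactive on any given input, which is what makes them usable as interpretable signals for judgement.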

Our work addresses the Judge Model Development track with an SAE feature-based routing system that achieves performance comparable to traditional CoT approaches while providing enhanced interpretability. We demonstrate that SAE feature analysis can effectively guide model selection for mathematical reasoning tasks: our system reaches 91% routing accuracy, compared with 98% for traditional CoT methods. Critically, our SAE-based approach flagged 7% of cases in which feature analysis indicated that both models were inadequate, an insight invisible to traditional routing methods.
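The summary does not describe the exact routing rule, so the following is only a hypothetical sketch of how SAE features could drive routing with an abstain option. It assumes each candidate model has a linear readout (made-up weights) over SAE feature activations; the highest-scoring model is chosen, and if no score clears a threshold the router returns `None`, mirroring the "both models inadequate" flag. The model names, weights, and threshold are illustrative placeholders, not values from the project.

```python
from typing import Dict, List, Optional, Sequence


def score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear readout of SAE feature activations (hypothetical probe weights)."""
    return sum(f * w for f, w in zip(features, weights))


def route(
    features: Sequence[float],
    model_weights: Dict[str, List[float]],
    threshold: float,
) -> Optional[str]:
    """Pick the model with the highest score, or None if none looks adequate."""
    scores = {name: score(features, w) for name, w in model_weights.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None  # analogous to flagging "both models inadequate"
    return best


def routing_accuracy(
    predicted: Sequence[Optional[str]], reference: Sequence[Optional[str]]
) -> float:
    """Fraction of problems where the router agrees with the reference choice."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical setup: feature 0 signals model_a competence, feature 1 model_b.
WEIGHTS = {"model_a": [1.0, 0.0], "model_b": [0.0, 1.0]}
THRESHOLD = 0.5

problems = [[0.9, 0.2], [0.1, 0.3], [0.0, 0.8]]
predicted = [route(f, WEIGHTS, THRESHOLD) for f in problems]
print(predicted)  # ['model_a', None, 'model_b']
print(routing_accuracy(predicted, ["model_a", None, "model_a"]))
```

Returning `None` rather than forcing a choice is what lets a feature-based router surface cases where neither candidate is likely to succeed; a plain argmax router would always pick one of them.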

This work advances the mechanistic interpretability of routing systems by providing transparent, feature-level explanations for routing decisions, enabling users to understand why a specific model is chosen for a given arithmetic problem.

Cite this work:

@misc{
  title={Judge using SAE Features},
  author={Dhruv Yadav},
  date={6/1/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Reviewer's Comments

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.