Jun 2, 2025

Mechanistic Judging and LLM Routing: Evaluation, Task-Specific Vulnerabilities, and Exploitable Failure Mode

Subramanyam Sahoo


This research develops a mechanistically interpretable LLM-based judging system with a novel routing algorithm. The implementation uses only the Qwen reasoning model architecture, providing a foundation for transparent evaluation. We also introduce evaluation methods designed to detect and analyze reward-hacking vulnerabilities in task-decomposition frameworks. This approach enables more robust assessment of LLM performance and offers insight into the exploitable failure modes that emerge during complex reasoning tasks.

Cite this work:

@misc{
  title={Mechanistic Judging and LLM Routing: Evaluation, Task-Specific Vulnerabilities, and Exploitable Failure Mode},
  author={Subramanyam Sahoo},
  date={6/2/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.