Jun 2, 2025
Mechanistic Judging and LLM Routing: Evaluation, Task-Specific Vulnerabilities, and Exploitable Failure Mode
Subramanyam Sahoo
Summary
This research presents a mechanistically interpretable, LLM-based judging system with a novel routing algorithm. The implementation uses Qwen reasoning models exclusively, providing a foundation for transparent evaluation. We also introduce evaluation methods designed to detect and analyze reward-hacking vulnerabilities in task-decomposition frameworks. Together, these allow a more robust assessment of LLM performance and shed light on the exploitable failure modes that emerge during complex reasoning tasks.
Cite this work:
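@misc{sahoo2025mechanistic,
  title={Mechanistic Judging and {LLM} Routing: Evaluation, Task-Specific Vulnerabilities, and Exploitable Failure Mode},
  author={Subramanyam Sahoo},
  date={2025-06-02},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}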