Jun 2, 2025
Routing LLMs using Distilled Predictors and Confidence Thresholding
Gideon Daniel Giftson T
Summary
This project explores confidence-based routing using sparsified transformer models as an intelligent alternative to monolithic AI systems. Focusing on Track 2: Intelligent Router Systems, we implemented a confidence-threshold router for a pruned DistilBERT model deployed via DeepSparse on the SST-2 sentiment classification task. We investigated how routing confidence correlates with prediction accuracy, and how letting the small model answer only the samples it is confident about, with the remainder falling back to a larger model, can retain accuracy.
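The routing rule described above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes each model is exposed as a callable that returns raw classification logits for a text, and the names `route`, `small_model`, and `large_model` are hypothetical stand-ins (for example, for the pruned DistilBERT served through DeepSparse and a larger fallback model). It does not use the DeepSparse API.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class RouteDecision:
    label: int         # predicted class index (e.g. 0 = negative, 1 = positive)
    confidence: float  # max softmax probability of the model that answered
    routed_to: str     # "small" or "large"


def route(text: str,
          small_model: Callable[[str], Sequence[float]],
          large_model: Callable[[str], Sequence[float]],
          threshold: float = 0.8) -> RouteDecision:
    """Keep the small model's answer if its max softmax probability reaches
    `threshold`; otherwise fall back to the (hypothetical) large model."""
    probs = softmax(small_model(text))
    conf = max(probs)
    if conf >= threshold:
        return RouteDecision(probs.index(conf), conf, "small")
    large_probs = softmax(large_model(text))
    large_conf = max(large_probs)
    return RouteDecision(large_probs.index(large_conf), large_conf, "large")


if __name__ == "__main__":
    # Hypothetical stub models returning fixed logits for [negative, positive].
    confident_small = lambda text: [0.2, 2.5]  # softmax max is about 0.91
    unsure_small = lambda text: [0.4, 0.6]     # softmax max is about 0.55
    large = lambda text: [3.0, -1.0]
    print(route("a gripping, heartfelt film", confident_small, large).routed_to)  # small
    print(route("it was fine, I guess", unsure_small, large).routed_to)           # large
```

Because the decision depends only on one scalar (the max softmax probability) compared against one threshold, every routing choice can be explained after the fact, which is the interpretability property the summary emphasizes.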
Our system enables cost-efficient, interpretable decision-making with routing justified by softmax confidence thresholds. We show that routing 70% of samples at a confidence threshold of 0.8 retains 97% of original accuracy while reducing inference costs by over 50%. These results advance the Expert Orchestration Architecture by demonstrating real-world savings and interpretable routing without compromising safety or performance.
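The coverage, accuracy, and cost trade-off reported above can be measured with a threshold sweep. The sketch below is illustrative only: it assumes that "routing 70% of samples" refers to the share answered by the small model without fallback (called coverage here), and the function names, toy data, and per-sample costs are hypothetical placeholders rather than figures from this project.

```python
from __future__ import annotations

from typing import Sequence


def coverage_and_accuracy(confidences: Sequence[float],
                          correct: Sequence[bool],
                          threshold: float) -> tuple[float, float]:
    """Coverage = fraction of samples whose small-model confidence is >= threshold
    (answered without fallback); accuracy is measured on that kept subset only."""
    kept = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(kept) / len(confidences)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return coverage, accuracy


def relative_cost(coverage: float, small_cost: float, large_cost: float) -> float:
    """Cascade cost as a fraction of sending every sample to the large model.
    Assumes the small model scores every sample and the large model runs only
    on the (1 - coverage) share that falls below the threshold."""
    return (small_cost + (1.0 - coverage) * large_cost) / large_cost


if __name__ == "__main__":
    # Toy data, not results from this project.
    confs = [0.95, 0.91, 0.86, 0.62, 0.55]
    correct = [True, True, True, False, True]
    for t in (0.5, 0.8, 0.9):
        cov, acc = coverage_and_accuracy(confs, correct, t)
        print(f"threshold={t}: coverage={cov:.2f}, accuracy on kept={acc:.2f}")
    # Hypothetical per-sample costs: small model at 10% of the large model's cost.
    print(f"relative cost at 70% coverage: {relative_cost(0.7, 0.1, 1.0):.2f}")
```

The cost model charges the small model on every input because its confidence must be computed before the routing decision is known. Under this assumption the fractional saving equals coverage minus the small-to-large cost ratio, so it approaches the coverage fraction when the large model's cost dominates.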
Cite this work:
@misc{
  title={Routing LLMs using Distilled Predictors and Confidence Thresholding},
  author={Gideon Daniel Giftson T},
  date={2025-06-02},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}