Jun 2, 2025

Routing LLMs using Distilled Predictors and Confidence Thresholding

Gideon Daniel Giftson T

This project explores confidence-based routing with sparsified transformer models as an intelligent alternative to monolithic AI systems. Working in Track 2: Intelligent Router Systems, we implemented a confidence-threshold router for a pruned DistilBERT model deployed via DeepSparse on the SST-2 sentiment classification task. We investigated how routing confidence correlates with prediction accuracy. We also examined whether accuracy is retained when the small model answers only its more confident samples and the rest fall back to a larger model.
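The routing rule described above can be sketched as follows. This minimal sketch assumes the small model exposes per-class logits. The router uses the maximum softmax probability as its confidence score and keeps the small model's prediction when that score reaches the threshold. Samples below the threshold are deferred to a larger model. `route`, `cascade`, `small_model`, and `large_model` are hypothetical names for illustration. They are not the project's code or the DeepSparse API. The default threshold of 0.8 mirrors the value reported in the summary.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def route(small_logits: np.ndarray, threshold: float = 0.8):
    """Split samples by the small model's top-class softmax probability.

    Returns (preds, accept): preds are the small model's class ids, and
    accept[i] is True when that sample's confidence is >= threshold.
    """
    probs = softmax(np.asarray(small_logits, dtype=float))
    confidence = probs.max(axis=-1)
    preds = probs.argmax(axis=-1)
    return preds, confidence >= threshold


def cascade(texts, small_model, large_model, threshold: float = 0.8):
    """Answer confident samples with the small model and defer the rest.

    small_model and large_model are hypothetical callables. The first
    returns an (n, num_classes) array of logits. The second returns a
    list or array of class ids for the texts it receives.
    """
    preds, accept = route(small_model(texts), threshold)
    deferred = [t for t, ok in zip(texts, accept) if not ok]
    if deferred:
        # Overwrite low-confidence predictions with the fallback model's output.
        preds[~accept] = np.asarray(large_model(deferred))
    return preds, accept
```

Take a two-class model as an example. Logits of [2.0, 0.0] give a top-class probability of about 0.88, so the small model keeps that sample. Logits of [0.1, 0.0] give about 0.52, so that sample is deferred.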

Our system enables cost-efficient, interpretable decision-making with routing justified by softmax confidence thresholds. We show that routing 70% of samples at a confidence threshold of 0.8 retains 97% of original accuracy while reducing inference costs by over 50%. These results advance the Expert Orchestration Architecture by demonstrating real-world savings and interpretable routing without compromising safety or performance.
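The three headline quantities can be computed for each threshold roughly as follows. These are the share of samples kept by the small model (coverage), the retained accuracy, and the cost reduction. This is a sketch under two assumptions. It uses an assumed cost model, and it defines "original accuracy" as the large model's accuracy on the same samples. The summary does not specify the project's exact accounting. `routing_report` and its cost parameters are hypothetical.

```python
import numpy as np


def routing_report(confidence, small_correct, large_correct, threshold=0.8,
                   small_cost=1.0, large_cost=10.0):
    """Coverage, accuracy retention and cost savings for one threshold.

    Assumed cost model (hypothetical, not from the write-up): the small
    model runs on every sample, the large model runs only on deferred
    samples, and the baseline sends every sample to the large model.
    """
    confidence = np.asarray(confidence, dtype=float)
    small_correct = np.asarray(small_correct, dtype=bool)
    large_correct = np.asarray(large_correct, dtype=bool)

    accept = confidence >= threshold
    coverage = float(accept.mean())  # share answered by the small model

    # The cascade is correct wherever the model that actually answered is correct.
    cascade_acc = float(np.where(accept, small_correct, large_correct).mean())
    baseline_acc = float(large_correct.mean())

    # Average per-sample cost of the cascade versus the all-large baseline.
    cascade_cost = small_cost + (1.0 - coverage) * large_cost
    return {
        "coverage": coverage,
        "accuracy_retention": cascade_acc / baseline_acc,
        "cost_savings": 1.0 - cascade_cost / large_cost,
    }
```

Under the assumed 1:10 cost ratio, a coverage of 0.7 costs 1 + 0.3 × 10 = 4 units per sample against 10 for the baseline, a 60% saving. The actual figure depends on per-sample costs that the summary does not report.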

Cite this work:

@misc{
  title={Routing LLMs using Distilled Predictors and Confidence Thresholding},
  author={Gideon Daniel Giftson T},
  date={6/2/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.