Sep 15, 2025
ThoughtTrim
Andrew Briand, Josh Rauvola
Large language models often generate long chain-of-thought (CoT) traces in which only a small subset of sentences materially influences the final answer. We propose ThoughtTrim: a simple evaluation framework that ranks CoT chunks by counterfactual-importance KL [1], reconstructs prompts using only the top-ranked chunks, and measures the accuracy-retention trade-off as the filtering threshold rises. Using Qwen2.5-1.5B on a 100-question Biology subset of MMLU-Pro, we find that (i) for some questions, KL-guided trimming preserves accuracy at substantial token savings (60-90% on many items), (ii) "first failure" thresholds are heterogeneous: some problems fail immediately, while a long tail remains robust even under aggressive pruning, and (iii) a KL-shuffled control that preserves the number of kept chunks but breaks their informativeness is consistently worse than the original selection, demonstrating the value of the ranking signal. We release a lightweight pipeline that uses counterfactual-importance KL to characterize trimming thresholds, efficiency frontiers, and failure distributions. This opens future work on fine-tuning approaches that make agentic and LLM-based systems more efficient, robust, and deterministic, and therefore safer.
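The selection step described above can be sketched in a few lines. This is a minimal illustration, not the authors' released pipeline: it assumes the per-chunk counterfactual-importance KL scores have already been computed, and shows (a) KL-guided trimming that keeps the top-ranked chunks in their original order, and (b) the shuffled control that keeps the same number of chunks but ignores the scores.

```python
import random

def trim_chunks(chunks, kl_scores, keep_fraction):
    """Keep the top `keep_fraction` of CoT chunks by KL importance,
    preserving their original order in the reconstructed trace."""
    k = max(1, round(len(chunks) * keep_fraction))
    # Indices of the k highest-scoring chunks.
    top = sorted(range(len(chunks)), key=lambda i: kl_scores[i], reverse=True)[:k]
    # Restore original trace order before reconstructing the prompt.
    return [chunks[i] for i in sorted(top)]

def shuffled_control(chunks, kl_scores, keep_fraction, seed=0):
    """Control condition: keep the same *number* of chunks as trim_chunks,
    but choose them at random, breaking the link to the KL ranking."""
    k = max(1, round(len(chunks) * keep_fraction))
    rng = random.Random(seed)
    return [chunks[i] for i in sorted(rng.sample(range(len(chunks)), k))]

# Toy example with four CoT chunks and hypothetical KL scores.
chunks = ["step A", "step B", "step C", "step D"]
scores = [0.1, 0.9, 0.4, 0.7]
print(trim_chunks(chunks, scores, 0.5))  # keeps the two highest-KL chunks
```

Sweeping `keep_fraction` from 1.0 down toward 0 and re-asking the question at each level yields the accuracy-retention curve and the per-question "first failure" threshold discussed in the abstract.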
This is an extremely impressive amount of empirical and theoretical work for a weekend sprint. The experiments are extensive and reproducible, span many models, and are well detailed in your equations.
I would have liked to see a table or plot summarizing your results; it is a bit hard to parse the headline result from the text alone. The biggest weakness of the piece is its tie to CBRN risks: it is not at all clear how this would eventually address any CBRN threat model, and right now it reads as a way to make reasoning models more token-efficient.
Cite this work
@misc{
  title={(HckPrj) ThoughtTrim},
  author={Briand, Andrew and Rauvola, Josh},
  date={2025-09-15},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}