RESEARCH · arXiv CS.AI · 15d ago
How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning
This paper formalizes redundancy in large language model (LLM) reasoning and measures it at scale. It finds that between 61% and 93% of LLM thought steps are unnecessary, a cost paid in latency, GPU time, and energy consumption.