Two-dimensional early exit optimisation of LLM inference
arXiv CS.CL · April 22, 2026
This paper introduces a two-dimensional early-exit strategy for LLM classification tasks that coordinates layer-wise and sentence-wise exiting. The method achieves multiplicative computational savings and speed-ups of 1.4-2.3x over optimal layer-wise early exit on simpler tasks, and it applies across various state-of-the-art LLMs.