RESEARCH29

Two-dimensional early exit optimisation of LLM inference

arXiv cs.CL · April 22, 2026

This paper introduces a two-dimensional early-exit strategy for LLM classification tasks that coordinates layer-wise and sentence-wise exiting. The method achieves multiplicative computational savings, with speed-ups of 1.4-2.3x over optimal layer-wise early exit on simpler tasks, and is applicable across various state-of-the-art LLMs.
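
To make "multiplicative" concrete, here is a minimal toy sketch, not the paper's algorithm. It assumes a hypothetical confidence grid `conf[s][l]` (classifier confidence after reading `s + 1` sentences and running `l + 1` layers) and a hypothetical fixed `threshold`; the names `two_dim_exit` and `cost_fraction` are illustrative only. The cost model is idealised: it counts only the final exit point and ignores any compute spent on earlier, unsuccessful prefixes.

```python
from typing import List, Tuple


def two_dim_exit(conf: List[List[float]], threshold: float) -> Tuple[int, int]:
    """Return (sentences_used, layers_used) for the first grid cell whose
    confidence clears the threshold, scanning sentence by sentence and,
    within each sentence prefix, layer by layer."""
    for s, row in enumerate(conf):
        for l, c in enumerate(row):
            if c >= threshold:
                return s + 1, l + 1
    # No cell cleared the threshold: fall back to the full input and full depth.
    return len(conf), len(conf[0])


def cost_fraction(sentences_used: int, total_sentences: int,
                  layers_used: int, total_layers: int) -> float:
    """Idealised relative cost: fraction of sentences times fraction of layers."""
    return (sentences_used / total_sentences) * (layers_used / total_layers)


# Hypothetical confidences for a 4-sentence input and a 4-layer model.
conf = [
    [0.30, 0.45, 0.55, 0.60],
    [0.40, 0.65, 0.92, 0.96],
    [0.50, 0.70, 0.95, 0.99],
    [0.55, 0.75, 0.97, 0.99],
]

sentences_used, layers_used = two_dim_exit(conf, threshold=0.9)
print(sentences_used, layers_used)                      # 2 3
print(cost_fraction(sentences_used, 4, layers_used, 4))  # 0.375
```

In this toy case, exiting after 2 of 4 sentences and 3 of 4 layers gives 0.5 x 0.75 = 0.375 of the full cost, whereas a layer-only exit at the same depth would give 0.75. The two fractions multiply, which is the sense in which the savings compound.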

LLMs · Computational Efficiency · Inference Optimization
