UCCI: Calibrated Uncertainty for Cost-Optimal LLM Cascade Routing
arXiv cs.LG · May 20, 2026
UCCI is a router for LLM cascades that uses calibrated uncertainty to decide where each query goes: easy queries are answered by smaller, cheaper models, and difficult ones are escalated to larger models. Calibrating model confidence lets it cut inference cost by 31% on production workloads while maintaining accuracy.
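The routing idea can be illustrated with a minimal sketch. This is not the paper's implementation: the `Tier` type, the `route` function, the fixed `threshold` of 0.8, and the assumption that each model returns an already-calibrated confidence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    """One model in the cascade (hypothetical type, not from the paper)."""
    name: str
    cost: float  # cost of one call, in arbitrary units
    # Returns (answer, confidence); confidence is assumed to be calibrated already.
    answer: Callable[[str], Tuple[str, float]]


def route(query: str, tiers: List[Tier], threshold: float = 0.8) -> Tuple[str, str, float]:
    """Try tiers from cheapest to most expensive; stop at the first confident answer.

    Returns (answer, name of the tier that answered, total cost spent).
    """
    if not tiers:
        raise ValueError("cascade needs at least one tier")
    spent = 0.0
    for i, tier in enumerate(tiers):
        answer, confidence = tier.answer(query)
        spent += tier.cost
        is_last = i == len(tiers) - 1
        # Accept when the model is confident enough, or when no larger model remains.
        if confidence >= threshold or is_last:
            return answer, tier.name, spent
    raise AssertionError("unreachable: the last tier always returns")
```

In this sketch an escalated query pays for every tier it passes through, so any savings depend on how often the small model's confidence is both high and trustworthy. That dependence is why calibration matters for a threshold-based router.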