RESEARCH · arXiv CS.LG · 21d ago
UCCI: Calibrated Uncertainty for Cost-Optimal LLM Cascade Routing
UCCI is a router for LLM cascades that uses calibrated uncertainty to send easy queries to smaller models and difficult ones to larger models. By calibrating model confidence, it reduces inference cost by 31% on production workloads while maintaining accuracy.
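The general idea of confidence-based cascade routing can be illustrated with a minimal sketch. This is not UCCI's actual method, which the summary does not detail. The `Model` class, the `route` function, the fixed `threshold`, the per-query costs, and the toy confidence functions are all hypothetical, and the confidence scores are simply assumed to be already calibrated.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    """Hypothetical cascade stage; not part of any real library."""
    name: str
    cost: float  # assumed cost per query, in arbitrary units
    # Returns (answer, confidence), with confidence assumed calibrated in [0, 1].
    answer: Callable[[str], Tuple[str, float]]


def route(query: str, cascade: List[Model], threshold: float = 0.8) -> Tuple[str, str, float]:
    """Try models cheapest-first and stop at the first confident answer.

    Returns (model name, answer, total cost spent). The last model's answer
    is always accepted, since there is nothing left to escalate to.
    """
    spent = 0.0
    for i, model in enumerate(cascade):
        text, confidence = model.answer(query)
        spent += model.cost
        is_last = i == len(cascade) - 1
        if confidence >= threshold or is_last:
            return model.name, text, spent
    raise ValueError("cascade must contain at least one model")


# Toy stand-ins for real models (assumption: short queries are "easy").
def small_answer(query: str) -> Tuple[str, float]:
    confidence = 0.95 if len(query.split()) <= 5 else 0.40
    return "small:" + query, confidence


def large_answer(query: str) -> Tuple[str, float]:
    return "large:" + query, 0.90


CASCADE = [
    Model("small", 1.0, small_answer),
    Model("large", 10.0, large_answer),
]

if __name__ == "__main__":
    print(route("What is 2+2?", CASCADE))
    print(route("Prove that every bounded monotone sequence of reals converges", CASCADE))
```

In this sketch an escalated query pays for both models, so any saving depends on how often the small model's confidence clears the threshold. That is why the confidence scores need to be calibrated rather than raw.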