RESEARCH 27
Confidence Calibration in Large Language Models
arXiv cs.AI · May 26, 2026
This study investigates confidence calibration in Large Language Models (LLMs) across diverse tasks and finds that current LLMs are overconfident on difficult tasks and underconfident on easy ones. The researchers developed LifeEval, a new benchmark for evaluating model calibration across varying levels of difficulty.
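To make "overconfident" and "underconfident" concrete, one common way to express calibration is the gap between a model's mean stated confidence and its actual accuracy, computed separately per difficulty level. The Python sketch below is illustrative only: the function name `calibration_gap_by_difficulty`, the record layout, and the sample data are all assumptions made for this example, and nothing here is taken from LifeEval or the paper's own metric.

```python
from collections import defaultdict


def calibration_gap_by_difficulty(records):
    """Return mean confidence minus accuracy for each difficulty bucket.

    `records` is an iterable of (difficulty, confidence, correct) tuples,
    a hypothetical layout chosen for this sketch.
    A positive gap means overconfidence; a negative gap means underconfidence.
    """
    # Per bucket: [sum of confidences, number correct, number of items]
    totals = defaultdict(lambda: [0.0, 0, 0])
    for difficulty, confidence, correct in records:
        bucket = totals[difficulty]
        bucket[0] += confidence
        bucket[1] += int(correct)
        bucket[2] += 1
    return {
        difficulty: conf_sum / n - n_correct / n
        for difficulty, (conf_sum, n_correct, n) in totals.items()
    }


# Made-up records for illustration, not results from the study.
records = [
    ("hard", 0.9, False), ("hard", 0.8, True),
    ("hard", 0.9, False), ("hard", 0.8, False),
    ("easy", 0.6, True), ("easy", 0.7, True),
    ("easy", 0.6, True), ("easy", 0.7, False),
]

gaps = calibration_gap_by_difficulty(records)
for difficulty, gap in gaps.items():
    label = "overconfident" if gap > 0 else "underconfident"
    print(f"{difficulty}: gap={gap:+.2f} ({label})")
```

On this toy data the hard bucket has a positive gap (stated confidence exceeds accuracy) and the easy bucket a negative one, which is the pattern the summary describes.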