Confidence Calibration in Large Language Models

arXiv cs.AI · May 26, 2026

This study investigates confidence calibration in large language models (LLMs) across diverse tasks and finds that current LLMs are overconfident on difficult tasks and underconfident on easy ones. The researchers developed LifeEval, a new test for evaluating model calibration across varying levels of difficulty.

Confidence Calibration, Overconfidence, machine learning, large language models, AI evaluation
