← heapsort

A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering

arXiv CS.CL · May 12, 2026

This research introduces Sem-ECE, a semantic-sampling framework for evaluating the calibration of large language models in open-ended question answering. It addresses limitations of existing evaluation methods by grouping sampled answers into semantic classes, so that calibration, which is crucial for reliable LLM deployment, can be assessed on open-ended answers.
