RESEARCH · arXiv CS.CL · 29d ago
A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering
This research introduces Sem-ECE, a semantic-sampling framework for evaluating how well large language models are calibrated in open-ended question answering. It addresses limitations of existing evaluation methods by grouping sampled answers into semantic classes. Sound calibration evaluation is crucial for reliable LLM deployment.