RESEARCH
UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs
arXiv CS.CL · June 8, 2026
UnpredictaBench is a new benchmark that evaluates whether large language models can capture a true underlying distribution rather than collapsing toward a single answer. It provides 448 problems and a KS@N metric for testing sampled outcomes against various target distributions.
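The summary does not define KS@N. As a rough illustration only, the sketch below assumes the name refers to a Kolmogorov-Smirnov statistic computed over N sampled outputs and compared with a target distribution; the paper's actual definition may differ. The function names `ks_statistic` and `uniform_cdf` and the choice of a Uniform(0, 1) target are hypothetical, picked to show how a collapsed sampler scores worse than a well-spread one.

```python
import random


def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic.

    Returns the largest vertical gap between the empirical CDF of
    `samples` and the target `cdf` (0 = perfect match, 1 = worst case).
    """
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Check the gap just after and just before the empirical CDF's step at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(x):
    """CDF of the Uniform(0, 1) target (an assumed example target)."""
    return min(1.0, max(0.0, x))


N = 100  # assumed meaning of the "N" in KS@N: number of samples drawn
rng = random.Random(0)

# A sampler that actually follows the target distribution.
spread = [rng.random() for _ in range(N)]
# A "collapsed" sampler that always returns the same answer.
collapsed = [0.5] * N

ks_spread = ks_statistic(spread, uniform_cdf)
ks_collapsed = ks_statistic(collapsed, uniform_cdf)

print(f"well-spread sampler: KS = {ks_spread:.3f}")
print(f"collapsed sampler:   KS = {ks_collapsed:.3f}")
```

Under this assumed reading, a lower value means the N samples track the target distribution more closely, and a model that always gives the same answer is penalized with a large gap.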