UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs
UnpredictaBench is a new benchmark that evaluates whether large language models can capture a true underlying distribution when sampling, rather than collapsing toward a single answer. It provides 448 problems and a KS@N metric that tests sampled outcomes against a range of target distributions.
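The summary does not define KS@N. As a rough illustration only, the sketch below assumes the name refers to a one-sample Kolmogorov-Smirnov statistic computed over N sampled outcomes and compared with a target distribution. The function names, the Uniform(0, 1) target, and the choice of N = 200 are all hypothetical and are not taken from the benchmark itself.

```python
import random
from typing import Callable, Sequence


def ks_statistic(samples: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDF of `samples` and the target `cdf`.

    Assumption: this is only a guess at what the "KS" in KS@N stands for.
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n up to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(x: float) -> float:
    """CDF of a hypothetical Uniform(0, 1) target distribution."""
    return min(1.0, max(0.0, x))


rng = random.Random(0)
n = 200  # hypothetical N: number of outcomes sampled per problem

# Stand-in for a model whose outputs follow the target distribution.
spread = [rng.random() for _ in range(n)]
# Stand-in for a model that collapses to a single answer.
collapsed = [0.5] * n

ks_spread = ks_statistic(spread, uniform_cdf)
ks_collapsed = ks_statistic(collapsed, uniform_cdf)
print(f"well-spread samples: KS = {ks_spread:.3f}")
print(f"collapsed samples:   KS = {ks_collapsed:.3f}")
```

Under this reading, a lower statistic means the sampled outcomes track the target distribution more closely, while a model that always returns the same answer is penalized with a large gap.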