UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs
UnpredictaBench evaluates how well LLMs capture underlying probability distributions rather than just generating varied outputs.
LLMs are increasingly used to simulate human behavior or economic systems, yet their outputs often collapse toward a single plausible answer. UnpredictaBench tests whether models can produce samples calibrated to a target distribution, a requirement for accurate simulation that simple output-diversity metrics fail to address.
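The gap between calibration and diversity can be illustrated with a minimal sketch. It assumes a categorical target distribution and uses total variation distance as the calibration measure. The source does not state which metric UnpredictaBench uses, so the metric, the function names (`total_variation_distance`, `distinct_fraction`), and the coin-flip data are all hypothetical choices made for illustration.

```python
from collections import Counter


def empirical_distribution(samples):
    """Turn a list of sampled outcomes into relative frequencies."""
    counts = Counter(samples)
    total = len(samples)
    return {outcome: n / total for outcome, n in counts.items()}


def total_variation_distance(samples, target):
    """Half the L1 distance between empirical and target probabilities.

    Illustrative calibration measure (an assumption, not the benchmark's
    documented metric): 0.0 is a perfect match, 1.0 is disjoint support.
    """
    empirical = empirical_distribution(samples)
    outcomes = set(empirical) | set(target)
    return 0.5 * sum(
        abs(empirical.get(o, 0.0) - target.get(o, 0.0)) for o in outcomes
    )


def distinct_fraction(samples, target):
    """Simple diversity measure: share of target outcomes seen at least once."""
    return len(set(samples) & set(target)) / len(target)


# Hypothetical target: a fair coin.
target = {"heads": 0.5, "tails": 0.5}

# Two hypothetical sets of 100 model outputs.
calibrated = ["heads"] * 50 + ["tails"] * 50
skewed = ["heads"] * 95 + ["tails"] * 5

for name, samples in [("calibrated", calibrated), ("skewed", skewed)]:
    tvd = total_variation_distance(samples, target)
    div = distinct_fraction(samples, target)
    print(f"{name}: tvd={tvd:.2f} distinct_fraction={div:.2f}")
```

Both sample sets cover every outcome, so the diversity measure scores them identically, while the distance to the target separates them. Real model outputs would also need to be parsed into outcome labels before such a comparison, which this sketch leaves out.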