← heapsort

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

arXiv cs.LG · June 1, 2026

This paper introduces NumLeak, a framework for measuring how much foundation models have memorized from public numeric benchmarks. It finds that top-tier LLMs recall financial and economic data with high fidelity, suggesting that evaluations on such data may be measuring memorization rather than genuine out-of-sample skill.
