Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction

arXiv cs.LG · May 15, 2026

Collider-Bench is a new benchmark that evaluates how well LLM agents can reproduce experimental analyses from the Large Hadron Collider (LHC) using public data and software. Agents must apply physical reasoning and domain knowledge to fill in missing implementation details and produce predicted collision event yields.

particle physics · benchmarking · scientific reproduction · AI agents · LLM
