RESEARCH · arXiv CS.LG · 26d ago
Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction
Collider-Bench is a new benchmark that evaluates whether LLM agents can reproduce experimental analyses from the Large Hadron Collider (LHC) using public data and software. Agents must apply physical reasoning and domain knowledge to fill in missing implementation details and produce predicted collision event yields.