
Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning

arXiv CS.CL · May 15, 2026

This paper audits multimodal-physics evaluation pipelines, uncovering construction practices that distort how vision-language reasoning is measured. It addresses train-eval contamination, translation drift, and MCQ saturation, releasing new artifacts to tackle these gaps.
