RESEARCH · arXiv CS.CL · 26d ago
Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning
This paper audits multimodal-physics evaluation pipelines and uncovers construction practices that distort how vision-language reasoning is measured. It addresses train-eval contamination, translation drift, and multiple-choice question (MCQ) saturation, and releases new artifacts to close these gaps.