Improving Multimodal Reasoning via Worst Dimension Optimization
Multimodal reasoning must satisfy several constraints at once, such as visual grounding and logical consistency. Current Process Reward Models (PRMs) often weight these dimensions equally, so a failure in one dimension can be masked by strong scores in the others, compromising the overall reasoning process.
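The masking effect can be illustrated with a small numeric sketch. The code below is not the paper's method: the dimension names are taken from the text, while the scores, the function names, and the use of a plain minimum as the "worst dimension" signal are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical per-dimension scores for one reasoning step, each in [0, 1].
# The values are invented for illustration only.
step_scores = {
    "visual_grounding": 0.25,     # the step misreads the image
    "logical_consistency": 1.0,   # the step is logically sound
}


def equal_weight_score(scores: dict[str, float]) -> float:
    """Aggregate by weighting every dimension equally (arithmetic mean)."""
    return mean(scores.values())


def worst_dimension(scores: dict[str, float]) -> tuple[str, float]:
    """Return the lowest-scoring dimension and its score."""
    name = min(scores, key=scores.get)
    return name, scores[name]


print(equal_weight_score(step_scores))  # 0.625
print(worst_dimension(step_scores))     # ('visual_grounding', 0.25)
```

Under equal weighting the step receives a moderate score of 0.625 even though its visual grounding is poor. Reporting the minimum instead surfaces the failing dimension directly.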