GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents
arXiv cs.LG · May 21, 2026
This paper introduces GROW, a reinforcement learning (RL) framework for open-world vision-language model (VLM) agents that addresses limitations of existing supervised fine-tuning (SFT) methods. It adapts Group Relative Policy Optimization (GRPO) by decomposing trajectories into individual state-action samples rather than treating each full trajectory as a single unit.
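The summary does not give GROW's exact objective, so the following is only a rough illustration of the general idea. It assumes standard GRPO-style group normalization (each trajectory's reward is compared with the other trajectories sampled for the same task) and assumes, hypothetically, that every state-action step simply inherits its trajectory's advantage. The names `StepSample`, `group_relative_advantages`, and `decompose_group` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple

# Hypothetical per-step training sample; not the paper's data structure.
@dataclass
class StepSample:
    state: str
    action: str
    advantage: float


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each reward by its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero-variance group
    return [(r - mu) / (sigma + eps) for r in rewards]


def decompose_group(
    trajectories: List[List[Tuple[str, str]]], rewards: List[float]
) -> List[StepSample]:
    """Assumption: split each trajectory into (state, action) samples, and let
    each sample carry its trajectory's group-relative advantage."""
    advantages = group_relative_advantages(rewards)
    samples: List[StepSample] = []
    for trajectory, adv in zip(trajectories, advantages):
        for state, action in trajectory:
            samples.append(StepSample(state, action, adv))
    return samples


if __name__ == "__main__":
    group = [[("s0", "move"), ("s1", "pick")], [("s0", "turn")]]
    for sample in decompose_group(group, rewards=[1.0, 0.0]):
        print(sample)
```

Under these assumptions, the successful trajectory's two steps each receive an advantage close to +1 and the failed trajectory's single step receives one close to -1. The policy update can then batch individual steps instead of whole trajectories. GROW may assign step-level credit differently.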