Research · arXiv CS.LG · 19d ago
GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents
This paper introduces GROW, a reinforcement learning (RL) framework for open-world vision-language model (VLM) agents that addresses limitations of existing Supervised Fine-Tuning (SFT) methods. It adapts Group Relative Policy Optimization (GRPO) by decomposing trajectories into individual state-action samples rather than treating each full trajectory as a single training unit.
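The summary does not give GROW's exact objective, so the following is only a minimal sketch under stated assumptions. It shows the standard GRPO group-relative advantage, (r - mean) / std computed within a group, applied to state-action samples instead of whole trajectories. The grouping key (`state_id`), the per-step rewards, and the helpers `decompose` and `group_relative_advantages` are hypothetical illustrations, not the paper's method.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class StateActionSample:
    state_id: str   # hypothetical key identifying the observed state
    action: str     # action the agent emitted at that state
    reward: float   # hypothetical per-step reward (assumption)


def decompose(trajectory):
    """Split a trajectory [(state_id, action, reward), ...] into independent samples."""
    return [StateActionSample(s, a, r) for s, a, r in trajectory]


def group_relative_advantages(samples, eps=1e-6):
    """GRPO-style normalization, assuming samples are grouped by shared state."""
    groups = defaultdict(list)
    for idx, sample in enumerate(samples):
        groups[sample.state_id].append(idx)

    advantages = [0.0] * len(samples)
    for idxs in groups.values():
        rewards = [samples[i].reward for i in idxs]
        mu = mean(rewards)
        sigma = pstdev(rewards)  # population std; 0.0 for a single-sample group
        for i in idxs:
            advantages[i] = (samples[i].reward - mu) / (sigma + eps)
    return advantages


# Usage: two hypothetical trajectories that both visit state "s0".
traj_a = [("s0", "open_door", 1.0), ("s1", "pick_key", 0.0)]
traj_b = [("s0", "move_left", 0.0)]
samples = decompose(traj_a) + decompose(traj_b)
adv = group_relative_advantages(samples)
print(adv)
```

Because the two actions taken at "s0" are compared against each other, one receives a positive advantage and the other a negative one, while the lone sample at "s1" gets zero. Under a trajectory-level grouping, every step of `traj_a` would instead share a single advantage.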