GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents

arXiv cs.LG · May 21, 2026

This paper introduces GROW, a reinforcement learning (RL) framework for open-world vision-language model (VLM) agents that addresses limitations of existing Supervised Fine-Tuning (SFT) methods. It adapts Group Relative Policy Optimization (GRPO) by decomposing trajectories into state-action samples rather than treating each full trajectory as a single entity.
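
To make the idea of state-action decomposition concrete, here is a minimal sketch of one way it could look. It is an illustration under stated assumptions, not the paper's algorithm: the function names `group_relative_advantages` and `decompose` are hypothetical, the advantage uses the common GRPO-style normalization (return minus group mean, divided by group standard deviation), and every step is assumed to simply inherit its trajectory's advantage. GROW itself may assign credit differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(returns, eps=1e-8):
    """GRPO-style normalization within one group of rollouts.

    Assumption: population standard deviation; implementations differ on this.
    """
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]


def decompose(trajectories, returns):
    """Flatten a group of trajectories into (state, action, advantage) samples.

    Assumption: each step inherits the advantage of the trajectory it came from.
    """
    advantages = group_relative_advantages(returns)
    samples = []
    for trajectory, advantage in zip(trajectories, advantages):
        for state, action in trajectory:
            samples.append((state, action, advantage))
    return samples


# Two toy rollouts of the same task, with hypothetical scalar returns.
group = [
    [("s0", "move_forward"), ("s1", "attack")],
    [("s0", "turn_left")],
]
samples = decompose(group, returns=[1.0, 0.0])
for state, action, advantage in samples:
    print(state, action, round(advantage, 3))
```

Each resulting sample can then be scored by the policy on its own, so the training batch is a set of state-action pairs rather than a set of whole trajectories.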

VLM Agents · Policy optimization · Open-world AI · Reinforcement learning · Machine learning