
GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents

arXiv CS.LG · May 21, 2026

This paper introduces GROW, a reinforcement learning (RL) framework for open-world vision-language model (VLM) agents that addresses limitations of existing Supervised Fine-Tuning (SFT) methods. It adapts Group Relative Policy Optimization (GRPO) by decomposing trajectories into individual state-action samples rather than treating each trajectory as a single unit.
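To make the state-action decomposition concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: it uses the standard GRPO group-relative advantage (each reward minus the group mean, divided by the group standard deviation) and assumes each trajectory's advantage is simply copied to every state-action pair in it. The names `Step`, `Sample`, `group_relative_advantages`, and `decompose` are hypothetical, and the choice of population standard deviation is also an assumption.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Step:
    """One (state, action) pair inside a trajectory (hypothetical type)."""
    state: str
    action: str


@dataclass
class Sample:
    """A standalone training sample: one step plus an advantage value."""
    state: str
    action: str
    advantage: float


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO normalization: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def decompose(trajectories, rewards):
    """Flatten a group of trajectories into per-step samples.

    Assumption: every step inherits the advantage of its trajectory.
    The paper may assign per-step credit differently.
    """
    advantages = group_relative_advantages(rewards)
    samples = []
    for trajectory, advantage in zip(trajectories, advantages):
        for step in trajectory:
            samples.append(Sample(step.state, step.action, advantage))
    return samples


# A group of two rollouts of different lengths for the same task.
group = [
    [Step("s0", "move"), Step("s1", "mine")],
    [Step("s0", "turn"), Step("s1", "move"), Step("s2", "wait")],
]
samples = decompose(group, rewards=[1.0, 0.0])
```

Because the samples are flat, trajectories of different lengths contribute a different number of training samples, and a policy update can batch individual state-action pairs instead of whole rollouts.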
