Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
arXiv cs.LG · May 6, 2026
This survey provides an optimizer-agnostic view of rollout strategies for RL-based post-training of reasoning LLMs. It formalizes rollout pipelines with a unified notation and introduces the Generate-Filter-Control-Replay (GFCR) lifecycle taxonomy, decomposing pipelines into four modular stages.