← heapsort

Design Conditions for Intra-Group Learning of Sequence-Level Rewards: Token Gradient Cancellation

arXiv cs.LG · April 16, 2026

This paper presents a necessary condition for the design of intra-group learning algorithms in reinforcement learning: objectives must maintain gradient exchangeability across token updates to prevent reward-irrelevant drift. It proposes minimal transformations that restore this cancellation structure, which stabilizes training and improves sample efficiency.
