← heapsort-ai

gradient dynamics

1 items

RESEARCHarXiv CS.LG·4/16/2026

Design Conditions for Intra-Group Learning of Sequence-Level Rewards: Token Gradient Cancellation

This paper presents a necessary condition for intra-group learning algorithm design in Reinforcement Learning, requiring objectives to maintain gradient exchangeability across token updates to prevent reward-irrelevant drift. It proposes minimal transformations to restore this cancellation structure, which stabilizes training and improves sample efficiency.

29