
StaRPO: Stability-Augmented Reinforcement Policy Optimization

arXiv cs.AI · April 13, 2026

StaRPO is a reinforcement learning framework designed to improve the logical consistency and structural coherence of large language models on complex reasoning tasks. It explicitly incorporates stability metrics, such as the Autocorrelation Function (ACF) and Path Efficiency (PE), to evaluate the local step-to-step coherence and the global goal-directedness of the reasoning process, respectively.
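
To make the two metrics named above concrete, here is a minimal sketch using their generic textbook definitions, not the paper's exact formulation. It assumes, hypothetically, that each reasoning step has already been mapped to a point (for example an embedding) and that some scalar per-step signal is available. The function names `path_efficiency` and `autocorrelation` are illustrative and are not taken from the paper.

```python
import math


def path_efficiency(points):
    """Net displacement divided by total path length.

    Generic trajectory definition (an assumption, not the paper's formula):
    1.0 means the path goes straight from start to end; values near 0 mean
    the path wanders a lot relative to the distance it actually covers.
    """
    if len(points) < 2:
        return 1.0
    total = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if total == 0.0:
        return 1.0  # no movement at all; treat as trivially direct
    return math.dist(points[0], points[-1]) / total


def autocorrelation(series, lag=1):
    """Sample autocorrelation of a scalar series at the given lag.

    Standard estimator: lagged covariance divided by the lag-0 variance.
    Returns 0.0 for a constant series, where the ratio is undefined.
    """
    n = len(series)
    if lag < 0 or lag >= n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0.0:
        return 0.0
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var


# Hypothetical 2-D "reasoning trajectories": one direct, one with a detour.
direct = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
detour = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Hypothetical per-step scalar signals: one smooth, one oscillating.
smooth = [1.0, 2.0, 3.0, 4.0, 5.0]
oscillating = [1.0, -1.0, 1.0, -1.0]
```

Under these definitions the direct trajectory has a path efficiency of 1.0 and the detour 1/3, while the smooth signal has a positive lag-1 autocorrelation and the oscillating one a negative value. How StaRPO turns such scores into a training signal is not described in this summary, so the sketch stops at the metrics themselves.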

Policy optimization · LLMs · reinforcement learning · Reasoning · large language models
