StaRPO: Stability-Augmented Reinforcement Policy Optimization
arXiv CS.AI · April 13, 2026
StaRPO is a reinforcement learning framework designed to improve the logical consistency and structural coherence of large language models on complex reasoning tasks. It explicitly incorporates stability metrics, such as the Autocorrelation Function (ACF) and Path Efficiency, to evaluate the local step-to-step coherence and the global goal-directedness of the reasoning process.
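To give a feel for the two metrics named above, here is a minimal Python sketch using their textbook definitions: the sample autocorrelation of a scalar series at a given lag, and path efficiency as net displacement divided by total path length. How StaRPO actually represents reasoning steps and combines these values into a reward is not stated in this summary, so the inputs (a per-step score series and a list of per-step points) and the function names `autocorrelation` and `path_efficiency` are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import Sequence


def autocorrelation(series: Sequence[float], lag: int = 1) -> float:
    """Sample autocorrelation of a scalar series at the given lag.

    Assumption: each reasoning step has been reduced to one number
    (a hypothetical per-step score). Values near 1 mean neighbouring
    steps move together; negative values mean they alternate.
    """
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    if variance_sum == 0.0:
        return 0.0  # constant series: treated as 0 by convention in this sketch
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


def path_efficiency(points: Sequence[Sequence[float]]) -> float:
    """Straight-line distance from first to last point over total path length.

    Assumption: each reasoning step is a point in some vector space
    (for example a hypothetical step embedding). The result lies in
    [0, 1]; 1 means the trajectory never detours.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    total_length = sum(
        math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)
    )
    if total_length == 0.0:
        return 0.0  # no movement at all: treated as 0 in this sketch
    return math.dist(points[0], points[-1]) / total_length
```

Under these definitions, a trajectory that moves in a straight line toward its endpoint has a path efficiency of 1, while one that loops back to its starting point scores 0. The autocorrelation plays the complementary local role, measuring whether consecutive steps vary together.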