RESEARCH27
ICRL: Learning to Internalize Self-Critique with Reinforcement Learning
arXiv CS.AIΒ·May 18, 2026
ICRL proposes a novel framework to train large language model agents to internalize self-critique, converting feedback into unassisted problem-solving. It jointly trains a solver and a critic from a shared backbone, rewarding the critic for actionable feedback to foster iterative self-improvement.
Read original β