ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

arXiv CS.AI · May 18, 2026

ICRL is a reinforcement learning framework that trains large language model agents to internalize self-critique, so that feedback is converted into unassisted problem-solving. It jointly trains a solver and a critic from a shared backbone and rewards the critic for actionable feedback, which fosters iterative self-improvement.
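
The summary does not say how "actionable feedback" is scored, so the following is only a minimal sketch under assumptions, not the paper's method. It assumes a critique counts as actionable when the solver's retry succeeds after reading it, and that the solver is rewarded for its first, unassisted attempt. The names `Episode`, `critic_reward`, and `solver_reward` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One solve -> critique -> retry round (hypothetical structure)."""
    solved_before: bool  # solver's first attempt succeeded, with no critique
    solved_after: bool   # solver's retry succeeded after reading the critique


def critic_reward(ep: Episode) -> float:
    # Assumed rule: +1 if the critique turned a failure into a success,
    # -1 if it turned a success into a failure, 0 if nothing changed.
    return float(ep.solved_after) - float(ep.solved_before)


def solver_reward(ep: Episode) -> float:
    # Assumed rule: reward only the unassisted attempt, so that gains
    # must eventually show up without the critic's help.
    return float(ep.solved_before)


if __name__ == "__main__":
    fixed_by_critique = Episode(solved_before=False, solved_after=True)
    print(critic_reward(fixed_by_critique), solver_reward(fixed_by_critique))
```

Under these assumptions the critic earns nothing for feedback that leaves the outcome unchanged, which is one simple way to read "rewarding the critic for actionable feedback".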
