
Critic Interaction

1 item

RESEARCH · arXiv CS.LG · 21d ago

ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning

ReCrit is a new reinforcement learning framework for improving how large language models handle critic interaction in scientific reasoning. It targets a specific failure: LLMs abandoning correct solutions after a user criticizes them. To address this, it focuses on how correctness changes from one turn to the next and categorizes the resulting behaviors, such as correction, sycophancy, and robustness.
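To make the idea of an inter-turn correctness transition concrete, here is a minimal sketch of how such transitions could be labeled. It is an illustration only, not ReCrit's actual method. It assumes each answer can be judged simply correct or incorrect before and after the criticism, and that the three named behaviors map onto transitions as shown. The names `Transition` and `classify_transition` and the `PERSISTENT_ERROR` label for the wrong-to-wrong case are hypothetical and do not come from the summary.

```python
from enum import Enum


class Transition(Enum):
    # Assumed mapping of the behaviors named in the summary onto transitions.
    CORRECTION = "correction"              # incorrect before, correct after
    SYCOPHANCY = "sycophancy"              # correct before, incorrect after
    ROBUSTNESS = "robustness"              # correct before, still correct after
    PERSISTENT_ERROR = "persistent_error"  # hypothetical label: incorrect both times


def classify_transition(correct_before: bool, correct_after: bool) -> Transition:
    """Label one critic turn by how answer correctness changed across it."""
    if correct_before and correct_after:
        return Transition.ROBUSTNESS
    if correct_before and not correct_after:
        return Transition.SYCOPHANCY
    if not correct_before and correct_after:
        return Transition.CORRECTION
    return Transition.PERSISTENT_ERROR


# A model that drops a correct solution after being criticized:
label = classify_transition(correct_before=True, correct_after=False)
print(label.value)
```

Under these assumptions, a transition-aware training signal would treat the four labels differently rather than scoring only the final answer. How ReCrit actually weights them is not stated in the summary.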
