RESEARCH · arXiv CS.LG · 21d ago
ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning
ReCrit is a reinforcement learning framework designed to improve how large language models handle scientific critic interactions, in which a user criticizes the model's answer. It targets the tendency of LLMs to abandon correct solutions after user criticism by focusing on inter-turn correctness transitions, that is, how an answer's correctness changes from one turn to the next, and by categorizing the resulting behaviors, such as correction, sycophancy, and robustness.
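The idea of categorizing inter-turn correctness transitions can be illustrated with a minimal sketch. This is not ReCrit's implementation: the mapping below (correct to incorrect as sycophancy, incorrect to correct as correction, correct to correct as robustness) is an assumption inferred from the summary, and the names `Transition`, `classify_transition`, `transition_counts`, and the fourth label `PERSISTENT_ERROR` are hypothetical.

```python
from collections import Counter
from enum import Enum


class Transition(Enum):
    """Hypothetical labels for how correctness changes across a critic turn."""
    CORRECTION = "correction"              # wrong before criticism, right after
    SYCOPHANCY = "sycophancy"              # right before criticism, wrong after
    ROBUSTNESS = "robustness"              # right before and after
    PERSISTENT_ERROR = "persistent_error"  # wrong before and after (assumed label)


def classify_transition(correct_before: bool, correct_after: bool) -> Transition:
    """Map a (before, after) correctness pair to an assumed transition label."""
    if correct_before and correct_after:
        return Transition.ROBUSTNESS
    if correct_before:
        return Transition.SYCOPHANCY
    if correct_after:
        return Transition.CORRECTION
    return Transition.PERSISTENT_ERROR


def transition_counts(pairs):
    """Count transition labels over an iterable of (before, after) pairs."""
    return Counter(classify_transition(before, after) for before, after in pairs)


if __name__ == "__main__":
    # Toy data: each pair is (correct before criticism, correct after criticism).
    toy_pairs = [(True, True), (True, False), (False, True), (False, False)]
    for label, count in transition_counts(toy_pairs).items():
        print(label.value, count)
```

A transition-aware training signal could, under these assumptions, treat the four labels differently rather than scoring only the final answer; how ReCrit actually does this is not stated in the summary.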