CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards
This paper proposes CSRP, a three-stage framework for Chinese Grammatical Error Correction (CGEC) with Large Language Models (LLMs). CSRP targets two challenges: the limitations of general-purpose models on this task and the difficulty of optimizing for evaluation metrics. Its three stages are continual pre-training, Chain-of-Thought supervised fine-tuning (SFT), and policy optimization with efficiency-aware rewards that penalize unnecessary edits. The framework achieves state-of-the-art performance on the NACGEC benchmark.
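To make the idea of an efficiency-aware reward concrete, the following is a minimal sketch of one way such a reward could be shaped. It is not the reward used in CSRP, whose exact form is not given here. The function names (`count_edits`, `efficiency_aware_reward`), the `penalty_weight` parameter, and the use of character-level similarity as the accuracy term are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def count_edits(source: str, target: str) -> int:
    """Count characters touched by non-matching spans between two strings.

    Hypothetical helper: a rough character-level proxy for edit count,
    not an official CGEC edit extractor.
    """
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    edits = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replace/insert/delete span costs the longer of its two sides.
            edits += max(i2 - i1, j2 - j1)
    return edits


def efficiency_aware_reward(
    source: str,
    hypothesis: str,
    reference: str,
    penalty_weight: float = 0.5,  # assumed value, chosen only for the example
) -> float:
    """Similarity to the reference minus a penalty for surplus edits.

    Assumption: edits beyond the number the reference needed are treated
    as unnecessary and penalized in proportion to the source length.
    """
    accuracy = SequenceMatcher(None, hypothesis, reference, autojunk=False).ratio()
    surplus = max(0, count_edits(source, hypothesis) - count_edits(source, reference))
    penalty = penalty_weight * surplus / max(len(source), 1)
    return accuracy - penalty


source = "我昨天去了学校了。"
reference = "我昨天去了学校。"
minimal = "我昨天去了学校。"        # matches the reference with a single deletion
over_edited = "昨天我已经去过学校。"  # fluent, but rewrites more than needed

print(efficiency_aware_reward(source, minimal, reference))
print(efficiency_aware_reward(source, over_edited, reference))
```

In this sketch the minimal correction receives the full reward, while the over-edited output loses reward twice: once for diverging from the reference and once for the surplus edits. In a policy-optimization setup, a scalar like this would be computed for each sampled output and used as the training signal.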