RESEARCH · arXiv CS.CL · 26d ago
Correct Answers from Sound Reasoning: Verifiable Process Supervision for Language Models
This paper proposes Verifiable Process Supervision (VPS), a post-training framework that jointly optimizes a language model's prediction accuracy and reasoning quality. VPS uses supervised fine-tuning to induce a structured reasoning format and evaluates the intermediate claims against ground-truth signals, using adaptive reward weighting.