Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents
arXiv cs.LG · April 21, 2026
This research introduces a rubric-based Generative Reward Model (GRM) to enhance Reinforced Fine-Tuning (RFT) of LLM agents on Software Engineering (SWE) tasks. By providing richer learning signals than binary terminal rewards alone, the approach shapes intermediate agent behaviors and significantly improves the quality of the resolution process.