RESEARCH31

Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents

arXiv CS.LG·April 21, 2026

This research introduces a rubric-based Generative Reward Model (GRM) to enhance Reinforced Fine-Tuning (RFT) for LLM Agents in Software Engineering (SWE) tasks. By providing richer learning signals beyond binary terminal rewards, this approach shapes intermediate behaviors and significantly improves the quality of the resolution process.

reinforcement learning Fine-tuning Software Engineering AI agents LLM

Read original ↗