Improving Heart-Focused Medical Question Answering in LLMs via Variance-Aware Rubric Rewards with GRPO
arXiv cs.CL · June 5, 2026
This research investigates optimizing Large Language Models (LLMs) for heart-focused medical question answering using Group Relative Policy Optimization (GRPO) for post-training. A Variance-Aware Reward Framework is proposed to enhance rubric-based supervision with continuous analytical reward functions.