Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria
arXiv cs.AI · May 12, 2026
This paper introduces Auto-Rubric as Reward (ARR), a framework for aligning multimodal generative models with human preferences. ARR externalizes a vision-language model's (VLM's) implicit preference knowledge into explicit, prompt-specific rubrics that decompose human judgment into independently verifiable quality dimensions, with the aim of overcoming limitations of traditional RLHF approaches.
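To make the idea of a rubric with independently verifiable dimensions concrete, here is a minimal sketch. It is not the paper's implementation: the `Criterion` type, the `rubric_reward` function, the weights, and the weighted-pass-fraction aggregation are all assumptions made for illustration, and plain string checks on a caption stand in for whatever VLM-based verification ARR actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One quality dimension of a prompt-specific rubric (hypothetical type)."""
    name: str
    check: Callable[[str], bool]  # stand-in verifier; a real system might query a VLM
    weight: float = 1.0


def rubric_reward(output: str, rubric: List[Criterion]) -> float:
    """Score an output as the weighted fraction of criteria it satisfies.

    Each criterion is evaluated on its own, so any single verdict can be
    inspected in isolation. The aggregation rule is an assumption.
    """
    total = sum(c.weight for c in rubric)
    if total == 0:
        return 0.0
    passed = sum(c.weight for c in rubric if c.check(output))
    return passed / total


# Hypothetical rubric for the prompt "a red car in the rain, short caption".
rubric = [
    Criterion("mentions a red car", lambda s: "red car" in s.lower()),
    Criterion("mentions rain", lambda s: "rain" in s.lower()),
    Criterion("under 20 words", lambda s: len(s.split()) < 20, weight=2.0),
]

reward = rubric_reward("A red car parked on a sunny street.", rubric)
print(reward)  # 0.75: the rain criterion (weight 1 of 4) fails
```

Because every criterion returns its own verdict, a low reward can be traced back to the specific dimension that failed, which a single scalar preference score does not allow.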