
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

arXiv CS.AI · May 12, 2026

This paper introduces Auto-Rubric as Reward (ARR), a framework for aligning multimodal generative models with human preferences. ARR externalizes the implicit preference knowledge of a vision-language model (VLM) into explicit, prompt-specific rubrics. Each rubric decomposes human judgment into independently verifiable quality dimensions, which is intended to overcome limitations of traditional reinforcement learning from human feedback (RLHF) approaches.
