RESEARCH

Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

arXiv cs.AI · May 12, 2026

This paper introduces Auto-Rubric as Reward (ARR), a framework for aligning multimodal generative models with human preferences. ARR externalizes the implicit preference knowledge of a vision-language model (VLM) into explicit, prompt-specific rubrics. Each rubric decomposes human judgment into quality dimensions that can be verified independently, which is meant to overcome limitations of traditional RLHF approaches.
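To make the idea of a rubric used as a reward concrete, here is a minimal sketch in Python. It is not the paper's implementation: the `Criterion` structure, the weights, the `rubric_reward` aggregation (a weighted pass fraction), and the keyword checks standing in for VLM judgments are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One independently checkable quality dimension (hypothetical structure)."""
    name: str
    weight: float
    check: Callable[[str], bool]  # stand-in for a VLM judgment on one output


def rubric_reward(output: str, rubric: List[Criterion]) -> float:
    """Weighted fraction of rubric criteria the output satisfies, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    passed = sum(c.weight for c in rubric if c.check(output))
    return passed / total


# Toy prompt-specific rubric for the prompt "a red cube on a wooden table".
# Keyword checks on a caption stand in for VLM judgments on a generated image.
rubric = [
    Criterion("object present", 2.0, lambda s: "cube" in s),
    Criterion("color correct", 1.0, lambda s: "red" in s),
    Criterion("setting correct", 1.0, lambda s: "table" in s),
]

reward = rubric_reward("a red cube floating in space", rubric)
print(reward)
```

Because each criterion is scored separately, a low reward can be traced to the specific dimension that failed (here, the setting), which a single scalar preference score would not reveal.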

multimodal models · AI alignment · reward learning · Machine learning research · Generative AI
