RESEARCH · arXiv CS.AI · 28d ago
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria
This paper introduces Auto-Rubric as Reward (ARR), a framework for aligning multimodal generative models with human preferences. ARR externalizes a vision-language model's (VLM's) implicit preference knowledge into explicit, prompt-specific rubrics that decompose human judgment into independently verifiable quality dimensions, with the aim of overcoming limitations of traditional RLHF approaches.
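As a rough illustration of the idea in the summary above, a prompt-specific rubric can be treated as a list of independently checkable criteria whose pass/fail results are combined into one scalar reward. The sketch below is an assumption for illustration only, not the paper's method: `Criterion`, `rubric_reward`, the weights, and the weighted-fraction aggregation are hypothetical, and simple string checks on captions stand in for VLM judgments.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Criterion:
    """One independently checkable quality dimension (hypothetical structure)."""
    name: str
    check: Callable[[str], bool]  # stand-in for a VLM judging a single criterion
    weight: float = 1.0


def rubric_reward(output: str, rubric: Sequence[Criterion]) -> float:
    """Return the weighted fraction of criteria the output satisfies, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric needs a positive total weight")
    passed = sum(c.weight for c in rubric if c.check(output))
    return passed / total


# Toy rubric for the prompt "a red cube on a wooden table". Outputs are
# represented by captions so the example runs without any model.
rubric = [
    Criterion("contains a cube", lambda o: "cube" in o),
    Criterion("cube is red", lambda o: "red cube" in o),
    Criterion("table is wooden", lambda o: "wooden table" in o, weight=2.0),
]

good = rubric_reward("a red cube on a wooden table", rubric)
partial = rubric_reward("a blue cube on a glass table", rubric)
```

Because each criterion is scored on its own, a low reward can be traced to the specific dimension that failed, which a single opaque preference score does not allow.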