multimodal models

2 items

RESEARCH · arXiv CS.AI · 5/1/2026

Step-level Optimization for Efficient Computer-use Agents

This research argues that current computer-use agents are inefficient because they invoke a large multimodal model for every GUI interaction. Tasks are heterogeneous: routine steps need less compute, while errors concentrate at high-risk moments such as stalls or semantic drift, so optimization should target those steps.

multimodal models, model efficiency, GUI automation, AI agents
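
The routing idea in the summary above can be illustrated with a minimal sketch. This is not the paper's method: the `StepSignal` fields, the thresholds, and the "small"/"large" labels are all hypothetical names chosen for illustration. It only shows the general shape of sending routine steps to a cheaper model and escalating when a step looks stalled or off-task.

```python
from dataclasses import dataclass


@dataclass
class StepSignal:
    """Hypothetical per-step signals; none of these names come from the paper."""
    screen_changed: bool    # did the last action change the GUI state?
    repeated_actions: int   # how many times the same action was just repeated
    goal_similarity: float  # assumed 0..1 score of how on-task the agent still is


def pick_model(sig: StepSignal, stall_limit: int = 2, drift_floor: float = 0.5) -> str:
    """Return "large" for high-risk steps and "small" for routine ones."""
    # Stall: the screen is not changing and the agent keeps repeating itself.
    stalled = (not sig.screen_changed) and sig.repeated_actions >= stall_limit
    # Semantic drift: the current state no longer looks related to the goal.
    drifting = sig.goal_similarity < drift_floor
    return "large" if (stalled or drifting) else "small"


if __name__ == "__main__":
    print(pick_model(StepSignal(True, 0, 0.9)))   # routine step
    print(pick_model(StepSignal(False, 3, 0.9)))  # stalled step
    print(pick_model(StepSignal(True, 0, 0.2)))   # drifting step
```

The thresholds are placeholders; in practice they would need to be tuned against whatever stall and drift signals an agent actually exposes.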

RESEARCH · arXiv CS.AI · 28d ago

Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

This research paper introduces Auto-Rubric as Reward (ARR), a framework for aligning multimodal generative models with human preferences. ARR externalizes a VLM's implicit preference knowledge into explicit, prompt-specific rubrics. These rubrics decompose human judgment into independently verifiable quality dimensions, with the aim of overcoming limitations of traditional RLHF approaches.

multimodal models, AI alignment, reward learning, machine learning research
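
As a rough illustration of the rubric idea in the summary above, the sketch below scores an output against a set of independently checkable criteria and averages the results into a scalar reward. It is an assumption-laden toy, not ARR itself: in the summary the rubrics come from a VLM and are prompt-specific, whereas here the rubric is hand-written, the checks run on plain text, and `rubric_reward` and the weighting scheme are hypothetical.

```python
from typing import Callable, Dict, Optional

# A rubric maps a quality-dimension name to a pass/fail check (hypothetical shape).
Rubric = Dict[str, Callable[[str], bool]]


def rubric_reward(output: str, rubric: Rubric,
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted fraction of rubric dimensions that the output satisfies."""
    if weights is None:
        weights = {name: 1.0 for name in rubric}  # assume equal weights
    total = sum(weights[name] for name in rubric)
    if total == 0:
        return 0.0
    # Each dimension is checked on its own, so failures are attributable.
    earned = sum(weights[name] for name, check in rubric.items() if check(output))
    return earned / total


if __name__ == "__main__":
    # Hand-written stand-in for a rubric tied to the prompt "a red cube on a table".
    rubric: Rubric = {
        "mentions_red": lambda s: "red" in s,
        "mentions_cube": lambda s: "cube" in s,
    }
    print(rubric_reward("a red ball on a table", rubric))
```

Keeping each dimension as a separate check is the point of the design: a low reward can be traced to the specific criterion that failed rather than to an opaque scalar.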