From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models
arXiv CS.LG · June 2, 2026
Researchers propose Demo2Reward, a test-time adaptation technique that optimizes the prompts of Vision-Language Model (VLM) reward models for robotics. It uses a few demonstrations to reduce false positives while preserving true positives, and it requires no additional model training.