ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction
ReVision introduces a method to scale computer-use agents by reducing temporal visual redundancy in interaction trajectories. It employs a learned patch selector to remove redundant visual tokens, cutting token usage by approximately 46% and improving efficiency for multimodal language models across benchmarks.
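
The idea of dropping temporally redundant patches can be illustrated with a minimal sketch. It is not the ReVision method: the learned patch selector described above is swapped here for a simple pixel-difference heuristic, purely to show what "redundant across time" means for consecutive screenshots. All names (`patch_pixels`, `changed_patches`, `reduce_trajectory`), the patch size, and the threshold are hypothetical, and the sketch assumes grayscale frames whose height and width are multiples of the patch size.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale screenshot as rows of pixel intensities


def patch_pixels(frame: Frame, top: int, left: int, size: int) -> List[int]:
    """Flatten the size x size patch whose top-left corner is (top, left)."""
    return [frame[r][c]
            for r in range(top, top + size)
            for c in range(left, left + size)]


def changed_patches(prev: Frame, curr: Frame, size: int,
                    threshold: float) -> List[Tuple[int, int]]:
    """Grid coordinates of patches whose mean absolute change exceeds threshold.

    Stand-in heuristic (an assumption), not the learned selector from the paper.
    """
    kept = []
    for top in range(0, len(curr), size):
        for left in range(0, len(curr[0]), size):
            a = patch_pixels(prev, top, left, size)
            b = patch_pixels(curr, top, left, size)
            mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
            if mean_abs_diff > threshold:
                kept.append((top // size, left // size))
    return kept


def reduce_trajectory(frames: List[Frame], size: int,
                      threshold: float = 0.0) -> List[List[Tuple[int, int]]]:
    """Keep every patch of the first frame, then only changed patches afterwards."""
    rows, cols = len(frames[0]) // size, len(frames[0][0]) // size
    kept = [[(r, c) for r in range(rows) for c in range(cols)]]
    for prev, curr in zip(frames, frames[1:]):
        kept.append(changed_patches(prev, curr, size, threshold))
    return kept


# Toy trajectory: a blank 4x4 screen, one pixel changes, then nothing changes.
blank = [[0] * 4 for _ in range(4)]
clicked = [row[:] for row in blank]
clicked[0][0] = 255

kept = reduce_trajectory([blank, clicked, clicked], size=2)
total = sum(len(k) for k in kept)  # patches kept out of 3 frames x 4 patches
```

In this toy trajectory the first frame keeps all four patches, the second keeps only the patch containing the changed pixel, and the third keeps none. The resulting ratio is an artifact of the toy data and says nothing about the approximately 46% reduction reported for ReVision.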