← heapsort

MacArena: Benchmarking Computer Use Agents on an Online macOS Environment

arXiv CS.LG · June 8, 2026

MacArena is a new benchmark for computer-use agents (CUAs) that operate graphical user interfaces (GUIs) on macOS, a platform underserved by existing benchmarks. It offers 421 verified tasks across 50 applications and runs natively on Apple Silicon, testing CUAs beyond Linux-based benchmarks.
