RESEARCH · Trending · Reddit r/MachineLearning · 4/14/2026
ClawBench: Can AI Agents Complete Everyday Online Tasks? 153 tasks, 144 live websites, best model at 33.3% [R]
ClawBench is a new benchmark that evaluates AI browser agents on 153 everyday tasks across 144 live websites. The best-performing model, Claude Sonnet 4.6, succeeds on only 33.3% of tasks, which points to a large gap between current agents and reliable online task completion.