browser agents

2 items

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/14/2026

ClawBench: Can AI Agents Complete Everyday Online Tasks? 153 tasks, 144 live websites, best model at 33.3% [R]

ClawBench is a new benchmark that evaluates AI browser agents on 153 everyday tasks across 144 live websites. The best model, Claude Sonnet 4.6, reaches only a 33.3% success rate, which points to a significant gap between current agents and reliable online task completion.

performance evaluation · Benchmarking · browser agents · online tasks

ARTICLE · DEV.to AI · 20d ago

Why AI Browser Agents Need a Runbook Before They Need More Prompts

This article argues that AI browser agents need operational "runbooks", not just more refined prompts, to work reliably in real browser environments. A runbook defines how the agent should operate, including how it manages accounts, profiles, and proxies, which is critical for complex workflows.

browser agents · AI automation