SentinelBench: A Benchmark for Long-Running Monitoring Agents
arXiv cs.AI · June 5, 2026
SentinelBench is a new open-source benchmark for AI agents that perform long-running monitoring tasks. It aims to measure progress on tasks that require sustained attention rather than continuous action, and comprises 100 tasks across 10 synthetic web environments.