RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management
arXiv cs.AI · April 16, 2026
RiskWebWorld is presented as the first highly realistic interactive benchmark for evaluating GUI agents in e-commerce risk management, a high-stakes investigative domain where their effectiveness remains underexplored. It comprises 1,513 tasks drawn from production risk-control pipelines and provides a Gymnasium-compliant infrastructure for scalable evaluation. Evaluation across diverse models reveals a dramatic capability gap.