
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

arXiv cs.CL · May 19, 2026

This paper introduces CHI-Bench, a new benchmark designed to test AI agents' ability to automate complex, policy-rich, and long-horizon healthcare workflows. It addresses critical gaps in current benchmarks by focusing on policy density, multi-role composition, and multilateral interaction in realistic healthcare operations across multiple domains.

Workflows, Healthcare, Benchmarks, Automation, AI agents
