CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?
arXiv cs.CL · May 19, 2026
This paper introduces CHI-Bench, a new benchmark designed to test AI agents' ability to automate complex, policy-rich, and long-horizon healthcare workflows. It addresses critical gaps in current benchmarks by focusing on policy density, multi-role composition, and multilateral interaction in realistic healthcare operations across multiple domains.