CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

arXiv CS.CL · May 19, 2026

This paper introduces CHI-Bench, a new benchmark designed to test AI agents' ability to automate complex, policy-rich, and long-horizon healthcare workflows. It addresses critical gaps in current benchmarks by focusing on policy density, multi-role composition, and multilateral interaction in realistic healthcare operations across multiple domains.
