
EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs

arXiv cs.AI · June 1, 2026

The paper introduces EHRBench, an automated and reliable benchmark grounded in electronic health records (EHRs) for evaluating clinical decision-making (CDM) by large language models (LLMs). It responds to the limited understanding of how reliable LLMs are on real-world clinical tasks. Its goal is to deliver CDM evaluation that achieves both scale and quality.
