EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs
arXiv CS.AI · June 1, 2026
The paper introduces EHRBench, an automated and reliable EHR-grounded benchmark for evaluating LLM-based clinical decision making (CDM). It addresses a gap in current understanding: how reliable LLMs are on real-world clinical tasks. The benchmark aims to deliver both scale and quality in CDM evaluation.