
MedicalBench: Evaluating Large Language Models Toward Improved Medical Concept Extraction

arXiv CS.CL · May 21, 2026

This paper introduces MedicalBench, a new benchmark for evaluating Large Language Models on medical concept extraction from electronic health records. It focuses on implicit medical reasoning and evidence grounding, addressing the challenge of identifying concepts that are not explicitly stated in the record.
