RESEARCH27

When Evidence Conflicts: Uncertainty and Order Effects in Retrieval-Augmented Biomedical Question Answering

arXiv CS.CL·May 15, 2026

This research evaluates large language models (LLMs) in biomedical question answering, specifically addressing their reliability when faced with conflicting or incomplete evidence. It reveals that LLM accuracy significantly drops, and predictions flip, when the order of correct and contradictory documents is reversed, highlighting issues with order effects and the need for conflict-aware abstention.

LLMs evaluation Reliability Biomedical AI uncertainty

Read original ↗