medical AI

34 items

ARTICLE·DEV.to AI·4/13/2026

The Shocking Truth About AI Agent Benchmarks: Your Medical Diagnostics Will Never Be the Same in 2026

The article argues that rigorous, standardized benchmarks for AI agents are critical to medical diagnostics in 2026 and questions whether AI is ready for widespread clinical adoption. It emphasizes that, without proper performance validation, the potential of AI in healthcare remains largely theoretical and untrustworthy.

AI Benchmarks Diagnostic AI AI validation healthcare AI

RESEARCH·arXiv CS.LG·5/5/2026

GAZE: Grounded Agentic Zero-shot Evaluation with Viewer-Level Tools and Literature Retrieval on Rare Brain MRI

GAZE is a framework enabling medical Vision-Language Models (VLMs) to iteratively analyze brain MRI images using viewer-level tools and literature retrieval. It achieved 58.2 mAP for lesion localization and 34.9% Top-1 diagnostic accuracy on the NOVA benchmark for rare neurological conditions.

Vision-Language Models neurology Benchmarking medical AI

RESEARCH·arXiv CS.CL·5/5/2026

CLEAR: Revealing How Noise and Ambiguity Degrade Reliability in LLMs for Medicine

The CLEAR framework assesses how ambiguity and uncertainty affect the reliability of medical Large Language Models (LLMs), moving beyond simplified evaluation benchmarks. It systematically perturbs answer options and their semantic framing, revealing that LLM performance degrades as the number of plausible answers increases and that models become less cautious when the abstention option is phrased with uncertainty.

Ambiguity LLMs evaluation Reliability

RESEARCH·arXiv CS.CL·4/16/2026

A Proactive EMR Assistant for Doctor-Patient Dialogue: Streaming ASR, Belief Stabilization, and Preliminary Controlled Evaluation

This paper introduces a proactive EMR assistant for doctor-patient dialogue, designed to overcome limitations of passive systems by integrating streaming ASR, belief stabilization, and action planning. The system was evaluated in a preliminary controlled setting, achieving an F1 of 0.84 and Recall@5 of 0.87.

Natural Language Processing ASR healthcare AI medical AI

RESEARCH·arXiv CS.CL·4/24/2026

Weighting What Matters: Boosting Sample Efficiency in Medical Report Generation via Token Reweighting

This work introduces a token reweighting loss function to enhance data efficiency in training vision-language models for medical report generation. By prioritizing semantically salient tokens, the method achieves comparable report quality using up to ten times less training data.

Data efficiency machine learning computer vision natural language generation

RESEARCH·arXiv CS.LG·4/21/2026

A Discordance-Aware Multimodal Framework with Multi-Agent Clinical Reasoning

This research proposes a discordance-aware multimodal framework for knee osteoarthritis, integrating machine learning prediction models with a multi-agent reasoning system. It leverages various data modalities, including tabular features, MRI, and X-ray embeddings, to predict joint space loss and pain progression.

multimodal AI machine learning multi-agent systems medical AI

RESEARCH·arXiv CS.LG·4/24/2026

Clinically Interpretable Sepsis Early Warning via LLM-Guided Simulation of Temporal Physiological Dynamics

This paper proposes an LLM-guided temporal simulation framework for clinically interpretable early sepsis warning. The model simulates physiological trajectories prior to disease onset by integrating spatiotemporal feature extraction, medical reasoning cues, and agent-based post-processing for physiologically plausible predictions.

Healthcare early warning systems simulation medical AI

RESEARCH·arXiv CS.AI·6d ago

Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection

Traj-Evolve is a self-evolving multi-agent system designed to model patient trajectories from electronic health records for early lung cancer detection. It utilizes an Experience Pool for retrieving similar cases and multi-agent reinforcement learning to optimize inter-agent collaboration.

Healthcare machine learning AI multi-agent systems

RESEARCH·Google DeepMind Blog·4/30/2026

Enabling a new model for healthcare with AI co-clinician

This post explores research into the future of healthcare with AI, focusing on the development of an AI co-clinician. The goal is to create an AI-augmented model of care.

AI-assisted Healthcare medical AI

RESEARCH·arXiv CS.LG·4/8/2026

PRIME: Prototype-Driven Multimodal Pretraining for Cancer Prognosis with Missing Modalities

PRIME is a new self-supervised multimodal pretraining framework designed for cancer prognosis that addresses the challenge of missing data modalities in clinical cohorts. It integrates histopathology images, gene expression, and pathology reports, learning robust representations through semantic imputation in latent space and cross-modal alignment objectives.

self-supervised learning Multimodal Pretraining Missing Modalities Cancer Prognosis

RESEARCH·arXiv CS.LG·4/30/2026

A Multimodal and Explainable Machine Learning Approach to Diagnosing Multi-Class Ejection Fraction from Electrocardiograms

This research developed a multimodal machine-learning framework combining ECG features and EHR data to diagnose multi-class left ventricular ejection fraction. The model achieved high AUROCs and used SHAP for explainability, outperforming baseline models.

machine learning Explainable AI medical AI

RESEARCH·arXiv CS.AI·5/6/2026

ClinicBot: A Guideline-Grounded Clinical Chatbot with Prioritized Evidence RAG and Verifiable Citations

ClinicBot is an AI system designed to provide trustworthy clinical support by translating official guideline recommendations. It tackles the issue of LLM hallucinations in high-stakes medical contexts through structured guideline extraction and evidence prioritization.

Healthcare RAG Chatbot AI

RESEARCH·arXiv CS.LG·5/6/2026

PRISM-CTG: A Foundation Model for Cardiotocography Analysis with Multi-View SSL

PRISM-CTG is a clinically grounded self-supervised foundation model for Cardiotocography (CTG) analysis, designed to overcome limitations of narrowly curated labeled datasets. It leverages a multi-view self-supervised framework to learn transferable domain-level representations from large-scale unlabelled recordings.

self-supervision learning CTG analysis Foundation Models

RESEARCH·arXiv CS.AI·4/8/2026

MedGemma 1.5 Technical Report

MedGemma 1.5 4B is a new model that expands the capabilities of MedGemma 1, integrating high-dimensional medical image analysis (CT/MRI, histopathology), anatomical localization, and medical document understanding. It demonstrates significant gains in condition classification accuracy on MRI and CT, and a 47% increase in macro F1 for whole-slide pathology images.

deep learning AI healthcare AI Medical Imaging