medical AI

34 items

RESEARCH·arXiv CS.AI·20h ago

PathoSage: Towards Multi-Source Evidence Adjudication in Pathology via Experience-Aware Agentic Workflow

PathoSage is a three-stage framework addressing evidence adjudication in pathology, explicitly separating knowledge retrieval, evidence collection, and evidence adjudication. It employs an agentic system with Structured Evidence Deliberation to independently evaluate heterogeneous evidence and reduce anchoring bias.

agent workflows MLLMs pathology medical AI

RESEARCH·DEV.to AI·4/18/2026

ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models

ChatCAD is an interactive computer-aided diagnosis system that leverages Large Language Models to analyze medical images. It aims to enhance the accuracy and efficiency of medical diagnosis through artificial intelligence.

computer-aided diagnosis Healthcare large language models Medical Imaging

RESEARCH·arXiv CS.CL·19d ago

MedicalBench: Evaluating Large Language Models Toward Improved Medical Concept Extraction

This paper introduces MedicalBench, a new benchmark for evaluating Large Language Models in medical concept extraction from electronic health records. It focuses on implicit medical reasoning and evidence grounding, addressing the challenge of identifying concepts not explicitly stated.

LLMs concept extraction Healthcare Benchmarking

RESEARCH·arXiv CS.LG·17d ago

HealthCraft: A Reinforcement Learning Safety Environment for Emergency Medicine

The paper introduces HealthCraft, a public reinforcement-learning environment designed to evaluate the safety of frontier language models in emergency medicine. It focuses on trajectory-level safety, tool misuse, and clinical pressure, built on a FHIR R4 world state and offering 195 tasks for comprehensive assessment.

LLMs evaluation reinforcement learning medical AI

RESEARCH·arXiv CS.LG·27d ago

Interpretable EEG Microstate Discovery via Variational Deep Embedding: A Systematic Architecture Search with Multi-Quadrant Evaluation

This paper introduces the Convolutional Variational Deep Embedding (Conv-VaDE) model for EEG microstate analysis. It enhances interpretability by jointly learning topographic reconstruction and probabilistic soft clustering, enabling generative decoding of cluster prototypes into verifiable scalp topographies.

deep learning machine learning Neuroscience medical AI

RESEARCH·arXiv CS.AI·14d ago

When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure

This research introduces Med-Stress, a framework to test the epistemic resilience of LLMs in clinical dialogue, revealing that high diagnostic accuracy doesn't guarantee belief stability under escalating pressure. It proposes RBED and R-FT as novel defenses to mitigate this failure mode in medical AI.

LLMs epistemic resilience medical AI AI safety

ARTICLE·MIT Tech Review AI·5/4/2026

Tailoring AI solutions for health care needs

The AI market promises significant transformation, with healthcare being a prime target due to financial pressures, labor shortages, and the increasing burden of caring for an aging population. AI developers are targeting a wide range of functions, from curing cancer to streamlining processes.

AI applications future-of-work Healthcare medical AI

ARTICLE·DEV.to AI·25d ago

Why AI for Doctors Is Becoming Essential in Modern Medicine

94% of healthcare executives view AI as critical for the future of medicine. Artificial intelligence assists doctors by scanning radiology images and identifying skin cancers or cancerous cells with speed and accuracy. It serves as a powerful second opinion, combining its speed with human judgment and patient context.

AI integration Healthcare diagnostics medical AI

RESEARCH·arXiv CS.LG·5/7/2026

Investigating Trustworthiness of Nonparametric Deep Survival Models for Alzheimer's Disease Progression Analysis

This research investigates the trustworthiness and fairness of nonparametric deep survival models for analyzing Alzheimer's Disease (AD) progression. It addresses the lack of studies considering learned bias in existing deep learning models for AD and proposes novel fairness metrics to ensure reliable predictions.

deep learning Alzheimer's disease survival analysis medical AI

RESEARCH·arXiv CS.CL·27d ago

ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV

This paper introduces ClinicalBench, a 400-question benchmark designed to stress-test assertion-aware retrieval for cross-admission clinical QA on MIMIC-IV using real EHR notes. It also presents EpiKG, a patient knowledge graph system that incorporates assertion and temporality tags to route retrieval by question intent, demonstrating significant performance improvements across various LLMs.

LLMs Benchmarking clinical QA medical AI

RESEARCH·arXiv CS.LG·4/15/2026

DBGL: Decay-aware Bipartite Graph Learning for Irregular Medical Time Series Classification

DBGL introduces Decay-Aware Bipartite Graph Learning, a method for classifying irregularly sampled medical time series. It uses a patient-variable bipartite graph to model irregular sampling patterns and relationships between variables, alongside a node-specific temporal decay encoding that accounts for decay behavior differing from one variable to another.

Graph Neural Networks machine learning medical AI irregular data

ARTICLE·DEV.to AI·4/17/2026

We Built a Medical AI With 383 Specialist Agents. Here's What Actually Works (and What Doesn't)

The article shares insights from 18 months of building Helios Med, a medical AI with 383 specialist agents designed to assist with diagnostic reasoning. It aims to provide a thorough second opinion for doctors and patients, addressing the limitations of current healthcare practices and diagnostic errors.

Healthcare multi-agent systems medical AI diagnostic-aids

RESEARCH·arXiv CS.CL·18d ago

When Cases Get Rare: A Retrieval Benchmark for Off-Guideline Clinical Question Answering

This paper introduces OGCaReBench, a new retrieval-focused benchmark aimed at evaluating LLMs' ability to answer clinical questions that go beyond typical medical guidelines. It addresses the gap where most medical LLMs are trained on common, guideline-focused knowledge, while real-world care often involves rare cases not covered by guidelines.

LLMs Benchmarking case reports medical AI

RESEARCH·arXiv CS.AI·6d ago

Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection

Traj-Evolve is a self-evolving multi-agent system designed to model patient trajectories from electronic health records for early lung cancer detection. It utilizes an Experience Pool for retrieving similar cases and multi-agent reinforcement learning to optimize inter-agent collaboration.

Healthcare machine learning AI multi-agent systems

RESEARCH·arXiv CS.AI·4/17/2026

Seeing Through Experts' Eyes: A Foundational Vision Language Model Trained on Radiologists' Gaze and Reasoning

GazeX is a new vision language model trained on radiologists' eye-tracking data and reasoning to improve chest X-ray interpretation. The model learns to emulate expert spatial and temporal attention, aiming to bridge the gap between model outputs and clinical diagnostic reasoning.

Vision-Language Models computer vision medical AI diagnostic tools

RESEARCH·arXiv CS.CL·4/10/2026

EMSDialog: Synthetic Multi-person Emergency Medical Service Dialogue Generation from Electronic Patient Care Reports via Multi-LLM Agents

The study presents EMSDialog, a new dataset of 4,414 synthetic multi-speaker conversations for emergency medical services, generated from real patient care reports using a multi-LLM agent pipeline. Annotated with diagnoses and topics, the dataset yields improvements in the accuracy and stability of conversational diagnosis prediction.

synthetic dialogue generation Healthcare multi-LLM agents medical AI

RESEARCH·arXiv CS.LG·5/1/2026

People-Centred Medical Image Analysis

Despite accurate diagnostic systems from data-centric medical AI, widespread clinical adoption is limited due to insufficient attention to fair performance across diverse patient populations and poor workflow integration. This paper proposes a 'People-Centred Medical Image Analysis' approach to address these interconnected challenges, which prior work has typically examined in isolation.

human-AI interaction AI ethics medical AI

RESEARCH·arXiv CS.CL·20d ago

Prompting language influences diagnostic reasoning and accuracy of large language models

This research evaluated the impact of prompting language on the diagnostic reasoning and accuracy of large language models (LLMs) in clinical settings. Four out of five models performed better in English, highlighting the uncertainty regarding LLM reliability across different languages.

Multilingual AI LLMs clinical decision support Diagnostic Accuracy

ARTICLE·DEV.to AI·22d ago

Medical AI Doesn’t Just Need Bigger Models. It Needs an ImageNet for State Transitions

This article proposes the creation of "Biomedical TransitionNet", a new type of dataset analogous to ImageNet, but focused on biological state transitions for the next generation of medical AI. It argues for the necessity of such infrastructure to build real-world models in biomedicine, moving beyond classification and prediction.

Biomedical TransitionNet datasets AI infrastructure healthcare AI

DOC·Hugging Face Blog·5/8/2026

MedQA: Fine-Tuning a Clinical AI on AMD ROCm — No CUDA Required

This post details the fine-tuning of a clinical AI model, MedQA, on the AMD ROCm platform. It shows that the task can be done without CUDA, offering an alternative hardware path for AI development.

GPU hardware-compatibility Fine-tuning medical AI