MLLMs

7 items

RESEARCH · arXiv CS.AI · 20h ago

PathoSage: Towards Multi-Source Evidence Adjudication in Pathology via Experience-Aware Agentic Workflow

PathoSage is a three-stage framework addressing evidence adjudication in pathology, explicitly separating knowledge retrieval, evidence collection, and evidence adjudication. It employs an agentic system with Structured Evidence Deliberation to independently evaluate heterogeneous evidence and reduce anchoring bias.

agent workflows MLLMs pathology medical AI
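This summary does not describe PathoSage's agents or scoring rules. The following is only a minimal sketch of the stated design idea: keep retrieval, collection, and adjudication as separate stages, and score each evidence item on its own so that no earlier item can anchor the judgment. `Evidence`, `run_pipeline`, the stage callables, and the sum-of-scores aggregation are all hypothetical. In practice the `judge` would presumably be an MLLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Evidence:
    source: str     # hypothetical label, e.g. "slide" or "literature"
    diagnosis: str  # candidate diagnosis this item supports
    text: str


def run_pipeline(
    case: str,
    retrieve: Callable[[str], list[str]],
    collect: Callable[[str, list[str]], list[Evidence]],
    judge: Callable[[Evidence], float],
) -> str:
    # Stage 1: knowledge retrieval, kept separate from evidence gathering.
    knowledge = retrieve(case)
    # Stage 2: evidence collection from the case plus the retrieved knowledge.
    evidence = collect(case, knowledge)
    # Stage 3: adjudication. The judge scores one item per call, so it never
    # sees the other items that could anchor its score.
    totals: dict[str, float] = {}
    for item in evidence:
        totals[item.diagnosis] = totals.get(item.diagnosis, 0.0) + judge(item)
    if not totals:
        raise ValueError("no evidence collected for case " + case)
    # Return the diagnosis with the highest summed support.
    return max(totals, key=totals.__getitem__)


if __name__ == "__main__":
    def toy_collect(case: str, knowledge: list[str]) -> list[Evidence]:
        return [
            Evidence("slide", "adenocarcinoma", "glandular architecture"),
            Evidence("literature", "adenocarcinoma", "matches retrieved criteria"),
            Evidence("slide", "benign", "no atypia in one field"),
        ]

    print(run_pipeline(
        "case-001",
        retrieve=lambda case: ["guideline excerpt"],
        collect=toy_collect,
        judge=lambda item: 1.0,  # stand-in for a real evidence judge
    ))  # adenocarcinoma: 2.0, benign: 1.0 -> prints adenocarcinoma
```

Summing scores is just one simple aggregation choice. A weighted or deliberative final step could replace it without changing the independence of the per-item scoring.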

RESEARCH · DEV.to AI · 1d ago

WorldBench: Top MLLM Scores 64% on Visually Diverse Benchmark

WorldBench, a new multimodal benchmark from MIT researchers, evaluates 15 MLLMs on visually diverse images and reveals fundamental gaps in visual understanding: the top-scoring model reaches only 64.0% accuracy. To expose these shortcomings, the benchmark prioritizes visual diversity over variety of task types.

multimodal AI research AI Benchmarks MLLMs

RESEARCH · arXiv CS.AI · 4/16/2026

Towards Scalable Lightweight GUI Agents via Multi-role Orchestration

This paper proposes the LAMO framework to address the challenge of deploying lightweight MLLM-powered autonomous GUI agents on resource-constrained devices. LAMO enhances lightweight MLLMs with GUI-specific knowledge and task scalability through multi-role orchestration.

AI frameworks MLLMs resource optimization multi-agent systems

RESEARCH · DEV.to AI · 4/18/2026

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

AMBER introduces a new LLM-free, multi-dimensional benchmark for rigorously evaluating hallucination in Multimodal Large Language Models (MLLMs); "LLM-free" means its scoring does not rely on another LLM acting as a judge. The work aims to provide a comprehensive tool for assessing the reliability and accuracy of MLLM outputs.

hallucination MLLMs Benchmarking AI evaluation
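One common way to score hallucination without an LLM judge is to compare the objects a response mentions against human-annotated objects for the image, in the style of the CHAIR metric. The following is a minimal sketch of that general idea, not AMBER's exact scoring. The function names, the naive word-matching extractor, and the toy vocabulary and annotations are assumptions for illustration.

```python
import re


def mentioned_objects(response: str, vocabulary: set[str]) -> set[str]:
    # Naive extractor: lowercase word tokens that appear in a fixed object
    # vocabulary. Real benchmarks also handle synonyms, plurals, and lemmas.
    tokens = re.findall(r"[a-z]+", response.lower())
    return {t for t in tokens if t in vocabulary}


def hallucination_rate(response: str, annotated: set[str], vocabulary: set[str]) -> float:
    # Fraction of mentioned objects that are absent from the image annotation.
    mentioned = mentioned_objects(response, vocabulary)
    if not mentioned:
        return 0.0
    return len(mentioned - annotated) / len(mentioned)


vocab = {"dog", "frisbee", "car", "tree"}
annotated = {"dog", "frisbee", "tree"}
# "car" is mentioned but not annotated: 1 of 3 mentioned objects -> 0.3333333333333333
print(hallucination_rate("A dog catches a frisbee next to a red car.", annotated, vocab))
```

Per-response rates can then be averaged over the benchmark. Because every step is string matching against annotations, the score is deterministic and needs no judge model.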

RESEARCH · arXiv CS.CL · 5/8/2026

The Cost of Context: Mitigating Textual Bias in Multimodal Retrieval-Augmented Generation

This paper identifies and formalizes

AI models research RAG MLLMs

RESEARCH · arXiv CS.LG · 4/21/2026

SaFeR-Steer: Evolving Multi-Turn MLLMs via Synthetic Bootstrapping and Feedback Dynamics

SaFeR-Steer is a novel framework designed to improve the safety alignment of Multimodal Large Language Models (MLLMs) in multi-turn dialogues, addressing challenges such as escalating unsafe intent and long-context safety decay. It employs synthetic bootstrapping and feedback dynamics, and the authors also release the STEER dataset for training and evaluation.

Safety security MLLMs multi-turn

RESEARCH · arXiv CS.CL · 12d ago

ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment

The paper proposes ICG, a novel framework for personalized cover image generation that integrates MLLM-based prompting with preference alignment. It utilizes semantic features and user embeddings to contextualize the diffusion model and adopts a multi-reward learning strategy to address the lack of labeled supervision.

personalization Diffusion Models MLLMs image generation