
Source-Modality Monitoring in Vision-Language Models

arXiv cs.CL · April 27, 2026

This research defines and investigates source-modality monitoring in Vision-Language Models (VLMs): their ability to track which input modality a piece of information came from. It evaluates how VLMs use syntactic and semantic signals to bind information to its input source, finding that both are crucial but that semantic signals often dominate, with implications for model robustness.

model robustness, multimodal AI, Vision-Language Models
