Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits

arXiv cs.AI · May 12, 2026

This study tests the "Attention-Confidence Assumption" in Vision-Language Models (VLMs) and finds that attention structure has near-zero power to predict whether an answer is correct. It uses a unified mechanistic pipeline, the VLM Reliability Probe, to analyze attention, generation dynamics, and hidden-state geometry across three VLM families.

Vision-Language Models · Mechanistic Interpretability · Attention Mechanisms · AI Reliability
