Beyond Pixels: Introspective and Interactive Grounding for Visualization Agents
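Vision-Language Models (VLMs) often misinterpret interactive charts because they treat them as static images, a limitation the paper calls the "Pixel-Only Bottleneck." The paper introduces Introspective and Interactive Visual Grounding (IVG), a framework that combines two mechanisms to resolve visual ambiguities: spec-grounded introspection, in which the agent consults the chart's underlying specification, and view-grounded interaction, in which the agent manipulates the chart view. The authors report that IVG significantly improves question-answering (QA) accuracy.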
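
To make the two mechanisms concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical chart specification held as a Python dictionary, loosely modeled on a declarative chart grammar; the names `SPEC`, `introspect`, `interact_filter`, and `answer_max` are illustrative and do not come from the paper. The sketch reads the channel-to-field mapping from the spec (introspection), simulates a filter interaction on the view, and answers a question from the resulting rows rather than from pixels.

```python
from typing import Any

# Hypothetical chart specification (an assumption for illustration only);
# the paper's actual spec format is not described in this summary.
SPEC: dict[str, Any] = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "region", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative"},
        "color": {"field": "year", "type": "nominal"},
    },
    "data": [
        {"region": "North", "sales": 120, "year": 2023},
        {"region": "South", "sales": 95, "year": 2023},
        {"region": "North", "sales": 140, "year": 2024},
        {"region": "South", "sales": 150, "year": 2024},
    ],
}


def introspect(spec: dict[str, Any]) -> dict[str, str]:
    """Spec-grounded step: map each visual channel to its data field."""
    return {channel: enc["field"] for channel, enc in spec["encoding"].items()}


def interact_filter(spec: dict[str, Any], field: str, value: Any) -> list[dict[str, Any]]:
    """View-grounded step (simulated): rows left visible after a filter."""
    return [row for row in spec["data"] if row[field] == value]


def answer_max(spec: dict[str, Any], filter_field: str, filter_value: Any) -> Any:
    """Answer 'which x category has the largest y value in the filtered view?'"""
    channels = introspect(spec)
    visible = interact_filter(spec, filter_field, filter_value)
    top = max(visible, key=lambda row: row[channels["y"]])
    return top[channels["x"]]


print(answer_max(SPEC, "year", 2024))  # prints "South"
```

In this toy setting the answer is derived from the spec and the filtered data, so it does not depend on how bars overlap or how colors render, which is the kind of ambiguity a pixel-only reading would face.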
