
RESEARCH · arXiv cs.CL · 27d ago

Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models

Vision-language models (VLMs) suffer significant accuracy drops and severe miscalibration when given text-only inputs, even when the semantic information is preserved. The authors propose the Latent Imagination Module (LIM), which predicts imagined latent embeddings from text, improving accuracy and reducing calibration error when the image is missing.
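
For readers unfamiliar with calibration error, here is a minimal sketch of one common metric, expected calibration error (ECE). The summary does not say which metric the paper reports, so the choice of ECE, the function name `expected_calibration_error`, and the 10-bin default are assumptions made for illustration only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: the bin-size-weighted mean of
    |accuracy - mean confidence| over equal-width confidence bins.

    confidences: predicted probabilities in [0, 1] for the chosen answers
    correct:     booleans, True where the chosen answer was right
    (Hypothetical helper; not taken from the paper.)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to an equal-width bin; confidence 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of all samples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # An overconfident toy model: 90% confidence, but only 3 of 4 answers correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

A well-calibrated model has an ECE near zero: its stated confidence matches how often it is right. A model that stays highly confident while its accuracy falls, as the summary describes for text-only inputs, scores higher.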

Miscalibration · Vision-Language Models · Latent Imagination · Text-Only Inputs