Vision-Language Models

25 items

RESEARCH · arXiv CS.CL · 27d ago

Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models

Vision-language models (VLMs) suffer significant accuracy drops and severe miscalibration when given text-only inputs, even when the semantic information is preserved. The paper proposes the Latent Imagination Module (LIM), which predicts imagined latent embeddings from text, improving accuracy and reducing calibration error in missing-image scenarios.

RESEARCH · arXiv CS.AI · 28d ago

Do Vision-Language-Models show human-like logical problem-solving capability in point and click puzzle games?

This paper introduces VLATIM, a benchmark for evaluating the human-like logical problem-solving capabilities of Vision-Language Models (VLMs) in point-and-click physics puzzle games. It reveals a significant gap between reasoning and execution in large proprietary models when they attempt to solve The Incredible Machine 2.
