Vision-Language Models

25 items

RESEARCH · arXiv CS.CL · 27d ago

Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models

Vision-language models (VLMs) suffer significant accuracy drops and severe miscalibration when given text-only inputs, even when the semantic information is preserved. The proposed Latent Imagination Module (LIM) predicts imagined latent embeddings from text, improving accuracy and reducing calibration error in missing-image scenarios.

Miscalibration, Vision-Language Models, Latent Imagination, Text-Only Inputs

RESEARCH · arXiv CS.AI · 28d ago

Do Vision-Language-Models show human-like logical problem-solving capability in point and click puzzle games?

This paper introduces VLATIM, a benchmark for evaluating the human-like logical problem-solving capabilities of Vision-Language Models (VLMs) in point-and-click physics puzzle games. It reveals a significant gap between reasoning and execution in large proprietary models when solving puzzles in The Incredible Machine 2.

puzzle games, Vision-Language Models, interactive AI, Benchmarking

RESEARCH · arXiv CS.LG · 15d ago

CAFD: Concept-Aware DNN Fault Detection using VLMs

CAFD is a learning-based method for detecting faults in Deep Neural Networks (DNNs) that combines multiple information sources to improve performance and efficiency. It uses model-based signals, distance features, and a novel Concept Failure Ratio (CFR) derived from Vision-Language Models (VLMs).

Fault Detection, Vision-Language Models, machine learning, AI reliability

RESEARCH · arXiv CS.AI · 15d ago

In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models

This research explores AI's capacity for open-ended discovery in creative production by replicating Picbreeder with Vision-Language Models. It observes clear qualitative differences between AI-generated outputs and historical human baselines and attempts to characterize them.

Open-Ended Learning, Vision-Language Models, Evolutionary AI, AI Research

NEWS · Together AI Blog · 3/18/2026

Together AI expands fine-tuning service with tool calling, reasoning, and vision support

Together AI has expanded its fine-tuning service with native support for tool calling, reasoning, and vision-language models. The enhancements also include 100B+ model training, up to 6x higher throughput, and job cost and ETA estimates.

Vision-Language Models, tool-calling, Reasoning, Together AI