VLMs

4 items

RESEARCH · Trending · Reddit r/MachineLearning · 19d ago

Do VLMs in production still use fixed-patch ViTs for their vision capabilities? [D]

This discussion asks whether production Vision-Language Models (VLMs) still rely on fixed-patch Vision Transformers (ViTs) for their vision capabilities, even though more efficient tokenization methods, such as adaptive patching, exist. Possible explanations it raises include marginal gains, limitations in existing pipelines, and unclear scaling laws for adaptive patching.
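For context on the cost being debated: a fixed-patch ViT cuts every image into equal-size square patches regardless of content, so its visual token count depends only on resolution and grows quadratically with image side length. The sketch below is a minimal illustration under stated assumptions: the patch size of 14 and the helper names `num_patch_tokens` and `patchify` are hypothetical choices for this example, not taken from any specific production model.

```python
def num_patch_tokens(height, width, patch_size):
    """Visual tokens a fixed-patch ViT emits for one image (no class token)."""
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    return (height // patch_size) * (width // patch_size)


def patchify(image, patch_size):
    """Split an H x W grid (list of rows) into non-overlapping square patches,
    in row-major order, flattening each patch into one vector (one token)."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


# Doubling the side length quadruples the token count (assumed patch size 14).
for side in (224, 448, 896):
    print(side, num_patch_tokens(side, side, 14))  # 224 -> 256, 448 -> 1024, 896 -> 4096
```

Every patch costs the same, whether it covers empty background or dense text; adaptive schemes try to spend fewer tokens on the former.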

VLMs · deep learning · Vision Transformers · Tokenization

RESEARCH · DEV.to AI · 20d ago

PaliGemma 2: A Family of Versatile VLMs for Transfer

PaliGemma 2 is a new family of Vision-Language Models (VLMs) that pairs the SigLIP vision encoder with Gemma 2 language models and is designed to be transferred, that is, fine-tuned, to a wide range of downstream tasks. The aim is strong performance across diverse multimodal tasks by transferring knowledge learned during pretraining.

AI models · Vision-Language Models · VLMs · Transfer Learning

RESEARCH · arXiv CS.LG · 19d ago

CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning

CP-MoE addresses catastrophic forgetting in continual learning for LLMs and VLMs built on Mixture-of-Experts architectures. It introduces a transient expert and consistency-preserving routing so that the model can absorb new knowledge without overwriting the parameters that encode existing knowledge.
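The routing that CP-MoE modifies is, in a standard Mixture-of-Experts layer, a learned gate that sends each token to only a few experts. The sketch below shows that generic top-k softmax routing as background; it is not CP-MoE's method, and its transient expert and consistency-preserving routing are not reproduced. The gate weights, toy experts, and function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(token, gate_weights, k=2):
    """Score every expert with a linear gate, keep the k highest-probability
    experts, and renormalise their weights to sum to 1."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_forward(token, gate_weights, experts, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    routes = top_k_route(token, gate_weights, k)
    out = [0.0] * len(token)
    for i, weight in routes.items():
        y = experts[i](token)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Hypothetical 3-expert layer on 2-dimensional tokens.
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
experts = [
    lambda x: [2 * v for v in x],
    lambda x: [-v for v in x],
    lambda x: [0.0 for _ in x],
]
token = [2.0, 1.0]
routes = top_k_route(token, gate_weights)
print(routes)
print(moe_forward(token, gate_weights, experts))
```

Because only the selected experts run and receive gradients for a given token, continual-learning methods can steer new data toward particular experts; controlling that routing is where methods like CP-MoE intervene.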

LLMs · VLMs · learning · Mixture of Experts

RESEARCH · arXiv CS.AI · 26d ago

Revealing Interpretable Failure Modes of VLMs

Vision-Language Models (VLMs) can fail catastrophically in real-world situations despite their broad reasoning capabilities. REVELIO is a framework that systematically uncovers interpretable failure modes in VLMs, combining diversity-aware beam search with Gaussian-process Thompson Sampling to map the failure landscape.
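To make the Gaussian-process Thompson Sampling component concrete, the sketch below shows only that step on a toy problem: fit a GP to a few observed "failure scores", draw one joint sample of the score function over candidate inputs, and query the candidate where that sample peaks. Everything here is an assumption for illustration: the 1-D candidate grid, the scores, the RBF kernel and its lengthscale, and the noise level. REVELIO's actual search space, objective, and its diversity-aware beam search are not represented.

```python
import math
import random


def rbf(a, b, lengthscale=0.2):
    """Squared-exponential (RBF) kernel on scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * lengthscale ** 2))


def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(m[i][i] - s, 1e-12))
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def cho_solve(low, b):
    """Solve (low @ low.T) x = b by forward, then backward, substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(xs_obs, ys_obs, candidates, noise=1e-2):
    """GP posterior mean and covariance of the latent score at each candidate."""
    k_obs = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs_obs)]
             for i, a in enumerate(xs_obs)]
    low = cholesky(k_obs)
    alpha = cho_solve(low, ys_obs)
    k_cross = [[rbf(xo, c) for xo in xs_obs] for c in candidates]  # row per candidate
    v = [cho_solve(low, row) for row in k_cross]
    n_obs = len(xs_obs)
    mean = [sum(kc[i] * alpha[i] for i in range(n_obs)) for kc in k_cross]
    cov = [[rbf(ca, cb) - sum(k_cross[a][i] * v[b][i] for i in range(n_obs))
            for b, cb in enumerate(candidates)] for a, ca in enumerate(candidates)]
    return mean, cov


def thompson_pick(xs_obs, ys_obs, candidates, rng, jitter=1e-6):
    """Draw one joint posterior sample over all candidates and return the
    index of its maximum, i.e. the next input to test."""
    mean, cov = gp_posterior(xs_obs, ys_obs, candidates)
    n = len(candidates)
    low = cholesky([[cov[i][j] + (jitter if i == j else 0.0) for j in range(n)]
                    for i in range(n)])
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sample = [mean[i] + sum(low[i][k] * eps[k] for k in range(i + 1)) for i in range(n)]
    return max(range(n), key=lambda i: sample[i])


# Hypothetical: 21 candidate inputs on [0, 1] and three observed failure scores.
rng = random.Random(0)
candidates = [i / 20 for i in range(21)]
xs_obs, ys_obs = [0.1, 0.5, 0.9], [0.2, 0.8, 0.1]
idx = thompson_pick(xs_obs, ys_obs, candidates, rng)
print(candidates[idx])
```

Sampling a whole function, rather than taking the posterior mean, is what balances exploitation and exploration here: uncertain regions occasionally produce the highest sample and get probed.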

failure modes · AI models · VLMs · Reliability