Vision Transformers

3 items

RESEARCH · ↑ trending · Reddit r/MachineLearning · 27d ago

Elastic Attention Cores for Scalable Vision Transformers [R]

This paper introduces Elastic Attention Cores as a new building block for scalable Vision Transformers, addressing the high cost of dense self-attention. The approach combines a core-periphery block-sparse attention structure with nested dropout, so that inference cost can be adjusted elastically while accuracy remains competitive.
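As a rough illustration of the idea, the sketch below builds a core-periphery attention mask and samples a core budget in the style of nested dropout. The summary does not specify the exact pattern, so the rule used here (core tokens attend everywhere, periphery tokens attend only to the core and to themselves) is an assumption, and the names `core_periphery_mask`, `sample_core_budget`, and `allowed_pairs` are hypothetical, not taken from the paper.

```python
import random

def core_periphery_mask(num_core, num_periphery):
    """Return mask[i][j] == True if token i may attend to token j.

    Assumption: tokens 0..num_core-1 are core tokens. A pair is allowed
    when either side is a core token, or when i == j (self-attention).
    """
    n = num_core + num_periphery
    return [[i < num_core or j < num_core or i == j for j in range(n)]
            for i in range(n)]

def sample_core_budget(max_core, rng):
    """Nested-dropout-style sampling: pick k and keep core tokens 0..k-1.

    Because the kept tokens always form a prefix, a model trained this way
    can be run with any smaller prefix at inference time.
    """
    return rng.randint(1, max_core)

def allowed_pairs(mask):
    """Count attended (query, key) pairs, a proxy for attention cost."""
    return sum(sum(row) for row in mask)

rng = random.Random(0)
k = sample_core_budget(4, rng)          # elastic core size for this step
mask = core_periphery_mask(k, 12)       # 12 periphery tokens (illustrative)
print(k, allowed_pairs(mask), (k + 12) ** 2)  # sparse vs. dense pair count
```

Under this assumed pattern, shrinking the core removes whole rows and columns of allowed pairs, which is one way the cost could scale with the chosen budget.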

deep learning · computer vision · attention mechanisms · Vision Transformers

RESEARCH · ↑ trending · Reddit r/MachineLearning · 19d ago

Do VLMs in production still use fixed-patch ViTs for their vision capabilities? [D]

This discussion questions whether production Vision-Language Models (VLMs) still rely on fixed-patch Vision Transformers (ViTs) for their vision capabilities, despite the existence of more efficient tokenization methods. It explores potential reasons for this, such as marginal gains, pipeline limitations, or unclear scaling laws for adaptive patching.
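For context on what "fixed-patch" implies, a standard ViT splits an image into a grid of equal-sized patches, so the token count depends only on resolution and patch size, not on image content. The helper below is a minimal sketch of that arithmetic; `num_patch_tokens` is a hypothetical name, and it assumes the image dimensions are divisible by the patch size and ignores any class or register tokens.

```python
def num_patch_tokens(height, width, patch):
    """Number of tokens a fixed-patch ViT produces for one image."""
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by patch size")
    return (height // patch) * (width // patch)

# Doubling resolution quadruples the token count, regardless of content.
print(num_patch_tokens(224, 224, 16))  # 196
print(num_patch_tokens(448, 448, 16))  # 784
```

Adaptive tokenization schemes aim to break this fixed relationship by spending fewer tokens on simple regions.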

VLMs · deep learning · Vision Transformers · Tokenization

RESEARCH · arXiv cs.LG · 4/20/2026

Dispatch-Aware Ragged Attention for Pruned Vision Transformers

This paper investigates the dispatch-overhead bottleneck that prevents token pruning from fully realizing latency reductions in Vision Transformers (ViTs). It proposes a lightweight Triton attention kernel with a lower dispatch floor, achieving up to 2.24x end-to-end throughput for pruned ViTs.
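To show why pruned ViTs lead to ragged batches, the sketch below packs variable-length token sequences into one flat buffer with cumulative offsets and compares the attention pair count against a padded batch. This is plain Python for illustration only: it is not the paper's Triton kernel and does not model dispatch overhead, and the names `pack_ragged` and `attention_pair_counts` are hypothetical.

```python
def pack_ragged(token_lists):
    """Concatenate per-image token lists and record cumulative offsets.

    Sequence i occupies packed[offsets[i]:offsets[i + 1]].
    """
    packed, offsets = [], [0]
    for tokens in token_lists:
        packed.extend(tokens)
        offsets.append(len(packed))
    return packed, offsets

def attention_pair_counts(lengths):
    """Return (padded, ragged) counts of query-key pairs for a batch.

    Padded: every sequence is padded to the longest one.
    Ragged: each sequence attends only within its own tokens.
    """
    padded = len(lengths) * max(lengths) ** 2
    ragged = sum(n * n for n in lengths)
    return padded, ragged

# Three images that kept 4, 2, and 3 tokens after pruning (illustrative).
packed, offsets = pack_ragged([[1, 2, 3, 4], [5, 6], [7, 8, 9]])
print(offsets)                           # [0, 4, 6, 9]
print(attention_pair_counts([4, 2, 3]))  # (48, 29)
```

The saved arithmetic only turns into lower latency if the kernel handling the ragged layout is cheap to launch, which is the bottleneck the summary describes.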

AI models · deep learning · Performance optimization · attention mechanisms