attention mechanisms

28 items

NEWS ↑ trending · Reddit r/LocalLLaMA · 4/22/2026

Moonshot open-sourced FlashKDA, CUTLASS kernels for Kimi Delta Attention, up to 2.22x over the Triton baseline on H20

Moonshot AI has open-sourced FlashKDA, a CUTLASS C++ kernel for Kimi Delta Attention (KDA) that delivers up to a 2.22x speedup over the Triton baseline in H20 benchmarks. The implementation integrates with flash-linear-attention and targets linear-attention architectures such as KDA.

Open Source · deep learning · Performance optimization · attention mechanisms
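For orientation, here is a naive reference loop for the kind of recurrence such kernels accelerate. It is a sketch that assumes KDA's channel-wise gated delta rule, S_t = (I − β_t k_t k_tᵀ) Diag(α_t) S_{t−1} + β_t k_t v_tᵀ with read-out o_t = S_tᵀ q_t. It is not Moonshot's CUTLASS code, and the name `kda_recurrent` is hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # recurrent state S, shape d_k x d_v


def kda_recurrent(qs: List[Vector], ks: List[Vector], vs: List[Vector],
                  alphas: List[Vector], betas: List[float]) -> List[Vector]:
    """Naive reference for a channel-wise gated delta rule (hypothetical name).

    Per step: S <- (I - beta k k^T) Diag(alpha) S + beta k v^T, then o = S^T q.
    Costs O(T * d_k * d_v) sequentially; optimized kernels typically use a
    chunked parallel form instead of this token-by-token loop.
    """
    d_k, d_v = len(ks[0]), len(vs[0])
    S: Matrix = [[0.0] * d_v for _ in range(d_k)]
    outputs: List[Vector] = []
    for q, k, v, alpha, beta in zip(qs, ks, vs, alphas, betas):
        # Fine-grained forgetting: key channel i decays by its own alpha[i].
        S = [[alpha[i] * S[i][j] for j in range(d_v)] for i in range(d_k)]
        # Delta-rule write, rearranged as S + beta * k (v - S^T k)^T:
        # only the error between v and the value currently stored under k is written.
        pred = [sum(S[i][j] * k[i] for i in range(d_k)) for j in range(d_v)]
        err = [v[j] - pred[j] for j in range(d_v)]
        S = [[S[i][j] + beta * k[i] * err[j] for j in range(d_v)] for i in range(d_k)]
        # Read-out: o_t = S_t^T q_t.
        outputs.append([sum(S[i][j] * q[i] for i in range(d_k)) for j in range(d_v)])
    return outputs
```

With β = 1 and the same unit key used twice, the second write replaces the first value instead of adding to it. That overwrite behavior is what separates the delta rule from plain additive linear attention.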


RESEARCH ↑ trending · Reddit r/MachineLearning · 27d ago

Elastic Attention Cores for Scalable Vision Transformers [R]

This paper introduces Elastic Attention Cores, a new building block for scalable Vision Transformers that addresses the high cost of dense self-attention. The approach combines a core-periphery block-sparse attention structure with nested dropout, so that inference cost can be adjusted elastically, and achieves competitive accuracy.

deep learning · computer vision · attention mechanisms · Vision Transformers
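A core-periphery block-sparse pattern can be pictured as a boolean attention mask. This sketch assumes one simple generic layout: core tokens attend to, and are attended by, every token, while periphery tokens otherwise attend only within their own block. Nested dropout is modeled as keeping only the first `n_active` core tokens. The paper's exact block structure may differ, and `core_periphery_mask` is a hypothetical helper.

```python
from typing import List, Optional


def core_periphery_mask(n_core: int, n_periph: int, block: int,
                        n_active: Optional[int] = None) -> List[List[bool]]:
    """Boolean mask: mask[i][j] is True if token i may attend to token j.

    Assumed layout (hypothetical, for illustration): tokens [0, n_core) are
    core tokens; the remaining n_periph tokens are periphery tokens split
    into contiguous blocks of size `block`.
    n_active < n_core keeps only the first n_active core tokens, mimicking
    nested-dropout-style truncation to lower inference cost.
    """
    n_active = n_core if n_active is None else n_active
    n = n_core + n_periph

    def dropped(t: int) -> bool:
        return n_active <= t < n_core      # truncated core token

    def active_core(t: int) -> bool:
        return t < n_active

    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if dropped(i) or dropped(j):
                continue                   # removed from the computation
            if active_core(i) or active_core(j):
                mask[i][j] = True          # core <-> every remaining token
            elif (i - n_core) // block == (j - n_core) // block:
                mask[i][j] = True          # periphery: same block only
    return mask
```

With 2 core tokens and 4 periphery tokens in blocks of 2, the mask allows 28 of the 36 dense pairs. Truncating to one active core token drops that to 17, which shows how a single trained model can trade attention cost for capacity at inference time.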


ARTICLE ↑ trending · Reddit r/LocalLLaMA · 4/24/2026

Takeaways & discussion about the DeepSeek V4 architecture

This article discusses the architectural novelties of DeepSeek V4, highlighting its hybrid attention system (CSA + HCA) and Manifold-Constrained Hyper-Connections. It also covers frontier-scale FP4 quantization-aware training (QAT), another point that sets V4 apart from previous models.

DeepSeek · deep learning · attention mechanisms · quantization

RESEARCH · arXiv CS.LG · 4/20/2026

Dispatch-Aware Ragged Attention for Pruned Vision Transformers

This paper investigates the dispatch-overhead bottleneck that keeps token pruning from fully realizing its potential latency reductions in Vision Transformers (ViTs). It proposes a lightweight Triton attention kernel with a lower dispatch floor, achieving up to 2.24x higher end-to-end throughput for pruned ViTs.

AI models · deep learning · Performance optimization · attention mechanisms
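Ragged attention avoids padding pruned sequences back to a common length. The sketch below packs variable-length token sequences into one flat buffer with cumulative offsets (the `cu_seqlens` convention used by variable-length attention APIs such as FlashAttention's) and runs attention segment by segment. It only illustrates the data layout, not the paper's Triton kernel or its dispatch optimizations; the helper names are hypothetical.

```python
import math
from itertools import accumulate
from typing import List, Tuple

Vec = List[float]


def pack(seqs: List[List[Vec]]) -> Tuple[List[Vec], List[int]]:
    """Concatenate per-image token lists; cu_seqlens[b]..cu_seqlens[b+1] is image b."""
    flat = [tok for seq in seqs for tok in seq]
    cu_seqlens = [0] + list(accumulate(len(s) for s in seqs))
    return flat, cu_seqlens


def softmax(xs: Vec) -> Vec:
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def ragged_self_attention(flat: List[Vec], cu_seqlens: List[int]) -> List[Vec]:
    """Single-head self-attention with q = k = v = token vector (toy setting).

    Each segment attends only to itself, so no padding tokens are materialized.
    """
    out: List[Vec] = []
    for start, end in zip(cu_seqlens, cu_seqlens[1:]):
        if start == end:
            continue  # every token of this image was pruned
        seg = flat[start:end]
        d = len(seg[0])
        for q in seg:
            w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seg])
            out.append([sum(wi * v[c] for wi, v in zip(w, seg)) for c in range(d)])
    return out
```

For example, one image pruned to 3 tokens and another pruned to 1 pack into 4 rows with offsets `[0, 3, 4]`, and the single-token image simply attends to itself.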

RESEARCH · arXiv CS.LG · 4/21/2026

UniMamba: A Unified Spatial-Temporal Modeling Framework with State-Space and Attention Integration

UniMamba is a new unified spatial-temporal forecasting framework that integrates efficient state-space dynamics with attention-based dependency learning to tackle multivariate time series challenges. It employs a Mamba Variate-Channel Encoding Layer and a Spatial Temporal Attention Layer to capture both global temporal dependencies and inter-variate correlations.

forecasting · machine learning · attention mechanisms · State Space Models

RESEARCH · DEV.to AI · 3d ago

Aligning where to see and what to tell: image caption with region-based attention and scene factorization

This work introduces a method for image caption generation, utilizing region-based attention and scene factorization to enhance descriptive relevance and accuracy. It aims to more effectively align visual perception with textual narration.

scene understanding · deep learning · computer vision · attention mechanisms

RESEARCH · arXiv CS.LG · 5d ago

Do Transformers Need Three Projections? Systematic Study of QKV Variants

This research systematically evaluates variants of the Query, Key, and Value (QKV) attention formulation in Transformers, including shared key-value, shared query-key, and single-projection variants. Experiments across synthetic, vision, and language modeling tasks show that these alternatives perform on par with, or occasionally better than, standard QKV Transformers, and that Q-K=V sharing substantially reduces the KV cache in language modeling.

QKV · computer vision · attention mechanisms · Language modeling
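The projection-sharing variants differ only in which weight matrices are tied. This sketch assumes single-head causal attention with square d×d projections and toy hand-written weights. Tying is expressed by passing the same matrix object for W_k and W_v (the Q-K=V case), and the function also reports how many vectors per token a KV cache would hold. All names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(W: Mat, x: Vec) -> Vec:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs: Vec) -> Vec:
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def causal_attention(xs: List[Vec], Wq: Mat, Wk: Mat, Wv: Mat) -> Tuple[List[Vec], int]:
    """Single-head causal attention; returns (outputs, vectors cached per token).

    Passing the same matrix object as Wk and Wv models Q-K=V sharing: values
    equal keys, so a KV cache stores one vector per token instead of two.
    """
    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    shared_kv = Wv is Wk
    vs = ks if shared_kv else [matvec(Wv, x) for x in xs]
    d = len(qs[0])
    outs: List[Vec] = []
    for t, q in enumerate(qs):
        # Each position attends to itself and all earlier positions.
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks[: t + 1]])
        outs.append([sum(wi * v[c] for wi, v in zip(w, vs[: t + 1])) for c in range(d)])
    return outs, (1 if shared_kv else 2)
```

Calling it once with distinct W_k and W_v and once with a single shared W_kv returns a cache factor of 2 and 1, respectively. That halving is the mechanism behind the KV-cache reduction the summary reports.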

RESEARCH · arXiv CS.CL · 4/27/2026

Shared Lexical Task Representations Explain Behavioral Variability In LLMs

This research investigates LLM prompt sensitivity by comparing instruction-based and example-based prompting. It finds that, despite differences in performance, both styles rely on a common underlying mechanism: "lexical task heads", attention heads that literally describe the task and trigger answer production.

model interpretability · LLMs · prompt engineering · attention mechanisms

RESEARCH · arXiv CS.LG · 4/14/2026

The Diffusion-Attention Connection

This research unifies Transformers, diffusion maps, and magnetic Laplacians as different regimes of a single Markov geometry built from pre-softmax query-key scores. It defines a QK "bidivergence" that connects attention and diffusion, and organizes their dynamics through products of experts and Schrödinger bridges.

Diffusion Models · Deep Learning Theory · Markov Geometry · attention mechanisms
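The basic fact behind reading attention as a Markov process is that each row of a softmax attention matrix sums to 1. The matrix is therefore a transition kernel, the same kind of object diffusion maps build from a normalized affinity kernel. This sketch, assuming a toy set of token vectors used as both queries and keys, checks that property and takes several "diffusion" steps by repeated multiplication. It does not implement the paper's QK bidivergence, and the helper names are hypothetical.

```python
import math
from typing import List

Mat = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def attention_kernel(qs: Mat, ks: Mat) -> Mat:
    """Row i = softmax of pre-softmax scores q_i . k_j / sqrt(d): row-stochastic."""
    d = len(qs[0])
    return [softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks]) for q in qs]


def matmul(A: Mat, B: Mat) -> Mat:
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def diffuse(P: Mat, steps: int) -> Mat:
    """P**steps: t-step transition probabilities of the Markov chain P."""
    out = P
    for _ in range(steps - 1):
        out = matmul(out, P)
    return out


def column_spread(P: Mat) -> float:
    """Largest gap between rows within any column; shrinks as the chain mixes."""
    return max(max(col) - min(col) for col in zip(*P))
```

Because every entry of a softmax kernel is positive, each extra step averages the rows together. The rows of `diffuse(P, t)` therefore converge toward a common stationary distribution, which is the smoothing behavior diffusion maps exploit.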

RESEARCH · arXiv CS.CL · 4/7/2026

LPC-SM: Local Predictive Coding and Sparse Memory for Long-Context Language Modeling

This paper proposes LPC-SM, a hybrid autoregressive architecture for long-context language models that separates local attention, persistent memory, predictive correction, and run-time control. The 158M-parameter model is evaluated and shows improvements in LM loss and stability on long sequences.

neural networks · language models · Long Context · attention mechanisms
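The local-attention component can be illustrated with a causal sliding-window mask, which caps how many earlier tokens each position can see. This sketch assumes a fixed window `w`; the summary does not give LPC-SM's actual window or how it interacts with the persistent memory, and both helpers are hypothetical.

```python
from typing import List


def sliding_window_mask(n: int, w: int) -> List[List[bool]]:
    """Causal local attention: token i may attend to token j iff 0 <= i - j < w."""
    return [[0 <= i - j < w for j in range(n)] for i in range(n)]


def attended_pairs(mask: List[List[bool]]) -> int:
    """Number of (query, key) scores the mask allows to be computed."""
    return sum(sum(row) for row in mask)
```

For 5 tokens and `w = 2` the mask allows 9 scores, against 15 for full causal attention (`w = n`). In general the count grows like n·w rather than n(n+1)/2, and anything beyond the window must be carried by a separate component, such as the persistent memory described above.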

RESEARCH · arXiv CS.LG · 4/23/2026

Super Apriel: One Checkpoint, Many Speeds

Super Apriel, a 15B-parameter supernet, has been released, offering four trained mixer choices per decoder layer to enable multiple speed/quality presets from a single checkpoint. This allows for 2.9x to 10.7x decode throughput gains with 96% to 77% quality retention, and also facilitates speculative decoding without a separate draft model.

neural network architecture · Performance optimization · attention mechanisms · large language models

RESEARCH · arXiv CS.AI · 4/20/2026

LACE: Lattice Attention for Cross-thread Exploration

LACE is a novel framework that enables Large Language Models (LLMs) to coordinate and share insights across multiple parallel reasoning paths through cross-thread attention. It uses a synthetic data pipeline to teach collaborative error correction, improving reasoning accuracy by more than 7 points.

synthetic data · LLMs · attention mechanisms · AI Reasoning

RESEARCH · arXiv CS.AI · 5/7/2026

ANDRE: An Attention-based Neuro-symbolic Differentiable Rule Extractor

This paper introduces ANDRE (Attention-based Neuro-symbolic Differentiable Rule Extractor), a novel inductive logic programming (ILP) framework for learning first-order logic programs. It optimizes over a continuous rule space using fully differentiable, attention-driven logical operators, addressing scalability challenges in noisy and probabilistic settings.

machine learning · attention mechanisms · Logic Programming · Inductive Logic Programming

RESEARCH · DEV.to AI · 5/5/2026

Robust Invisible Video Watermarking with Attention

This research presents a novel robust invisible video watermarking method that leverages attention mechanisms to enhance imperceptibility and resilience against attacks.

robustness · video watermarking · deep learning · security

ARTICLE · DEV.to AI · 4/19/2026

Attention Mechanisms: Stop Compressing, Start Looking Back

This article examines why LSTMs still struggle to maintain context despite their improved memory over vanilla RNNs. Drawing on the author's personal experience of learning English, it illustrates three specific problems LSTMs still don't solve, setting the stage for a discussion of attention mechanisms.

deep learning · attention mechanisms · natural language processing

RESEARCH · DEV.to AI · 5/8/2026

Tiny weight edits improve LLM safety

Targeted, tiny weight edits to specific attention heads in LLMs, as demonstrated by the ASGuard method, can drastically reduce the success rate of jailbreaks that rely on linguistic tricks. This surgical approach patches vulnerabilities by dampening activations in the relevant attention heads, preserving overall model competence while substantially improving safety.

AI models · jailbreaking · security · LLM safety
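Dampening selected heads can be pictured as scaling their outputs before the heads are concatenated and projected. This is a generic sketch under that assumption. The summary does not say which heads ASGuard targets or how it chooses the scale, so the head indices, the scale, and the name `dampen_heads` are placeholders.

```python
from typing import List, Set

HeadOutput = List[List[float]]  # per-token output vectors of one attention head


def dampen_heads(head_outputs: List[HeadOutput], targets: Set[int],
                 scale: float) -> List[HeadOutput]:
    """Return a copy in which heads listed in `targets` are multiplied by `scale`.

    Hypothetical illustration: 0 <= scale <= 1 dampens a head's contribution
    without touching any other head, leaving the rest of the model intact.
    """
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale should dampen, i.e. lie in [0, 1]")
    return [
        [[scale * x for x in vec] for vec in head] if h in targets
        else [list(vec) for vec in head]
        for h, head in enumerate(head_outputs)
    ]
```

Only the targeted heads change, which mirrors the "surgical" framing in the summary: the edit is small and local, so behavior carried by other heads is left as it was.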

RESEARCH · DEV.to AI · 5/10/2026

Neural Language Correction with Character-Based Attention

This research introduces a novel approach to neural language correction leveraging character-based attention mechanisms. The method aims to improve the accuracy and robustness of automatically correcting grammatical and spelling errors in text.

neural networks · deep learning · attention mechanisms · natural language processing

RESEARCH · arXiv CS.CL · 4/27/2026

Where Should LoRA Go? Component-Type Placement in Hybrid Language Models

This research systematically investigates LoRA placement in hybrid language models, which combine attention and recurrent components. It finds that adapting the attention pathway consistently outperforms full-model adaptation with significantly fewer parameters, while the effect of adapting the recurrent backbone varies drastically depending on the hybrid architecture (sequential vs. parallel).

hybrid language models · model adaptation · attention mechanisms · Recurrent Neural Networks
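LoRA adds a trainable low-rank update (α/r)·BA to a frozen weight W, so placement simply decides which weight matrices receive such an update. The sketch below counts trainable parameters for attention-only versus attention-plus-recurrent adaptation, assuming hypothetical layer shapes for one hybrid block (four 64×64 attention projections and two recurrent projections). The shapes and module names are placeholders, not those of any model in the paper.

```python
from typing import Dict, List, Tuple

Mat = List[List[float]]


def lora_params(shape: Tuple[int, int], r: int) -> int:
    """Trainable parameters of a rank-r LoRA pair: B is d_out x r, A is r x d_in."""
    d_out, d_in = shape
    return r * (d_out + d_in)


def lora_forward(W: Mat, A: Mat, B: Mat, x: List[float], alpha: float, r: int) -> List[float]:
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    Wx = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    BAx = [sum(b * ai for b, ai in zip(row, Ax)) for row in B]
    return [y + (alpha / r) * z for y, z in zip(Wx, BAx)]


# Hypothetical (d_out, d_in) shapes for one hybrid block with d = 64.
ATTENTION = {"q": (64, 64), "k": (64, 64), "v": (64, 64), "o": (64, 64)}
RECURRENT = {"in_proj": (128, 64), "out_proj": (64, 128)}


def count(modules: Dict[str, Tuple[int, int]], r: int) -> int:
    """Total LoRA parameters when every listed module gets a rank-r adapter."""
    return sum(lora_params(shape, r) for shape in modules.values())
```

At rank 8, adapting only the attention pathway trains 4,096 parameters per block here, versus 7,168 when the recurrent projections are adapted as well. Because B is conventionally initialized to zero, `lora_forward` initially returns exactly W x.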

RESEARCH · arXiv CS.LG · 4/27/2026

LayerBoost: Layer-Aware Attention Reduction for Efficient LLMs

LayerBoost proposes an optimization for LLMs by selectively modifying the attention mechanism based on the sensitivity of individual transformer layers. This aims to reduce the quadratic complexity of softmax attention, a major bottleneck for efficient inference, without significant model quality degradation.

LLMs · AI optimization · attention mechanisms · Transformers

RESEARCH · arXiv CS.LG · 4/24/2026

Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention

This paper introduces Gist Sparse Attention (GSA), an end-to-end learnable method to scale large language models to long contexts without architectural modifications. GSA compresses context into 'gist tokens' for summary, then selectively restores relevant raw chunks for detailed attention, combining compact global representations with targeted fine-grained access.

neural networks · model efficiency · attention mechanisms · large language models
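The compress-then-unfold idea can be sketched as two-stage attention. First, score one summary vector per chunk against the query. Then attend over the raw tokens of only the top-scoring chunks. Here each "gist" is simply the mean of its chunk's key vectors, an assumption standing in for GSA's learned gist tokens, and `gist_select_attend` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vec) -> Vec:
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def gist_select_attend(q: Vec, keys: List[Vec], values: List[Vec],
                       chunk: int, top_k: int) -> Tuple[Vec, List[int]]:
    """Return (attention output, indices of the chunks that were unfolded)."""
    # 1) Forget: compress each chunk into one gist (mean of keys; learned in GSA).
    chunks = [range(s, min(s + chunk, len(keys))) for s in range(0, len(keys), chunk)]
    gists = [[sum(keys[i][c] for i in idx) / len(idx) for c in range(len(q))] for idx in chunks]
    # 2) Select: keep the top_k chunks whose gist best matches the query.
    ranked = sorted(range(len(chunks)), key=lambda c: dot(q, gists[c]), reverse=True)
    chosen = sorted(ranked[:top_k])
    # 3) Recall: full attention over the raw tokens of the chosen chunks only.
    idx = [i for c in chosen for i in chunks[c]]
    w = softmax([dot(q, keys[i]) / math.sqrt(len(q)) for i in idx])
    out = [sum(wi * values[i][c] for wi, i in zip(w, idx)) for c in range(len(values[0]))]
    return out, chosen
```

Fine-grained attention then costs about `top_k * chunk` scores plus one score per gist, instead of one score per token in the whole context.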