attention mechanisms

28 items

RESEARCH · arXiv CS.LG · 4/21/2026

CGCMA: Conditionally-Gated Cross-Modal Attention for Event-Conditioned Asynchronous Fusion

This paper studies asynchronous alignment in multimodal learning, where a dense primary stream must be fused with sporadic external context, requiring models to reason explicitly about freshness and trust. It proposes CGCMA (Conditionally-Gated Cross-Modal Attention), a model that separates text-conditioned grounding from lag-aware trust control, tested on cryptocurrency markets.

multimodal AI machine learning attention mechanisms Time Series Analysis

RESEARCH · arXiv CS.AI · 29d ago

Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits

This research tests the "Attention-Confidence Assumption" in Vision-Language Models (VLMs), finding that attention structure is a near-zero predictor of correctness. The study uses a unified mechanistic pipeline (VLM Reliability Probe) to analyze attention, generation dynamics, and hidden-state geometry in three VLM families.

Vision-Language Models Mechanistic Interpretability attention mechanisms AI reliability

RESEARCH · arXiv CS.CL · 8d ago

AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection

This paper introduces AEyeDE, an attention-driven framework for human-AI authorship detection that leverages model attention as a discriminative signal. The method consistently outperforms text-only baselines and shows robustness across various text generation settings, remaining competitive on standard benchmarks.

AI detection machine learning NLP attention mechanisms

RESEARCH · arXiv CS.AI · 13d ago

LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

LaneRoPE is a novel technique designed to enhance parallel Large Language Model (LLM) generation by enabling coordination and collaboration among multiple sequences at test time. It achieves this through an inter-sequence attention mask and a RoPE extension that injects positional information, demonstrating promising results on mathematical reasoning tasks.

mathematical reasoning attention mechanisms Positional Encoding Parallel Processing

RESEARCH · arXiv CS.LG · 5/6/2026

On the Invariants of Softmax Attention

This research defines the "energy field" in softmax attention, revealing essential invariant properties. It distinguishes between mechanism-level invariants, derived from algebraic structure, and model-level regularities observed in autoregressive language models.

neural networks softmax machine learning NLP

RESEARCH · arXiv CS.CL · 4/7/2026

Why Attend to Everything? Focus is the Key

This paper presents Focus, a novel method that learns which token pairs are relevant in attention mechanisms instead of approximating all of them. It improves domain perplexity and delivers up to a 2x inference speedup, outperforming full attention across multiple scales and architectures.

retrofit setting neural networks Focus method Perplexity

RESEARCH · arXiv CS.CL · 5/6/2026

How Language Models Process Negation

This study investigates how Large Language Models (LLMs) mechanistically process negation. It finds that even open-weight models possess internal components that process negation correctly, despite often giving wrong answers. Their poor accuracy is attributed to late-layer attention that promotes simple shortcuts. The models implement two mechanisms: attending to negated phrases and directly constructing representations of negative phrases.

LLMs Mechanistic Interpretability attention mechanisms Natural Language Processing

DOC · StatQuest (YouTube) · 2/12/2025

StatQuest on DeepLearning.AI!!! Check out my short course on attention!

StatQuest has launched a short course on attention mechanisms on the DeepLearning.AI platform. The course aims to teach the fundamentals and applications of this important artificial intelligence technique.

deep learning learning attention mechanisms
