sequence models

6 items

RESEARCH · arXiv CS.CL · 5d ago

Generic Triple-Latent Compression with Gated Associative Retrieval

This research introduces generic triple-latent sequence models, which combine a running token state with a compressed pair-memory to capture higher-order token interactions. The models outperform a Transformer baseline on language-modeling benchmarks; a retrieval extension further improves recall but runs more slowly.

language models · latent models · sequence models · associative retrieval
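The summary gives no equations, so the sketch below shows only a generic gated associative memory (an outer-product "fast-weight" store), a common building block for associative retrieval. It is not the paper's triple-latent model: the function name, the scalar per-step gate, and the update rule M_t = g_t * M_(t-1) + k_t v_t^T with readout r_t = M_t^T q_t are all assumptions made for illustration.

```python
import numpy as np


def gated_associative_memory(keys, values, queries, gates):
    # Hypothetical gated outer-product memory (not the paper's exact design).
    # keys, queries: (T, d_k); values: (T, d_v); gates: (T,) with entries in [0, 1].
    d_k, d_v = keys.shape[1], values.shape[1]
    memory = np.zeros((d_k, d_v))
    reads = np.zeros((len(keys), d_v))
    for t in range(len(keys)):
        # Decay old associations by the gate, then write the new key/value pair.
        memory = gates[t] * memory + np.outer(keys[t], values[t])
        # Retrieve the value associated with the current query.
        reads[t] = memory.T @ queries[t]
    return reads


keys = np.eye(3)  # orthonormal keys make retrieval exact
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
queries = np.array([np.zeros(3), np.zeros(3), keys[0]])  # ask for the first item last
print(gated_associative_memory(keys, values, queries, np.ones(3))[-1])  # [1. 2.]
```

With a gate of 1 nothing is forgotten, so querying the first key recovers its value exactly; a gate below 1 shrinks older associations geometrically, trading recall of distant items for capacity.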

RESEARCH · arXiv CS.LG · 7d ago

Graph Mamba Survival Analysis Based on Topology-Aware ordering

This paper addresses two challenges in survival analysis from whole slide images (WSIs): the computational bottleneck of Transformers, and Mamba's sensitivity to input order together with its unidirectional architecture. It proposes a Graph Mamba approach based on topology-aware ordering to overcome Mamba's limitations in capturing topological connectivity and bidirectional spatial structure.

deep learning · survival analysis · sequence models · computational pathology
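The summary does not say how the ordering or the bidirectional scan is built. As a hypothetical illustration only, the sketch below orders patches by breadth-first traversal of a patch adjacency graph (one possible "topology-aware" ordering, so that connected patches sit close together in the sequence) and replaces Mamba's selective state-space scan with a plain scalar linear recurrence run in both directions. The names bfs_order, linear_scan, and bidirectional_scan are assumptions, not the paper's API.

```python
from collections import deque


def bfs_order(adjacency):
    # Hypothetical topology-aware ordering: breadth-first traversal of the patch
    # graph, so connected patches end up near each other in the 1-D sequence.
    order, seen = [], set()
    for root in sorted(adjacency):  # also covers disconnected tissue regions
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in sorted(adjacency[node]):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return order


def linear_scan(xs, decay):
    # Stand-in for a state-space scan: h_t = decay * h_(t-1) + x_t.
    h, out = 0.0, []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def bidirectional_scan(xs, decay=0.5):
    # Run the scan in both directions and sum, so every position sees both sides.
    forward = linear_scan(xs, decay)
    backward = linear_scan(xs[::-1], decay)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# A 2 x 3 grid of patches, numbered row by row, with 4-neighbour adjacency.
grid = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
order = bfs_order(grid)
print(order)  # [0, 1, 3, 2, 4, 5]
```

Patch features would then be fed to the scan in this order, e.g. bidirectional_scan([features[i] for i in order]). A unidirectional scan alone would let patch 0 see nothing downstream, which is the limitation the summary attributes to Mamba.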

RESEARCH · arXiv CS.CL · 4/13/2026

EMA Is Not All You Need: Mapping the Boundary Between Structure and Content in Recurrent Context

This research uses Exponential Moving Average (EMA) traces as a minimal form of recurrent context to map the capabilities and limits of fixed-coefficient accumulation in sequence models. EMA traces encode temporal structure well, matching more advanced models on structural tasks, but they fundamentally fail to preserve token identity, which substantially degrades language-modeling performance.

language models · Recurrent Context · Temporal Structure · sequence models
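The summary does not give the trace's exact parameterization. The sketch below assumes the common scalar form h_t = alpha * h_(t-1) + (1 - alpha) * x_t with h starting at 0, and uses scalar token codes for simplicity. It shows one way to see the structure/content split: unrolled, the trace is a weighted sum whose weights depend only on how far back a token occurred, never on which token it was, so different token sequences can collapse onto the same state.

```python
def ema_trace(xs, alpha):
    # Assumed form: h_t = alpha * h_(t-1) + (1 - alpha) * x_t, starting from h = 0.
    h, trace = 0.0, []
    for x in xs:
        h = alpha * h + (1 - alpha) * x
        trace.append(h)
    return trace


def ema_closed_form(xs, alpha):
    # The same state unrolled: weights depend only on the distance t - s,
    # never on which token appeared at position s.
    return [
        sum((1 - alpha) * alpha ** (t - s) * xs[s] for s in range(t + 1))
        for t in range(len(xs))
    ]


# Two different token sequences (scalar codes) that end in the same state.
print(ema_trace([2.0, 0.0], 0.5)[-1])  # 0.5
print(ema_trace([0.0, 1.0], 0.5)[-1])  # 0.5
```

The fixed decay gives a clean recency signal (structure), but once two histories share the same weighted sum, nothing in the state can tell them apart (content).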

RESEARCH · arXiv CS.LG · 20d ago

Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models

The paper proposes a neural framework for estimating pairwise conditional mutual information (MI) directly from the hidden states of pretrained masked diffusion models (MDMs). The estimates capture dependency structure between sequence positions and enable MI-guided parallel decoding; on Sudoku and protein sequence generation, the method recovers known structural constraints.

neural networks · information theory · machine learning · sequence models
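The paper's estimator is neural and reads MDM hidden states; the sketch below does not reproduce it. Instead it computes the target quantity exactly for a small, known joint table over two positions (think of the table as already conditioned on a fixed context), then applies a hypothetical greedy rule that unmasks together only positions whose pairwise MI is below a threshold. The function names, the threshold, and the greedy rule are illustrative assumptions, not the paper's decoding algorithm.

```python
import math


def mutual_information(joint):
    # Exact MI in nats for a discrete joint table joint[x][y] that sums to 1.
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # 0 * log 0 is taken as 0
                mi += p * math.log(p / (px[i] * py[j]))
    return mi


def pick_parallel_positions(mi_matrix, candidates, threshold):
    # Hypothetical MI-guided rule: greedily accept masked positions whose MI with
    # every already-accepted position is below `threshold`, then unmask them together.
    chosen = []
    for pos in candidates:
        if all(mi_matrix[pos][other] < threshold for other in chosen):
            chosen.append(pos)
    return chosen


print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0 (independent)
mi = [[0.0, 0.9, 0.01], [0.9, 0.0, 0.02], [0.01, 0.02, 0.0]]
print(pick_parallel_positions(mi, [0, 1, 2], threshold=0.1))  # [0, 2]
```

The intuition: sampling two strongly dependent positions independently in the same step can produce an inconsistent pair (for Sudoku, two cells in one row receiving the same digit), so high-MI pairs are deferred to separate steps.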

RESEARCH · arXiv CS.AI · 25d ago

Conditional Attribute Estimation with Autoregressive Sequence Models

This research introduces Conditional Attribute Transformers, a method that jointly estimates the next-token probability and, for each candidate next token, the value of an attribute conditional on selecting that token. The framework thereby supports per-token credit assignment and counterfactual analysis within a single forward pass, addressing limitations of traditional generative models.

deep learning · generative models · sequence models · Conditional Attribute Estimation
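The summary names the two outputs but not the architecture. A minimal sketch, assuming two linear heads on one shared hidden state, shows how a single forward pass can yield both a next-token distribution p and a per-candidate attribute estimate a[v]. The head layout, the function names, and the definitions used here (credit = a[chosen] - sum_v p[v] * a[v]; counterfactual shift = a[v] - a[chosen]) are assumptions for illustration.

```python
import numpy as np


def conditional_attribute_head(hidden, w_token, w_attr):
    # Hypothetical two-headed output layer on a shared hidden state.
    # hidden: (d,); w_token, w_attr: (d, V).
    logits = hidden @ w_token
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    attr = hidden @ w_attr  # attr[v]: attribute estimate if token v is chosen next
    return probs, attr


def credit_and_counterfactuals(probs, attr, chosen):
    # Expected attribute before choosing, and the chosen token's deviation from it.
    expected = float(probs @ attr)
    credit = float(attr[chosen]) - expected
    # Counterfactuals: how each alternative token would have shifted the attribute.
    counterfactual = attr - attr[chosen]
    return expected, credit, counterfactual


hidden = np.array([1.0, 0.0])
w_token = np.zeros((2, 3))  # uniform next-token distribution
w_attr = np.array([[0.0, 3.0, 6.0], [0.0, 0.0, 0.0]])
probs, attr = conditional_attribute_head(hidden, w_token, w_attr)
expected, credit, counterfactual = credit_and_counterfactuals(probs, attr, chosen=2)
print(round(expected, 6), round(credit, 6))  # 3.0 3.0
print(counterfactual.tolist())  # [-6.0, -3.0, 0.0]
```

Because every vocabulary entry gets an attribute estimate from the same hidden state, no extra forward passes are needed to ask "what if a different token had been chosen here".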

RESEARCH · arXiv CS.LG · 5/11/2026

Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

The Toeplitz MLP Mixer (TMM) is a new transformer-like architecture that replaces attention with multiplication by a triangular-masked Toeplitz matrix, reducing the cost to O(dn log n) time and O(dn) space, compared with attention's quadratic dependence on sequence length. Despite this simpler design, TMMs train more efficiently and retain more information about their inputs than traditional transformers.

neural networks · AI architecture · Computational Efficiency · sequence models
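The summary states the mechanism and its cost but not the parameterization. The sketch below assumes one coefficient vector per channel and shows only the sequence-mixing step (any channel MLPs are omitted). It illustrates where O(dn log n) comes from: multiplying by a lower-triangular Toeplitz matrix is a causal convolution, which FFTs compute exactly once the inputs are zero-padded to length 2n. The names causal_toeplitz_mix and dense_reference are hypothetical.

```python
import numpy as np


def causal_toeplitz_mix(x, kernel):
    # x: (n, d) sequence; kernel: (n, d) coefficients c_0..c_(n-1), one column per channel.
    # Computes y[t, j] = sum_{s <= t} kernel[t - s, j] * x[s, j], i.e. a product with a
    # lower-triangular Toeplitz matrix, via FFTs in O(d n log n) time.
    n = x.shape[0]
    size = 2 * n  # zero-pad so the circular FFT convolution equals a linear one
    xf = np.fft.rfft(x, n=size, axis=0)
    kf = np.fft.rfft(kernel, n=size, axis=0)
    y = np.fft.irfft(xf * kf, n=size, axis=0)
    return y[:n]  # the first n outputs are exactly the causal (masked) part


def dense_reference(x, kernel):
    # O(d n^2) check: build each channel's masked Toeplitz matrix explicitly.
    n, d = x.shape
    t_idx, s_idx = np.tril_indices(n)  # entries with s <= t
    y = np.zeros_like(x)
    for j in range(d):
        mat = np.zeros((n, n))
        mat[t_idx, s_idx] = kernel[t_idx - s_idx, j]
        y[:, j] = mat @ x[:, j]
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
kernel = rng.standard_normal((8, 4))
print(np.allclose(causal_toeplitz_mix(x, kernel), dense_reference(x, kernel)))  # True
```

Storing only n coefficients per channel, instead of an n-by-n matrix, is what keeps memory at O(dn); the triangular mask is what makes the mixing causal, so output t never depends on later positions.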