
sequence models


RESEARCH · arXiv CS.LG · 7d ago

Graph Mamba Survival Analysis Based on Topology-Aware Ordering

This paper addresses two challenges in survival analysis on whole slide images (WSIs): the computational bottleneck of Transformers, and Mamba's sensitivity to input order together with its unidirectional architecture. It proposes a topology-aware ordering of the input to overcome Mamba's limitations in capturing topological connectivity and bidirectional spatial structure.

RESEARCH · arXiv CS.CL · 4/13/2026

EMA Is Not All You Need: Mapping the Boundary Between Structure and Content in Recurrent Context

This research explores Exponential Moving Average (EMA) traces as a minimal recurrent context to delineate the capabilities and limitations of fixed-coefficient accumulation in sequence models. It demonstrates that EMA traces excel at encoding temporal structure, matching advanced models on structural tasks, yet fundamentally fail to capture token identity, resulting in significantly reduced performance for language modeling.
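
To make "fixed-coefficient accumulation" concrete, here is a minimal sketch of an EMA trace. It assumes the common recurrence h_t = alpha * h_(t-1) + (1 - alpha) * x_t with h_0 = 0; the paper's exact parameterization is not given in this summary, and the names `ema_trace` and `alpha` are illustrative.

```python
def ema_trace(xs, alpha):
    """Running exponential moving average over a sequence of numbers.

    Assumed recurrence (not taken from the paper):
        h_t = alpha * h_{t-1} + (1 - alpha) * x_t,  with h_0 = 0.
    The coefficient alpha is fixed, so the weight given to each past
    input depends only on how long ago it occurred, not on its value.
    """
    h = 0.0
    trace = []
    for x in xs:
        h = alpha * h + (1.0 - alpha) * x
        trace.append(h)
    return trace


print(ema_trace([1.0, 1.0, 1.0], 0.5))  # [0.5, 0.75, 0.875]
```

Because the decay is the same for every input, the trace carries information about when something happened, which fits the summary's point that such traces encode temporal structure more readily than token identity.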

RESEARCH · arXiv CS.LG · 20d ago

Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models

The paper proposes a neural framework to estimate pairwise conditional mutual information (MI) directly from the hidden states of pretrained masked diffusion models (MDMs). This method captures dependency structures and enables MI-guided parallel decoding, showing utility in Sudoku and protein sequence generation by recovering known structural constraints.
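
For readers unfamiliar with the quantity being estimated, the sketch below computes mutual information exactly from a small joint probability table. This is only the textbook definition, not the paper's neural estimator, which works from hidden states and conditions on context; the function name `mutual_information` is illustrative.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of two discrete variables.

    `joint` is a list of rows with joint[i][j] = P(X = i, Y = j);
    the entries are assumed to be non-negative and sum to 1.
    """
    px = [sum(row) for row in joint]         # marginal of X
    py = [sum(col) for col in zip(*joint)]   # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (px[i] * py[j]))
    return mi


independent = [[0.25, 0.25], [0.25, 0.25]]
coupled = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent))  # 0.0
print(mutual_information(coupled))      # log(2), about 0.693
```

Pairs of positions with high MI are strongly coupled; as the summary describes it, MI-guided parallel decoding uses this dependency structure.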

RESEARCH · arXiv CS.AI · 25d ago

Conditional Attribute Estimation with Autoregressive Sequence Models

This research introduces Conditional Attribute Transformers, a method for jointly estimating the next-token probability and the value of an attribute conditional on each candidate next token. The framework supports per-token credit assignment and counterfactual analysis within a single forward pass.

RESEARCH · arXiv CS.LG · 5/11/2026

Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

The Toeplitz MLP Mixer (TMM) is a new transformer-like architecture that replaces attention with triangular-masked Toeplitz matrix multiplication, significantly reducing computational complexity to O(dn log n) time and O(dn) space. TMMs demonstrate superior training efficiency and better input information retention compared to traditional transformers, despite their simpler design.
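
The O(n log n) figure is plausible because multiplying by a lower-triangular Toeplitz matrix is a causal convolution, which an FFT can compute. Below is a minimal single-channel sketch of that general trick in pure Python. It is not the TMM layer itself, whose details are not given in this summary; the function names and the assumption that `kernel` and `x` have equal length are illustrative.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.

    With invert=True the result is unnormalised (caller divides by n).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def causal_toeplitz_direct(kernel, x):
    """O(n^2) reference: y[i] = sum_{j <= i} kernel[i - j] * x[j]."""
    return [sum(kernel[i - j] * x[j] for j in range(i + 1))
            for i in range(len(x))]


def causal_toeplitz_fft(kernel, x):
    """Same product in O(n log n) using zero-padded FFTs."""
    n = len(x)
    size = 1
    while size < 2 * n:  # pad so circular wrap-around cannot reach y[:n]
        size *= 2
    fk = fft([complex(v) for v in kernel] + [0j] * (size - n))
    fx = fft([complex(v) for v in x] + [0j] * (size - n))
    y = fft([p * q for p, q in zip(fk, fx)], invert=True)
    return [(v / size).real for v in y[:n]]


kernel = [1, 2, 3, 4]
x = [1, 0, -1, 2]
print(causal_toeplitz_direct(kernel, x))  # [1, 2, 2, 4]
print(causal_toeplitz_fft(kernel, x))     # same values up to float rounding
```

Applying this independently to each of d channels gives the O(dn log n) time stated above. The triangular mask corresponds to keeping only the terms with j <= i, so position i never sees later inputs.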
