RESEARCH
The Diffusion-Attention Connection
arXiv cs.LG · April 14, 2026
This research unifies Transformers, diffusion maps, and magnetic Laplacians by presenting them as different regimes of a single Markov geometry built from pre-softmax query-key scores. It defines a QK "bidivergence" that connects attention and diffusion, and organizes their dynamics through products of experts and Schrödinger bridges.
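The shared starting point, that attention weights and diffusion maps both turn pairwise scores into a Markov transition matrix, can be illustrated with a small sketch. This is not the paper's construction: the QK bidivergence, product-of-experts, and Schrödinger-bridge machinery are not reproduced. It only checks a standard identity. A row-normalized Gaussian kernel, as used in diffusion maps without density normalization (α = 0), equals softmax attention with queries and keys both set to the data points, scale 2/ε, and an additive key bias −‖y‖²/ε. The query-only term −‖x‖²/ε is constant along each row, so the softmax normalization cancels it. All function names are hypothetical, and plain Python is used only to keep the example dependency-free.

```python
import math
import random


def softmax_rows(scores):
    """Row-wise softmax: maps pre-softmax scores to a row-stochastic (Markov) matrix."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_markov(queries, keys, scale):
    """Attention weights softmax(scale * Q K^T); each row is a distribution over keys."""
    scores = [[scale * dot(q, k) for k in keys] for q in queries]
    return softmax_rows(scores)


def diffusion_map_markov(points, eps):
    """Diffusion-maps kernel (alpha = 0): Gaussian affinities exp(-|x - y|^2 / eps), row-normalized."""
    scores = [[-sum((a - b) ** 2 for a, b in zip(x, y)) / eps for y in points]
              for x in points]
    return softmax_rows(scores)


def diffusion_as_attention(points, eps):
    """The same Markov matrix written as attention: Q = K = points, scale 2/eps,
    plus a key-dependent bias -|y|^2 / eps. The dropped query term -|x|^2 / eps
    is constant within each row, so the row softmax cancels it."""
    scores = [[(2.0 / eps) * dot(x, y) - dot(y, y) / eps for y in points]
              for x in points]
    return softmax_rows(scores)


if __name__ == "__main__":
    random.seed(0)
    X = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
    P = diffusion_map_markov(X, eps=1.5)
    A = diffusion_as_attention(X, eps=1.5)
    gap = max(abs(p - a) for rp, ra in zip(P, A) for p, a in zip(rp, ra))
    print("max |P - A| =", gap)  # floating-point rounding only
```

With distinct query and key vectors, as in `attention_markov`, the score matrix is generally not symmetric, so the transition structure it defines is directed rather than undirected.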