RESEARCH · arXiv CS.LG · 4/14/2026
The Diffusion-Attention Connection
This research unifies Transformers, diffusion maps, and magnetic Laplacians, presenting them as different regimes of a single Markov geometry built from pre-softmax query-key scores. It defines a QK "bidivergence" to connect attention and diffusion, and organizes their dynamics using products of experts and Schrödinger bridges.
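As a rough illustration of the "Markov geometry" idea, the sketch below builds two row-stochastic matrices from the same pre-softmax query-key scores: a softmax attention matrix, and a diffusion-maps-style operator obtained by symmetrizing the scores before exponentiating. It does not reproduce the paper's bidivergence; the 1/sqrt(d) scaling, the symmetrization rule, and all function names here are assumptions chosen for illustration.

```python
import math
import random


def qk_scores(Q, K):
    """Pre-softmax scores S[i][j] = <q_i, k_j> / sqrt(d) (scaling is an assumption)."""
    d = len(Q[0])
    return [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K] for q in Q]


def row_normalize(W):
    """Divide each row by its sum, giving a row-stochastic (Markov) matrix."""
    out = []
    for row in W:
        total = sum(row)
        out.append([w / total for w in row])
    return out


def attention_markov(S):
    """Row-wise softmax: exponentiate the scores, then normalize each row."""
    W = []
    for row in S:
        m = max(row)  # subtract the row max for numerical stability
        W.append([math.exp(s - m) for s in row])
    return row_normalize(W)


def symmetric_kernel(S):
    """Symmetric kernel exp((S + S^T) / 2); the symmetrization is an illustrative choice."""
    n = len(S)
    sym = [[0.5 * (S[i][j] + S[j][i]) for j in range(n)] for i in range(n)]
    m = max(max(row) for row in sym)  # global shift keeps the kernel symmetric
    return [[math.exp(s - m) for s in row] for row in sym]


random.seed(0)
n, d = 4, 3
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

S = qk_scores(Q, K)
P_attn = attention_markov(S)   # asymmetric scores: a general Markov chain
W_sym = symmetric_kernel(S)
P_diff = row_normalize(W_sym)  # symmetric kernel: a reversible Markov chain
deg = [sum(row) for row in W_sym]  # unnormalized stationary weights of P_diff
```

Both matrices have rows summing to one, so each can be read as a random walk over tokens. The symmetrized version additionally satisfies detailed balance, `deg[i] * P_diff[i][j] == deg[j] * P_diff[j][i]` up to rounding, whereas the attention matrix built from asymmetric scores need not.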