Fast Log-Domain Sinkhorn Optimal Transport with Warp-Level GPU Reductions
arXiv CS.LG · May 5, 2026
This paper introduces FastSinkhorn, a native CUDA implementation of the log-domain Sinkhorn algorithm that solves optimal transport (OT) problems faster and with better numerical stability. It reports a 12x speedup over the POT library and a 5.9x speedup over GPU-accelerated PyTorch baselines while remaining numerically stable at small regularization parameters.
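For readers unfamiliar with the log-domain formulation, the following is a minimal pure-Python sketch of the standard log-domain Sinkhorn updates. It is not the paper's CUDA code and contains no warp-level reductions. The names `logsumexp` and `sinkhorn_log` and all parameters are hypothetical, chosen only for illustration. The idea it shows is that the dual potentials are updated through a max-shifted log-sum-exp, so the kernel exp(-C/eps) is never formed explicitly and cannot underflow when eps is small.

```python
import math


def logsumexp(values):
    """Return log(sum(exp(v))) computed stably by shifting by the maximum."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn_log(a, b, C, eps, n_iters):
    """Illustrative log-domain Sinkhorn for entropic optimal transport.

    a, b    : strictly positive marginals, each summing to 1
    C       : cost matrix as a list of rows, shape len(a) x len(b)
    eps     : entropic regularization strength
    n_iters : number of alternating potential updates
    Returns the transport plan P as a list of rows.
    """
    n, m = len(a), len(b)
    log_a = [math.log(x) for x in a]
    log_b = [math.log(x) for x in b]
    f = [0.0] * n  # dual potential for the rows
    g = [0.0] * m  # dual potential for the columns

    for _ in range(n_iters):
        # Row update: f_i = eps * (log a_i - LSE_j((g_j - C_ij) / eps))
        f = [
            eps * (log_a[i] - logsumexp([(g[j] - C[i][j]) / eps for j in range(m)]))
            for i in range(n)
        ]
        # Column update: g_j = eps * (log b_j - LSE_i((f_i - C_ij) / eps))
        g = [
            eps * (log_b[j] - logsumexp([(f[i] - C[i][j]) / eps for i in range(n)]))
            for j in range(m)
        ]

    # Recover the plan: P_ij = exp((f_i + g_j - C_ij) / eps)
    return [
        [math.exp((f[i] + g[j] - C[i][j]) / eps) for j in range(m)]
        for i in range(n)
    ]
```

As a usage example, `sinkhorn_log([0.5, 0.5], [0.25, 0.75], [[0.0, 1.0], [1.0, 0.0]], eps=0.001, n_iters=3000)` runs on a case where exp(-1/eps) underflows to zero in double precision, which would leave a naive kernel-based iteration with a diagonal kernel that cannot match the marginals. Smaller values of eps generally need more iterations to converge. On a GPU, each per-row or per-column log-sum-exp is a reduction, which is presumably the step the paper's warp-level reductions target.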