Fast Log-Domain Sinkhorn Optimal Transport with Warp-Level GPU Reductions

arXiv cs.LG · May 5, 2026

This paper introduces FastSinkhorn, a native CUDA implementation of the log-domain Sinkhorn algorithm for entropy-regularized optimal transport (OT), designed to be both faster and more numerically stable than existing solvers. It reports a 12x speedup over the POT library and a 5.9x speedup over GPU-accelerated PyTorch baselines, while remaining numerically stable for small regularization parameters.
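To make the log-domain idea concrete, here is a minimal pure-Python sketch of generic log-domain Sinkhorn iterations. It is not the paper's CUDA kernel or FastSinkhorn's API. The function names `logsumexp` and `sinkhorn_log`, the stopping rule, and the toy cost matrix are illustrative assumptions. The solver keeps dual potentials `f` and `g` instead of the scaling vectors `u = exp(f/eps)` and `v = exp(g/eps)`. Every update is then a row or column log-sum-exp, which stays finite even when `exp(-C/eps)` underflows. Those per-row and per-column reductions are presumably the operations that a GPU implementation parallelizes with warp-level reductions.

```python
import math

# Hypothetical CPU reference sketch of log-domain Sinkhorn,
# not FastSinkhorn's implementation or API.


def logsumexp(values):
    """Stable log(sum(exp(v))) using the max-shift trick."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn_log(a, b, C, eps, n_iters=1000, tol=1e-9):
    """Log-domain Sinkhorn iterations for entropy-regularized OT.

    a, b: source/target weights (positive, each summing to 1).
    C:    cost matrix as a list of rows, shape len(a) x len(b).
    eps:  entropic regularization strength.
    Returns (P, f, g) where P[i][j] = exp((f[i] + g[j] - C[i][j]) / eps).
    """
    n, m = len(a), len(b)
    log_a = [math.log(x) for x in a]
    log_b = [math.log(x) for x in b]
    f = [0.0] * n  # dual potential for the rows
    g = [0.0] * m  # dual potential for the columns

    for _ in range(n_iters):
        # Row update: makes each row of P sum to a[i] (one LSE reduction per row).
        f = [eps * log_a[i]
             - eps * logsumexp([(g[j] - C[i][j]) / eps for j in range(m)])
             for i in range(n)]
        # Column update: makes each column of P sum to b[j].
        g = [eps * log_b[j]
             - eps * logsumexp([(f[i] - C[i][j]) / eps for i in range(n)])
             for j in range(m)]
        # Columns are now exact, so the row-marginal error measures convergence.
        err = sum(
            abs(math.exp(logsumexp([(f[i] + g[j] - C[i][j]) / eps
                                    for j in range(m)])) - a[i])
            for i in range(n))
        if err < tol:
            break

    P = [[math.exp((f[i] + g[j] - C[i][j]) / eps) for j in range(m)]
         for i in range(n)]
    return P, f, g


# Small regularization: every entry of the naive kernel exp(-C/eps) underflows.
a = [0.5, 0.5]
b = [0.5, 0.5]
C = [[10.0, 11.0], [11.0, 10.0]]
eps = 0.01
print([[math.exp(-c / eps) for c in row] for row in C])  # [[0.0, 0.0], [0.0, 0.0]]
P, f, g = sinkhorn_log(a, b, C, eps)
print(P)
```

With `eps = 0.01`, the naive kernel is all zeros, so the classic scaling update `u = a / (K v)` would divide by zero. The log-domain iterations instead recover a plan close to the diagonal one, with about 0.5 on each diagonal entry and negligible off-diagonal mass.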

Read original β†—