
Sinkhorn Algorithm


Research · arXiv cs.LG · 5/5/2026

Fast Log-Domain Sinkhorn Optimal Transport with Warp-Level GPU Reductions

This paper introduces FastSinkhorn, a native CUDA implementation of the log-domain Sinkhorn algorithm that solves optimal transport (OT) problems faster and with better numerical stability. It achieves a 12× speedup over the POT library and a 5.9× speedup over GPU-accelerated PyTorch baselines, while remaining numerically stable for small regularization parameters.
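For readers unfamiliar with the log-domain variant, here is a minimal pure-Python sketch of the standard log-domain Sinkhorn iteration. It is not FastSinkhorn's CUDA code and runs on the CPU only. Instead of the scaling vectors u = exp(f/eps) and v = exp(g/eps), it stores the dual potentials f and g and alternates f_i = eps * (log a_i - LSE_j((g_j - C_ij)/eps)) and g_j = eps * (log b_j - LSE_i((f_i - C_ij)/eps)), where LSE is log-sum-exp. Every exponential is taken after subtracting a maximum, so nothing underflows even when C_ij/eps is large, which is the small-regularization regime the summary mentions. The function names `sinkhorn_log` and `logsumexp` and the `n_iters` and `tol` stopping parameters are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def logsumexp(values: Vector) -> float:
    """Stable log(sum(exp(v))): shift by the max so the largest term is exp(0) = 1."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn_log(a: Vector, b: Vector, cost: Matrix, eps: float,
                 n_iters: int = 1000, tol: float = 1e-9) -> Tuple[Matrix, Vector, Vector]:
    """Log-domain Sinkhorn for entropic OT (illustrative sketch, not FastSinkhorn).

    a, b: strictly positive marginals that each sum to 1.
    cost: len(a) x len(b) cost matrix C; eps: regularization parameter.
    Returns the plan P_ij = exp((f_i + g_j - C_ij) / eps) and the potentials f, g.
    """
    n, m = len(a), len(b)
    log_a = [math.log(x) for x in a]
    log_b = [math.log(x) for x in b]
    f = [0.0] * n
    g = [0.0] * m
    for _ in range(n_iters):
        # Row update: with the current g, every row of P now sums to a_i.
        f = [eps * (log_a[i] - logsumexp([(g[j] - cost[i][j]) / eps for j in range(m)]))
             for i in range(n)]
        # Column update: with the new f, every column of P now sums to b_j.
        g = [eps * (log_b[j] - logsumexp([(f[i] - cost[i][j]) / eps for i in range(n)]))
             for j in range(m)]
        # Columns are exact after the g-update, so measure convergence on the rows.
        err = max(abs(math.exp(logsumexp([(f[i] + g[j] - cost[i][j]) / eps
                                          for j in range(m)])) - a[i])
                  for i in range(n))
        if err < tol:
            break
    plan = [[math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(m)] for i in range(n)]
    return plan, f, g


if __name__ == "__main__":
    # eps = 1e-3 makes exp(-C/eps) = exp(-1000) for the unit costs, which underflows to 0.0.
    plan, f, g = sinkhorn_log([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], eps=1e-3)
    print([[round(p, 6) for p in row] for row in plan])  # prints [[0.5, 0.0], [0.0, 0.5]]
```

With eps = 1e-3, the kernel entry exp(-C/eps) for a unit cost is exp(-1000), which is exactly 0.0 in double precision, so an implementation that forms that kernel directly sees those entries as hard zeros. The log-domain loop never builds the kernel and still returns the expected plan. The per-row and per-column log-sum-exp reductions inside the loop are independent of one another, which makes them natural targets for GPU parallelism; this summary does not describe exactly how FastSinkhorn maps them onto warps.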
