Moonshot open-sourced FlashKDA, CUTLASS kernels for Kimi Delta Attention, up to 2.22x over the Triton baseline on H20

Reddit r/LocalLLaMA · April 22, 2026

Moonshot AI has open-sourced FlashKDA, a set of CUTLASS C++ kernels for Kimi Delta Attention (KDA). In benchmarks on H20, the kernels run up to 2.22x faster than the Triton baseline. FlashKDA integrates with flash-linear-attention, so it can serve as a faster backend for linear attention architectures such as KDA.
