
deep learning

263 items

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/22/2026

Moonshot open-sourced FlashKDA, CUTLASS kernels for Kimi Delta Attention, up to 2.22x over the Triton baseline on H20

Moonshot AI has open-sourced FlashKDA, a set of CUTLASS C++ kernels for Kimi Delta Attention (KDA) that delivers up to a 2.22x speedup over the Triton baseline in benchmarks on NVIDIA H20 GPUs. The implementation integrates with flash-linear-attention and accelerates linear-attention architectures such as KDA.

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/20/2026

MILA vs Polytechnique Montreal: reapply or move on? [D]

A mechanical engineering graduate with a software development background is deciding between two professional master's paths: pursuing a minor in computer science to reapply to MILA for ML/DL, or accepting an offer from Polytechnique Montréal. The decision weighs a longer academic path to strengthen theoretical foundations against starting professional experience sooner.

RESEARCH · ↑ trending · Reddit r/MachineLearning · 19d ago

Do VLMs in production still use fixed-patch ViTs for their vision capabilities? [D]

This discussion asks whether production Vision-Language Models (VLMs) still rely on fixed-patch Vision Transformers (ViTs) for their vision capabilities, despite the existence of more efficient tokenization methods. It considers possible reasons, such as only marginal gains from the alternatives, constraints in existing pipelines, or unclear scaling laws for adaptive patching.

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/17/2026

Low accuracy (~50%) with SSL (BYOL/MAE/VICReg) on hyperspectral crop stress data — what am I missing? [R]

The post describes persistently low accuracy (~50%) when applying self-supervised learning methods such as BYOL, MAE, and VICReg to hyperspectral crop stress detection. Despite trying various techniques, performance on the three-class task remains barely better than random, leading the author to suspect either that the classes are not separable in the data or that these SSL methods are ill-suited to the problem.
