deep learning

263 items

RESEARCH · ↑ trending · Reddit r/MachineLearning · 5/3/2026

Struggling with Chebyshev Filter Integration in CNN — Any Advice? [R]

A user is struggling to integrate Chebyshev filters into a CNN architecture to improve performance, noting that current results are similar to baseline. They are seeking advice on filter integration, placement, tuning, and whether others have found benefits.

CNN deep learning feature extraction Chebyshev filter

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/22/2026

Moonshot open-sourced FlashKDA, CUTLASS kernels for Kimi Delta Attention, up to 2.22x over the Triton baseline on H20

Moonshot AI has open-sourced FlashKDA, a set of CUTLASS C++ kernels for Kimi Delta Attention (KDA) that run up to 2.22x faster than the Triton baseline in H20 benchmarks. The implementation integrates with flash-linear-attention and targets linear attention architectures such as KDA.

Open Source deep learning Performance optimization attention mechanisms

NEWS · ↑ trending · Reddit r/MachineLearning · 4/24/2026

[New Optimizer] 🌹 Rose: low VRAM, easy to use, great results, Apache 2.0 [P]

A new PyTorch optimizer named 'Rose' has been released, promising low VRAM usage, fast convergence, and excellent generalization, licensed under Apache 2.0. Developed over several years, it aims to be easy to use and more memory-efficient than 8-bit AdamW.

deep learning machine learning VRAM Optimization optimizer

DOC · ↑ trending · Reddit r/LocalLLaMA · 4/27/2026

To 16GB VRAM users, plug in your old GPU

The post suggests that users with a 16GB GPU add an old GPU (6GB+ VRAM) to increase total VRAM, which makes it possible to run larger LLMs (~30B parameters) even when the secondary card is weaker. It includes a practical configuration example for `llama-server`.

deep learning GPU optimization LLM inference VRAM management
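
The post's own `llama-server` configuration is not reproduced in this summary. As a rough illustration of the idea, the sketch below computes a proportional split of model layers across two cards from their VRAM sizes and formats it for llama.cpp's `--tensor-split` option. The helper name `tensor_split`, the 1 GB per-card reserve, and the 16 GB + 6 GB pairing are assumptions made for this example, not values taken from the post.

```python
def tensor_split(vram_gb, reserve_gb=1.0):
    """Return each GPU's share of the usable VRAM, rounded to 2 decimals.

    reserve_gb is a hypothetical safety margin kept free on every card
    (for the KV cache, display output, and so on).
    """
    usable = [max(v - reserve_gb, 0.0) for v in vram_gb]
    total = sum(usable)
    if total == 0:
        raise ValueError("no usable VRAM on any GPU")
    return [round(u / total, 2) for u in usable]


# Assumed setup: a 16 GB primary card plus an old 6 GB card.
split = tensor_split([16, 6])
flag = "--tensor-split " + ",".join(str(s) for s in split)
print(flag)  # --tensor-split 0.75,0.25
```

The printed flag would be appended to a `llama-server` command line. In practice the ratio usually needs manual tuning, since context length and the speed of the weaker card both affect the best split.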

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/12/2026

LLMs learn backwards, and the scaling hypothesis is bounded. [D]

The post argues that Large Language Models (LLMs) learn backwards and that the scaling hypothesis has inherent limits.

LLMs deep learning scaling hypothesis language models

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/19/2026

On the path towards a true science of deep learning [D]

A scientist with dual industry and academic affiliation shares insights on developing a fundamental scientific theory of machine learning, based on approximately seven years of work. The post outlines thoughts on achieving a true science of deep learning.

research deep learning AI Theory machine learning

RESEARCH · ↑ trending · Reddit r/MachineLearning · 27d ago

Elastic Attention Cores for Scalable Vision Transformers [R]

This paper introduces Elastic Attention Cores as a new building block for scalable Vision Transformers, addressing the high cost of dense self-attention. The approach combines a core-periphery block-sparse attention structure with nested dropout, so that inference cost can be adjusted elastically while accuracy remains competitive.

deep learning computer vision attention mechanisms Vision Transformers

RESEARCH · ↑ trending · Reddit r/MachineLearning · 26d ago

Follow the Mean: Reference-Guided Flow Matching [R]

The paper "Follow the Mean: Reference-Guided Flow Matching" explores a new flow matching methodology for generative models.

deep learning generative models machine learning Flow Matching

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/20/2026

MILA vs Polytechnique Montreal: reapply or move on? [D]

A mechanical engineering graduate with a software development background is deciding between two professional master's paths: pursuing a minor in computer science to reapply to MILA for ML/DL, or accepting an offer from Polytechnique Montréal. The decision weighs a longer academic path to strengthen theoretical foundations against starting professional experience sooner.

education Career Development deep learning machine learning

RESEARCH · ↑ trending · Reddit r/MachineLearning · 19d ago

Do VLMs in production still use fixed-patch ViTs for their vision capabilities? [D]

This discussion questions whether production Vision-Language Models (VLMs) still rely on fixed-patch Vision Transformers (ViTs) for their vision capabilities, despite the existence of more efficient tokenization methods. It explores potential reasons for this, such as marginal gains, pipeline limitations, or unclear scaling laws for adaptive patching.

VLMs deep learning Vision Transformers Tokenization

RESEARCH · ↑ trending · Reddit r/MachineLearning · 5/6/2026

Transformers with Selective Access to Early Representations [R]

The paper introduces SATFormer, a new Transformer variant that improves efficiency by allowing heads to selectively re-access early representations instead of uniformly copying them. This context-dependent gating mechanism optimizes the reuse of information, offering a better efficiency-performance trade-off.

AI architecture deep learning efficiency Transformers

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/17/2026

Low accuracy (~50%) with SSL (BYOL/MAE/VICReg) on hyperspectral crop stress data — what am I missing? [R]

The post describes persistently low accuracy (~50%) when using self-supervised learning methods such as BYOL, MAE, and VICReg for hyperspectral crop stress detection. Despite trying various techniques, performance remains barely better than random for three classes, leading the author to suspect either poor data separability or an unsuitable SSL method.

model performance Hyperspectral imaging deep learning self-supervised learning

NEWS · ↑ trending · Reddit r/MachineLearning · 4/26/2026

Introducing AutoMuon, a one line drop in for AdamW [P]

AutoMuon, a new Python package, enables the Muon optimizer to be used as a drop-in replacement for AdamW in PyTorch training pipelines. It automatically identifies and applies the appropriate optimizer for each model parameter, using Muon for weight matrices and AdamW for other components.

deep learning optimizer python-package PyTorch
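
The routing rule described above (Muon for weight matrices, AdamW for everything else) can be sketched in a few lines. This is only an illustration of the idea, not AutoMuon's actual API or selection logic: the function name `assign_optimizers` and the "exactly two dimensions means weight matrix" test are assumptions, and the real package may apply further rules.

```python
def assign_optimizers(named_shapes):
    """Group parameter names by the optimizer that would handle them.

    named_shapes: iterable of (name, shape) pairs, e.g. built from
    ((n, tuple(p.shape)) for n, p in model.named_parameters()).
    Assumed rule: 2-D tensors count as weight matrices and go to Muon;
    biases, norm scales, and other shapes go to AdamW.
    """
    groups = {"muon": [], "adamw": []}
    for name, shape in named_shapes:
        key = "muon" if len(shape) == 2 else "adamw"
        groups[key].append(name)
    return groups


# Hypothetical parameter list for a tiny MLP block.
params = [
    ("fc1.weight", (256, 64)),
    ("fc1.bias", (256,)),
    ("norm.weight", (256,)),
    ("fc2.weight", (10, 256)),
]
groups = assign_optimizers(params)
print(groups["muon"])   # ['fc1.weight', 'fc2.weight']
print(groups["adamw"])  # ['fc1.bias', 'norm.weight']
```

Each group would then be handed to its own optimizer instance, which is the bookkeeping a drop-in wrapper hides from the training script.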

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/19/2026

LLM Neuroanatomy III - LLMs seem to think in geometry, not language

This article, part of the "LLM Neuroanatomy" series, posits that Large Language Models primarily process information geometrically rather than through linguistic representations. It explores the internal mechanisms and structural organization of these advanced AI models.

AI architecture LLMs deep learning Neuroscience

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/14/2026

"I don't know!": Teaching neural networks to abstain with the HALO-Loss. [R]

This research introduces HALO-Loss, a novel method for training neural networks to abstain from making predictions when uncertain. It allows models to express "I don't know" rather than providing potentially incorrect answers, improving reliability.

neural networks model robustness deep learning machine learning

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/24/2026

Takeaways & discussion about the DeepSeek V4 architecture

This article discusses the architectural novelties of DeepSeek V4, highlighting its hybrid attention system (CSA + HCA) and Manifold-Constrained Hyper-Connections. It also touches on frontier-scale FP4 QAT training, differentiating it from previous models.

DeepSeek deep learning attention mechanisms quantization

CASE · ↑ trending · Reddit r/MachineLearning · 4/27/2026

INT8 quantization gives me better accuracy than FP16 ! [D]

A user observed that INT8 quantization of their deep learning model unexpectedly yielded better inference accuracy than FP16, and is asking what could explain the result.

inference ONNX deep learning quantization

DOC · ↑ trending · Reddit r/MachineLearning · 4/16/2026

AI for Materials Science starter kit [D]

A Deep Learning practitioner is seeking resources such as papers, courses, and tutorials to learn about AI for Materials Science. The goal is to gain sufficient knowledge to conduct meaningful research in the area and contribute to the community, with a UChicago course already identified as a benchmark.

Materials Science deep learning computational chemistry cheminformatics

DOC · DEV.to AI · 4/23/2026

Convolutional Neural Networks - Landmark Image Classification

In this video, the author explains the pipeline for training a Convolutional Neural Network (CNN) for landmark image classification. Those interested can follow the training pipeline and test the model from the author's GitHub.

neural networks deep learning image classification Convolutional Neural Networks

DOC · DEV.to AI · 2d ago

Pytorch for Neural Networks Part 7: Training with Loss and Derivatives

This article, part of a PyTorch series, walks through the neural network training process using a nested loop that iterates over the training data. It explains how to compute the model's output, apply the loss function, accumulate the total loss, and call `loss.backward()` to compute the derivatives used for optimization.

neural networks deep learning learning Training
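
The nested loop described above might look roughly like the following. This is a minimal sketch rather than the article's code: the toy data (y = 2x + 1), the single `torch.nn.Linear` layer, the squared-residual loss, and the learning rate are all assumptions chosen to keep the example short.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy data following y = 2x + 1.
inputs = torch.tensor([0.0, 1.0, 2.0, 3.0])
labels = 2.0 * inputs + 1.0

model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.02)

epoch_losses = []
for epoch in range(200):                  # outer loop: passes over the data
    total_loss = 0.0
    for x, y in zip(inputs, labels):      # inner loop: one example at a time
        prediction = model(x.reshape(1))  # forward pass, output shape (1,)
        loss = (prediction - y.reshape(1)).pow(2).sum()  # squared residual
        loss.backward()                   # adds d(loss)/d(param) to each .grad
        total_loss += loss.item()
    optimizer.step()                      # update with the accumulated gradients
    optimizer.zero_grad()                 # reset gradients for the next epoch
    epoch_losses.append(total_loss)

print(f"first epoch loss: {epoch_losses[0]:.4f}")
print(f"last epoch loss:  {epoch_losses[-1]:.6f}")
```

Because `loss.backward()` accumulates into `.grad`, calling it once per example and stepping once per epoch applies the gradient of the total loss. Forgetting `optimizer.zero_grad()` would carry stale gradients into the next epoch.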