deep learning

263 items

RESEARCH · arXiv CS.LG · 5/8/2026

Adaptive Computation Depth via Learned Token Routing in Transformers

This paper introduces Token-Selective Attention (TSA), a mechanism for Transformer architectures that enables adaptive computation depth per token. TSA learns to route tokens based on contextual difficulty, saving 14-23% of token-layer operations with minimal quality loss.

neural networks deep learning machine learning efficiency

RESEARCH · arXiv CS.LG · 4/21/2026

Preventing overfitting in deep learning using differential privacy

This research explores a differential-privacy-based approach to improving generalization and preventing overfitting in deep neural networks. Overfitting, where models learn noise and perform poorly on unseen data, is a growing challenge in modern AI systems.

Differential Privacy Generalization privacy deep learning

RESEARCH · arXiv CS.CL · 4/21/2026

Brain-CLIPLM: Decoding Compressed Semantic Representations in EEG for Language Reconstruction

This work proposes a semantic compression hypothesis to overcome limitations in EEG-to-text decoding, suggesting that EEG signals encode compressed semantic anchors rather than full linguistic structure. It introduces Brain-CLIPLM, a two-stage framework for semantic anchor extraction via contrastive learning and sentence reconstruction using a retrieval-grounded large language model.

Brain-Computer Interface (BCI) deep learning machine learning Natural Language Processing (NLP)

RESEARCH · arXiv CS.LG · 5/4/2026

Cloud Is Closer Than It Appears: Revisiting the Tradeoffs of Distributed Real-Time Inference

This paper re-examines the viability of cloud-based inference for latency-sensitive cyber-physical systems, challenging the assumption that on-device processing is always superior. It demonstrates that high-throughput cloud platforms can match or surpass on-device performance for real-time control tasks by amortizing network and queueing delays.

deep learning cloud computing distributed systems edge computing

RESEARCH · arXiv CS.LG · 5/7/2026

Continual Distillation of Teachers from Different Domains

This research introduces Continual Distillation (CD), a new paradigm where a student model sequentially learns from a stream of teacher models without retaining prior access. It addresses challenges like unseen knowledge transfer (UKT) and forgetting (UKF) through Self External Data Distillation (SE2D), which uses external unlabeled data to stabilize learning across heterogeneous teachers.

Knowledge Distillation deep learning learning Continual Learning

RESEARCH · arXiv CS.LG · 4/21/2026

BASIS: Balanced Activation Sketching with Invariant Scalars for "Ghost Backpropagation"

This paper introduces BASIS, an efficient backpropagation algorithm designed to mitigate the O(L * BN) spatial memory bottleneck in deep neural networks. It fully decouples activation memory from batch and sequence dimensions, preserving exact error signals while computing weight updates with massively compressed tensors, and addresses gradient instability with novel mechanisms.

neural networks deep learning Memory Optimization backpropagation

RESEARCH · arXiv CS.LG · 28d ago

Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking

This empirical study investigates Tian's (2025) feature repulsion theorem in two-layer network grokking, testing its mechanisms and spectral signatures. It observes a clear structure-mechanism dissociation, with the predicted sign rule robustly holding for similar feature pairs despite a strong activation dependence in the spectral signature.

neural networks feature learning grokking deep learning

RESEARCH · arXiv CS.LG · 7d ago

Hoeffding Concept Bottleneck Models with Applications to Overhead Images

Hoeffding Concept Bottleneck Models (HCBM) are introduced to offer non-linear and sparse aggregations of concept scores, enhancing the explainability and accuracy of deep learning predictions. This method leverages Hoeffding functional decomposition of gradient-boosted trees to overcome the limitations of existing linear CBMs, which suffer from a large number of concepts and potential information leakage.

deep learning machine learning computer vision Explainable AI

RESEARCH · arXiv CS.AI · 24d ago

Conditional Attribute Estimation with Autoregressive Sequence Models

This research introduces Conditional Attribute Transformers, a novel method for jointly estimating next-token probability and an attribute's value conditional on each potential next token selection. This framework enables critical capabilities like per-token credit assignment and counterfactual analysis within a single forward pass, overcoming limitations of traditional generative models.

deep learning generative models sequence models Conditional Attribute Estimation

RESEARCH · arXiv CS.LG · 4/24/2026

Validating a Deep Learning Algorithm to Identify Patients with Glaucoma using Systemic Electronic Health Records

This research validates a deep learning algorithm for glaucoma risk assessment using systemic electronic health records. The model, fine-tuned on Stanford patient data, achieved an AUROC of 0.883 and PPV of 0.657, showing strong potential for scalable and accessible pre-screening.

deep learning Medical Diagnosis healthcare AI EHR

RESEARCH · arXiv CS.LG · 4/24/2026

Do Masked Autoencoders Improve Downhole Prediction? An Empirical Study on Real Well Drilling Data

This study explores the application of Masked Autoencoder (MAE) pretraining for downhole drilling metric prediction, addressing the data asymmetry in drilling telemetry. Using real well drilling data, MAE reduced the test mean absolute error by 19.8% relative to supervised GRU baselines for Total Mud Volume prediction.

industrial AI deep learning machine learning

RESEARCH · arXiv CS.LG · 28d ago

Distributional Reinforcement Learning via the Cramér Distance

This paper introduces the Cramér-based Distributional Soft Actor-Critic (C-DSAC) algorithm, applying Soft Actor-Critic within a distributional reinforcement learning framework by minimizing the squared Cramér distance. Empirical results demonstrate that C-DSAC outperforms baseline SAC and other distributional methods, particularly in high-complexity environments, attributed to its confidence-driven Q-value updates.

deep learning reinforcement learning learning Algorithms

RESEARCH · arXiv CS.LG · 5/7/2026

A Self-Attentive Meta-Optimizer with Group-Adaptive Learning Rates and Weight Decay

MetaAdamW is a novel optimizer that employs a self-attention mechanism to dynamically adjust per-group learning rates and weight decay, addressing the limitation of uniform hyperparameters in adaptive optimizers. Its attention module is trained via a meta-learning objective, integrating gradient alignment, loss decrease, and generalization gap.

Meta-Learning deep learning learning AI Research

RESEARCH · arXiv CS.LG · 5/7/2026

EdgeRazor: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation

This research introduces EdgeRazor, a lightweight framework designed to deploy Large Language Models on resource-constrained devices. It leverages mixed-precision quantization-aware distillation to convert full-precision models into lower-bit formats, overcoming limitations of previous quantization methods.

LLMs deep learning quantization model optimization

RESEARCH · arXiv CS.LG · 5/7/2026

Lookahead Drifting Model

This paper proposes a "lookahead drifting model" for distribution mapping, which enhances image generation performance via one-step neural functional evaluation. The model computes a set of drifting terms sequentially at each training iteration, utilizing positive samples and model outputs to capture higher-order gradient information.

neural networks Optimization deep learning machine learning

RESEARCH · arXiv CS.LG · 29d ago

LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction

This paper introduces LKV (Learned KV Eviction), a novel approach to optimize Key-Value (KV) cache memory in Large Language Models (LLMs). LKV formulates KV compression as an end-to-end differentiable optimization problem, learning budgets and token selection to overcome limitations of heuristic methods.

deep learning Memory Optimization efficiency KV cache

RESEARCH · arXiv CS.LG · 22d ago

GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding

This paper introduces Group-Query Latent Attention (GQLA), a modification to Multi-head Latent Attention (MLA). GQLA exposes two algebraically equivalent decoding paths, allowing a single set of trained weights to adapt efficiently to different hardware platforms like H100 and H20 without retraining.

deep learning Attention Mechanism AI Efficiency hardware optimization

RESEARCH · DEV.to AI · 4/25/2026

PP-LCNet: A Lightweight CPU Convolutional Neural Network

PP-LCNet introduces a lightweight convolutional neural network optimized for efficient performance on CPUs. This architecture focuses on achieving high accuracy while maintaining minimal computational demands, making it suitable for resource-constrained environments.

deep learning lightweight models computer vision Convolutional Neural Networks

RESEARCH · arXiv CS.LG · 20d ago

Theory-optimal Quantization Based on Flatness

This research models the relationship between quantization error and outliers in Large Language Models (LLMs) and introduces a new metric, Flatness, to quantify outlier distribution. Based on this, it derives a theoretical optimal solution and proposes Bidirectional Diagonal Quantization (BDQ) for post-training quantization.

deep learning machine learning quantization AI

RESEARCH · arXiv CS.AI · 20d ago

KAN-MLP-Mixer: A comprehensive investigation of the usage of Kolmogorov-Arnold Networks (KANs) for improving IMU-based Human Activity Recognition

Kolmogorov-Arnold Networks (KANs) excel at learning complex functions on clean data but struggle with noisy, real-world datasets, unlike conventional MLPs which are noise-tolerant and efficient. This paper proposes a hybrid KAN-MLP architecture for IMU-based Human Activity Recognition, strategically combining KANs for input embedding, MLPs for intermediate feature mixing, and a specialized LarctanKAN for final classification.

neural networks deep learning machine learning Human Activity Recognition