deep learning

RESEARCH · arXiv CS.CL · 4/21/2026

Brain-CLIPLM: Decoding Compressed Semantic Representations in EEG for Language Reconstruction

This work proposes a semantic compression hypothesis to overcome limitations in EEG-to-text decoding, suggesting that EEG signals encode compressed semantic anchors rather than full linguistic structure. It introduces Brain-CLIPLM, a two-stage framework for semantic anchor extraction via contrastive learning and sentence reconstruction using a retrieval-grounded large language model.

RESEARCH · arXiv CS.LG · 5/4/2026

Cloud Is Closer Than It Appears: Revisiting the Tradeoffs of Distributed Real-Time Inference

This paper re-examines the viability of cloud-based inference for latency-sensitive cyber-physical systems, challenging the assumption that on-device processing is always superior. It demonstrates that high-throughput cloud platforms can match or surpass on-device performance for real-time control tasks by amortizing network and queueing delays.

RESEARCH · arXiv CS.LG · 5/7/2026

Continual Distillation of Teachers from Different Domains

This research introduces Continual Distillation (CD), a new paradigm in which a student model learns sequentially from a stream of teacher models without retaining access to earlier teachers. It addresses two challenges, unseen knowledge transfer (UKT) and unseen knowledge forgetting (UKF), through Self External Data Distillation (SE2D), which uses external unlabeled data to stabilize learning across heterogeneous teachers.

RESEARCH · arXiv CS.LG · 4/21/2026

BASIS: Balanced Activation Sketching with Invariant Scalars for "Ghost Backpropagation"

This paper introduces BASIS, an efficient backpropagation algorithm designed to mitigate the O(L * BN) spatial memory bottleneck in deep neural networks. It fully decouples activation memory from batch and sequence dimensions, preserving exact error signals while computing weight updates with massively compressed tensors, and addresses gradient instability with novel mechanisms.

RESEARCH · arXiv CS.LG · 28d ago

Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking

This empirical study investigates Tian's (2025) feature repulsion theorem in two-layer network grokking, testing its mechanisms and spectral signatures. It observes a clear structure-mechanism dissociation, with the predicted sign rule robustly holding for similar feature pairs despite a strong activation dependence in the spectral signature.

RESEARCH · arXiv CS.LG · 7d ago

Hoeffding Concept Bottleneck Models with Applications to Overhead Images

This paper introduces Hoeffding Concept Bottleneck Models (HCBM), which aggregate concept scores non-linearly and sparsely to improve the explainability and accuracy of deep learning predictions. The method applies the Hoeffding functional decomposition to gradient-boosted trees to overcome two limitations of existing linear CBMs: a large number of concepts and potential information leakage.

RESEARCH · arXiv CS.AI · 24d ago

Conditional Attribute Estimation with Autoregressive Sequence Models

This research introduces Conditional Attribute Transformers, a novel method for jointly estimating next-token probability and an attribute's value conditional on each potential next token selection. This framework enables critical capabilities like per-token credit assignment and counterfactual analysis within a single forward pass, overcoming limitations of traditional generative models.

RESEARCH · arXiv CS.LG · 28d ago

Distributional Reinforcement Learning via the Cramér Distance

This paper introduces the Cramér-based Distributional Soft Actor-Critic (C-DSAC) algorithm, which applies Soft Actor-Critic within a distributional reinforcement learning framework by minimizing the squared Cramér distance. Empirical results show that C-DSAC outperforms baseline SAC and other distributional methods, particularly in high-complexity environments, an advantage the paper attributes to its confidence-driven Q-value updates.
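
For reference, the squared Cramér distance between two one-dimensional distributions with CDFs F and G is the integral of (F(x) − G(x))² over x. The block below is a minimal sketch of that quantity for two empirical samples. The function name `squared_cramer_distance` is illustrative and not taken from the paper, and this is not the C-DSAC training loss itself.

```python
from bisect import bisect_right


def squared_cramer_distance(xs, ys):
    """Squared Cramér distance between the empirical distributions of two samples.

    Computes the integral of (F(x) - G(x))^2, where F and G are the
    empirical CDFs of xs and ys. Illustrative helper, not from the paper.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    # Both empirical CDFs are piecewise constant between consecutive sample points.
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        f = bisect_right(xs, left) / len(xs)  # F on [left, right)
        g = bisect_right(ys, left) / len(ys)  # G on [left, right)
        total += (f - g) ** 2 * (right - left)
    return total


if __name__ == "__main__":
    # Point masses at 0 and 2: the CDFs differ by 1 on an interval of width 2.
    print(squared_cramer_distance([0.0], [2.0]))
```

Identical samples give a distance of zero, and moving probability mass farther apart increases the distance.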

RESEARCH · arXiv CS.LG · 5/7/2026

A Self-Attentive Meta-Optimizer with Group-Adaptive Learning Rates and Weight Decay

MetaAdamW is a novel optimizer that uses a self-attention mechanism to dynamically adjust per-group learning rates and weight decay, addressing the limitation of uniform hyperparameters in adaptive optimizers. Its attention module is trained with a meta-learning objective that combines gradient alignment, loss decrease, and generalization gap.

RESEARCH · arXiv CS.LG · 5/7/2026

Lookahead Drifting Model

This paper proposes a "lookahead drifting model" for distribution mapping, which enhances image generation performance via one-step neural functional evaluation. The model computes a set of drifting terms sequentially at each training iteration, utilizing positive samples and model outputs to capture higher-order gradient information.

RESEARCH · arXiv CS.LG · 20d ago

Theory-optimal Quantization Based on Flatness

This research models the relationship between quantization error and outliers in Large Language Models (LLMs) and introduces a new metric, Flatness, to quantify the outlier distribution. From this model it derives a theoretically optimal solution and proposes Bidirectional Diagonal Quantization (BDQ) for post-training quantization.

RESEARCH · arXiv CS.AI · 20d ago

KAN-MLP-Mixer: A comprehensive investigation of the usage of Kolmogorov-Arnold Networks (KANs) for improving IMU-based Human Activity Recognition

Kolmogorov-Arnold Networks (KANs) excel at learning complex functions on clean data but struggle with noisy, real-world datasets, whereas conventional MLPs are noise-tolerant and efficient. This paper proposes a hybrid KAN-MLP architecture for IMU-based Human Activity Recognition that combines KANs for input embedding, MLPs for intermediate feature mixing, and a specialized LarctanKAN for final classification.
