deep learning

263 items

RESEARCH · DEV.to AI · 4/27/2026

An Attention Free Transformer

This post introduces the Attention Free Transformer, an architecture that aims to match the capabilities of standard Transformers without relying on self-attention. It likely explores alternative mechanisms for processing contextual information in sequence-to-sequence tasks.

RESEARCH · arXiv CS.LG · 4/15/2026

Thermodynamic Liquid Manifold Networks: Physics-Bounded Deep Learning for Solar Forecasting in Autonomous Off-Grid Microgrids

This research introduces the Thermodynamic Liquid Manifold Network (TLMN), a physics-bounded deep learning model for solar forecasting in autonomous off-grid microgrids. It integrates atmospheric thermodynamics and celestial mechanics to prevent the physically impossible predictions that contemporary deep learning models can produce.

RESEARCH · arXiv CS.LG · 4/15/2026

Uncertainty Quantification in CNN Through the Bootstrap of Convex Neural Networks

This paper proposes a bootstrap-based framework for uncertainty quantification (UQ) in Convolutional Neural Networks (CNNs), addressing the lack of theoretically consistent UQ tools. The method uses convexified neural networks to establish theoretical consistency, requires significantly less computation, and explores a novel transfer learning approach.

RESEARCH · arXiv CS.AI · 4/25/2026

Adaptive Test-Time Compute Allocation with Evolving In-Context Demonstrations

This work introduces a framework for adaptive test-time compute allocation that jointly adjusts where computation is spent and how generation is performed. A warm-up phase identifies easy queries; further computation is then concentrated on the unresolved queries, with evolving in-context demonstrations reshaping the generation distribution.

RESEARCH · arXiv CS.LG · 5/5/2026

Fast Log-Domain Sinkhorn Optimal Transport with Warp-Level GPU Reductions

This paper introduces FastSinkhorn, a native CUDA implementation of the log-domain Sinkhorn algorithm that provides faster and more stable solutions for optimal transport (OT) problems. It achieves a 12x speedup over the POT library and 5.9x over GPU-accelerated PyTorch baselines, maintaining numerical stability for small regularization parameters.
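For readers unfamiliar with the log-domain formulation, here is a minimal NumPy sketch of the Sinkhorn iteration written in terms of dual potentials. It is not the paper's CUDA implementation and contains no warp-level reductions; the function names `logsumexp` and `sinkhorn_log`, the toy cost matrix, and the parameter values are assumptions made for illustration. The point it shows is why the log domain helps: the plain kernel `exp(-C / eps)` underflows to zero when `eps` is small, while the log-sum-exp updates stay finite.

```python
import numpy as np

def logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def sinkhorn_log(a, b, C, eps=0.1, n_iters=200):
    """Entropic OT between histograms a and b with cost matrix C.

    Iterates on dual potentials f, g so that the plan
    P_ij = exp((f_i + g_j - C_ij) / eps) has marginals a and b.
    Illustrative sketch only; the defaults are arbitrary.
    """
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros(len(a))
    g = np.zeros(len(b))
    for _ in range(n_iters):
        # Update f so that the rows of P sum to a.
        f = eps * (log_a - logsumexp((g[None, :] - C) / eps, axis=1))
        # Update g so that the columns of P sum to b.
        g = eps * (log_b - logsumexp((f[:, None] - C) / eps, axis=0))
    P = np.exp((f[:, None] + g[None, :] - C) / eps)
    return P, f, g

# Toy problem: five points on a line with squared-distance cost (hypothetical data).
x = np.linspace(0.0, 1.0, 5)
C = (x[:, None] - x[None, :]) ** 2
a = np.full(5, 0.2)
b = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

P, f, g = sinkhorn_log(a, b, C, eps=0.5)
print("max row-marginal error:", np.abs(P.sum(axis=1) - a).max())
```

Each update is a row-wise or column-wise log-sum-exp reduction, which is the kind of operation a GPU implementation would parallelize.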

RESEARCH · arXiv CS.CL · 5/1/2026

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

This paper introduces the Length Value Model (LenVM), a novel token-level framework for modeling the remaining generation length in autoregressive models. By formulating length modeling as a value estimation problem, LenVM provides an annotation-free, scalable, and effective signal for LLMs and VLMs, improving performance on exact length matching tasks.

RESEARCH · arXiv CS.LG · 5/1/2026

Simple Self-Conditioning Adaptation for Masked Diffusion Models

Masked diffusion models (MDMs) discard clean-state predictions for tokens that remain masked, limiting cross-step refinement. This paper proposes Self-Conditioned Masked Diffusion Models (SCMDM), a post-training adaptation that conditions each denoising step on the model's own previous clean-state predictions. This enhances performance without significant architectural changes or extra evaluations.

RESEARCH · arXiv CS.LG · 4/27/2026

Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning

This research investigates whether Universal Transformers with Adaptive Computation Time (ACT) need learned memory tokens as a computational scratchpad, using the combinatorial reasoning benchmark Sudoku-Extreme. It finds that memory tokens are empirically necessary for non-trivial performance, and it identifies a sharp lower threshold on the memory-token count as well as a common router initialization trap.

RESEARCH · arXiv CS.LG · 5/8/2026

Are Flat Minima an Illusion?

This paper challenges the conventional view that flat minima inherently lead to better generalization, showing that function-preserving reparameterization can drastically alter a minimum's perceived sharpness. It introduces "weakness", a reparameterization-invariant measure based on what the network does, as the actual driver of generalization, proving its minimax optimality and showing its correlation with PAC-Bayes bounds.
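The reparameterization argument can be illustrated with a toy example. The sketch below is not taken from the paper and does not implement its "weakness" measure; the two-parameter ReLU model, the data, the scaling factor `alpha`, and the use of the Hessian trace as a sharpness proxy are all assumptions chosen for illustration. Scaling the first weight by `alpha` and the second by `1 / alpha` leaves the function unchanged, yet the Hessian trace at the minimum changes by a large factor.

```python
import numpy as np

def predict(params, x):
    # Tiny two-parameter ReLU model: f(x) = w2 * relu(w1 * x).
    w1, w2 = params
    return w2 * np.maximum(w1 * x, 0.0)

def loss(params, x, y):
    return np.mean((predict(params, x) - y) ** 2)

def hessian_trace(params, x, y, h=1e-4):
    # Sum of second derivatives along each parameter (central differences).
    params = np.asarray(params, dtype=float)
    base = loss(params, x, y)
    trace = 0.0
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = h
        trace += (loss(params + step, x, y) - 2.0 * base
                  + loss(params - step, x, y)) / h ** 2
    return trace

# Hypothetical data: the target is y = x on positive inputs.
x = np.linspace(0.1, 1.0, 10)
y = x.copy()

alpha = 10.0
theta = np.array([1.0, 1.0])                  # a minimum: w1 * w2 = 1
theta_scaled = np.array([alpha, 1.0 / alpha])  # same function, rescaled weights

print("same predictions:", np.allclose(predict(theta, x), predict(theta_scaled, x)))
print("trace at theta:       ", hessian_trace(theta, x, y))
print("trace at theta_scaled:", hessian_trace(theta_scaled, x, y))
```

Both parameter settings compute the same function and reach the same loss, so any difference in the measured sharpness comes from the parameterization rather than from the network's behavior.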

RESEARCH · arXiv CS.CL · 4/17/2026

Can Large Language Models Detect Methodological Flaws? Evidence from Gesture Recognition for UAV-Based Rescue Operation Based on Deep Learning

This research investigates whether Large Language Models (LLMs) can identify methodological flaws, such as data leakage, in published machine learning studies. In a case study, six state-of-the-art LLMs consistently detected the evaluation flaws in a gesture recognition paper, which stemmed from non-independent data partitioning.
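To make "non-independent data partitioning" concrete, the sketch below contrasts a sample-level random split with a group-level split. The dataset is invented for illustration: the `recording_*` group labels, the frame counts, and the helper names are assumptions, not details from the paper under review. When correlated samples (for example, frames from one recording) are shuffled individually, the same group typically ends up on both sides of the split, which can inflate test scores.

```python
import random

def random_split(samples, test_fraction, seed=0):
    # Sample-level split: shuffles individual samples and ignores grouping.
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def group_split(samples, test_fraction, seed=0):
    # Group-level split: every sample from a group lands on the same side.
    rng = random.Random(seed)
    groups = sorted({s["group"] for s in samples})
    rng.shuffle(groups)
    n_test = max(1, int(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if s["group"] not in test_groups]
    test = [s for s in samples if s["group"] in test_groups]
    return train, test

def shared_groups(train, test):
    # Groups that appear in both partitions, a sign of possible leakage.
    return {s["group"] for s in train} & {s["group"] for s in test}

# Hypothetical dataset: 10 recordings with 50 highly correlated frames each.
samples = [{"group": f"recording_{g}", "frame": i}
           for g in range(10) for i in range(50)]

train_r, test_r = random_split(samples, 0.2)
train_g, test_g = group_split(samples, 0.2)
print("groups shared by random split:", len(shared_groups(train_r, test_r)))
print("groups shared by group split: ", len(shared_groups(train_g, test_g)))
```

A check like `shared_groups` is the kind of evidence a reviewer, human or LLM, would look for when assessing whether train and test sets are independent.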
