deep learning

263 items

ARTICLE · DEV.to AI · 4/19/2026

Attention Mechanisms: Stop Compressing, Start Looking Back

This article examines the limits of LSTMs in maintaining context, despite their improved memory over vanilla RNNs. The author draws on a personal experience of learning English to illustrate three specific problems that LSTMs still don't solve, setting the stage for a discussion of attention mechanisms.

deep learning Attention Mechanisms Natural Language Processing

RESEARCH · DEV.to AI · 24d ago

Deep Neural Networks for Survival Analysis Based on a Multi-Task Framework

This research explores the application of deep neural networks in survival analysis, employing a multi-task framework. The approach aims to enhance the prediction and modeling of time-to-event data by leveraging complex neural network architectures.

neural networks multi-task learning deep learning survival analysis

RESEARCH · DEV.to AI · 5/10/2026

Neural Language Correction with Character-Based Attention

This research introduces a novel approach to neural language correction leveraging character-based attention mechanisms. The method aims to improve the accuracy and robustness of automatically correcting grammatical and spelling errors in text.

neural networks deep learning Attention Mechanisms Natural Language Processing

RESEARCH · DEV.to AI · 4/27/2026

An Attention Free Transformer

This content introduces the concept of an Attention Free Transformer, a novel architectural design aiming to achieve the capabilities of traditional Transformers without relying on the self-attention mechanism. It likely explores alternative mechanisms for contextual information processing in sequence-to-sequence tasks.

neural networks deep learning AI Architectures Transformers

RESEARCH · arXiv CS.LG · 4/15/2026

Thermodynamic Liquid Manifold Networks: Physics-Bounded Deep Learning for Solar Forecasting in Autonomous Off-Grid Microgrids

This research introduces the Thermodynamic Liquid Manifold Network (TLMN), a physics-bounded deep learning model for solar forecasting in autonomous off-grid microgrids. It resolves critical anomalies in contemporary deep learning models by integrating atmospheric thermodynamics and celestial mechanics to prevent physically impossible predictions.

microgrids deep learning Solar Forecasting Thermodynamics

RESEARCH · arXiv CS.LG · 4/15/2026

Uncertainty Quantification in CNN Through the Bootstrap of Convex Neural Networks

This paper proposes a novel bootstrap-based framework for uncertainty quantification (UQ) in Convolutional Neural Networks (CNNs), addressing the lack of theoretically consistent UQ tools. The method uses convexified neural networks to establish theoretical consistency, substantially reduces the computational load, and explores a novel transfer learning approach.

Theoretical Consistency Bootstrap deep learning uncertainty quantification

RESEARCH · arXiv CS.AI · 4/25/2026

Adaptive Test-Time Compute Allocation with Evolving In-Context Demonstrations

This work introduces an innovative framework for adaptive test-time compute allocation, jointly adjusting where computation is spent and how generation is performed. The method uses a warm-up phase to identify easy queries and then concentrates further computation on unresolved queries, reshaping generation distributions with evolving in-context demonstrations.

deep learning Machine Learning in-context learning AI

RESEARCH · arXiv CS.LG · 4/28/2026

AutoCompress: Critical Layer Isolation for Efficient Transformer Compression

AutoCompress is a transformer compression method based on the empirical finding that Layer 0 carries disproportionately high task-critical information. Its Critical Layer Isolation (CLI) architecture achieves 2.47x compression on GPT-2 Medium with 59.5% parameter reduction, significantly outperforming a uniform bottleneck baseline.

AI architecture model efficiency deep learning GPT-2

RESEARCH · arXiv CS.LG · 5/5/2026

Fast Log-Domain Sinkhorn Optimal Transport with Warp-Level GPU Reductions

This paper introduces FastSinkhorn, a native CUDA implementation of the log-domain Sinkhorn algorithm that provides faster and more stable solutions for optimal transport (OT) problems. It achieves a 12x speedup over the POT library and 5.9x over GPU-accelerated PyTorch baselines, maintaining numerical stability for small regularization parameters.

GPU computing deep learning Sinkhorn Algorithm Numerical Stability

RESEARCH · arXiv CS.CL · 5/1/2026

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

This paper introduces the Length Value Model (LenVM), a novel token-level framework for modeling the remaining generation length in autoregressive models. By formulating length modeling as a value estimation problem, LenVM provides an annotation-free, scalable, and effective signal for LLMs and VLMs, improving performance on exact length matching tasks.

deep learning Model Architecture computer vision large language models

RESEARCH · arXiv CS.LG · 4/27/2026

LTBs-KAN: Linear-Time B-splines Kolmogorov-Arnold Networks

LTBs-KAN is a novel neural network architecture designed to overcome the computational slowness of traditional KANs by offering linear complexity and reduced parameters. Experiments demonstrate significant improvements in computational efficiency and parameter reduction on common datasets like MNIST, Fashion-MNIST, and CIFAR-10.

neural networks B-splines deep learning Computational Efficiency

RESEARCH · arXiv CS.LG · 5/1/2026

Simple Self-Conditioning Adaptation for Masked Diffusion Models

Masked diffusion models (MDMs) discard clean-state predictions for tokens that remain masked, limiting cross-step refinement. This paper proposes Self-Conditioned Masked Diffusion Models (SCMDM), a post-training adaptation that conditions each denoising step on the model's own previous clean-state predictions. This enhances performance without significant architectural changes or extra evaluations.

Diffusion Models model adaptation deep learning Machine Learning

RESEARCH · arXiv CS.LG · 4/27/2026

Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning

This research investigates whether learned memory tokens are needed as a computational scratchpad for Universal Transformers with Adaptive Computation Time (ACT) on Sudoku-Extreme, a combinatorial reasoning benchmark. It finds that memory tokens are empirically necessary for non-trivial performance, and it identifies a sharp lower threshold for the optimal token count as well as a common router initialization trap.

neural networks deep learning memory Reasoning

RESEARCH · arXiv CS.LG · 5/8/2026

Are Flat Minima an Illusion?

This paper challenges the conventional view that flat minima inherently lead to better generalization, showing that function-preserving reparameterization can drastically alter a minimum's perceived sharpness. It introduces "weakness," a reparameterization-invariant measure based on what the network does, as the actual driver of generalization, proving its minimax optimality and showing its correlation with PAC-Bayes bounds.

neural networks Optimization Generalization Machine Learning Theory

RESEARCH · DEV.to AI · 4/8/2026

Neural Models for Information Retrieval

This content covers the use of neural models to improve information retrieval systems. It explores how artificial intelligence can optimize the search and organization of large volumes of data.

neural networks deep learning Machine Learning Information Retrieval

RESEARCH · arXiv CS.LG · 4/16/2026

Spectral Entropy Collapse as an Empirical Signature of Delayed Generalisation in Grokking

This paper identifies normalized spectral entropy as a scalar order parameter for the grokking transition, where models generalize long after memorization. The research shows that entropy collapse precedes generalization, and causal interventions confirm its critical role, providing a predictive model for grokking onset.

neural networks grokking Generalization deep learning

RESEARCH · arXiv CS.LG · 4/8/2026

El Nino Prediction Based on Weather Forecast and Geographical Time-series Data

This paper proposes a new framework to improve the prediction of El Niño events by integrating weather forecast data and geographical data. It uses a hybrid deep learning architecture, combining a CNN for spatial feature extraction with an LSTM for temporal modeling, with the aim of identifying complex precursors.

CNN deep learning Weather Forecasting El Nino Prediction

RESEARCH · arXiv CS.LG · 4/17/2026

Towards Verified and Targeted Explanations through Formal Methods

This paper introduces ViTaX, a formal XAI framework designed to generate targeted semifactual explanations with mathematical guarantees. It addresses the shortcomings of existing XAI methods in providing trustworthy explanations for deep neural networks in safety-critical domains like autonomous driving and medical diagnosis.

deep learning formal methods Explainable AI Safety-Critical Systems

RESEARCH · arXiv CS.CL · 4/17/2026

Can Large Language Models Detect Methodological Flaws? Evidence from Gesture Recognition for UAV-Based Rescue Operation Based on Deep Learning

This research investigates whether Large Language Models (LLMs) can identify methodological flaws, such as data leakage, in published machine learning studies. In a case study, six state-of-the-art LLMs consistently detected the evaluation flaws that non-independent data partitioning introduced in a gesture recognition paper.

deep learning Machine Learning large language models AI evaluation

RESEARCH · arXiv CS.CL · 20d ago

Pseudo-Siamese Network for Planning in Target-Oriented Proactive Dialogues

The paper proposes a Forward-Focused Bidirectional Pseudo-Siamese Network (FF-BPSN) for planning dialogue paths in target-oriented proactive dialogue systems. This network uses identical transformer-based decoders for bidirectional planning and integrates information to construct a forward path, guiding language models in response generation.

transformer networks deep learning NLP AI