deep learning

263 items

RESEARCH · arXiv CS.LG · 11d ago

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

This paper investigates the mechanistic origins of catastrophic forgetting in Large Language Models (LLMs), comparing Reinforcement Learning (RL) with Supervised Fine-Tuning (SFT). It finds that RL preserves internal computational circuits more effectively than SFT, which disrupts those circuits more and therefore forgets more of the model's prior capabilities.

LLMs deep learning machine learning Catastrophic Forgetting

RESEARCH · arXiv CS.CL · 6d ago

Do Value Vectors in Deep Layers Need Context from the Residual Stream?

Researchers found that language model performance can significantly improve when deeper layers learn context-free value vectors, preserving original token information. This eliminates the need to recompute or persistently cache these values, as the context-dependent component provides little additional benefit.

neural networks LLMs deep learning Attention Mechanism

RESEARCH · arXiv CS.LG · 8d ago

Gait2Hip-60: A Unified Deep Learning Benchmark for Predicting Hip Muscle Forces and Joint Moments from Multi-Cadence Gait Kinematics

This study introduces Gait2Hip-60, a deep learning framework to predict hip muscle forces and joint moments directly from multi-cadence gait kinematics. It compares LSTM, Transformer, and Mamba models, evaluating their performance on healthy adults and an external cohort of patients.

biomechanics deep learning gait analysis musculoskeletal simulation

RESEARCH · arXiv CS.AI · 6d ago

Evaluating Transformer and LSTM Frameworks for Prediction in Ungauged Basins

This study evaluates Transformer and LSTM frameworks for streamflow inference in ungauged basins under limited hydrological information. The LSTM architecture showed stronger overall performance than the Transformer model, and incorporating downstream information further boosted performance for all models.

deep learning Environmental Modeling machine learning AI

RESEARCH · arXiv CS.LG · 6d ago

Geometry-Aware Tabular Diffusion

Geometry-Aware Tabular Diffusion (GATD) is introduced to improve tabular synthesis by augmenting denoisers with pairwise angles and lengths computed from column value differences. It achieves state-of-the-art performance with fewer parameters, reducing Shape and Trend error, and showing that explicit relational supervision drives the gains.

Diffusion Models data synthesis deep learning machine learning

RESEARCH · arXiv CS.LG · 15d ago

Tensor Cache: Eviction-conditioned Associative Memory for Transformers

This paper introduces Tensor Cache, a two-level cache for Transformers designed to optimize KV caches. It pairs sliding-window softmax attention (L1) with a fixed-size outer-product fast-weight memory (L2) to manage evicted tokens, improving access to relevant evidence outside the context window.

Associative Memory deep learning AI Caching
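
The two-level idea in the Tensor Cache summary above can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: the class name `TwoLevelCache`, the plain additive outer-product write, and returning the two reads separately are all assumptions made for the example.

```python
from collections import deque

import numpy as np


class TwoLevelCache:
    """Hypothetical sketch: sliding-window attention (L1) plus a
    fixed-size outer-product memory (L2) that absorbs evicted tokens."""

    def __init__(self, window, d_key, d_value):
        self.window = window
        self.recent = deque()                      # L1: recent (key, value) pairs
        self.memory = np.zeros((d_key, d_value))   # L2: size never grows

    def append(self, key, value):
        self.recent.append((key, value))
        if len(self.recent) > self.window:
            # Assumption: an evicted pair is written as a plain outer product.
            old_key, old_value = self.recent.popleft()
            self.memory += np.outer(old_key, old_value)

    def read(self, query):
        keys = np.stack([k for k, _ in self.recent])
        values = np.stack([v for _, v in self.recent])
        scores = keys @ query / np.sqrt(query.shape[0])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        local = weights @ values        # softmax attention over the window
        recalled = query @ self.memory  # linear read of evicted tokens
        return local, recalled


cache = TwoLevelCache(window=2, d_key=4, d_value=3)
keys = np.eye(4)                                # orthonormal keys for a clean demo
values = np.arange(12.0).reshape(4, 3)
for k, v in zip(keys, values):
    cache.append(k, v)

# Token 0 has left the window, but its value is still recoverable from L2.
local, recalled = cache.read(keys[0])
```

With orthonormal keys the read of an evicted token is exact; with correlated keys the outer-product memory returns a blend of stored values, which is the usual trade-off of a fixed-size associative memory.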

RESEARCH · arXiv CS.LG · 8d ago

Unicorn: Scaling High-Dimensional Time Series Forecasting via Universal Correlation Modeling

Unicorn is a new framework for scalable, high-dimensional time series forecasting, bridging the gap between channel-independent and channel-dependent models. It leverages a latent prototype codebook to learn universal correlation patterns, significantly outperforming state-of-the-art architectures, especially in few-shot transfer scenarios.

forecasting pretraining deep learning machine learning

RESEARCH · arXiv CS.LG · 15d ago

FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning

This research introduces FuRA (Full-Rank Adaptation), a novel parameter-efficient fine-tuning method that addresses limitations in existing techniques by incorporating spectral preconditioning. By reparameterizing weight matrices via full-rank Singular Value Decomposition and constraining updates, FuRA outperforms unconstrained Full Fine-Tuning while maintaining efficiency.

Optimization deep learning machine learning spectral preconditioning

RESEARCH · arXiv CS.LG · 12d ago

A Simple State Space Model Excels at Multivariate Time Series Classification

This research systematically studies structured state space models (SSMs) for time-series classification, comparing complex Mamba-based architectures with simpler diagonal SSMs (S4D). Surprisingly, S4D consistently outperforms Mamba variants in accuracy and efficiency on large-scale benchmarks, challenging the assumption that increased model complexity leads to better performance in this domain.

Time Series Classification deep learning machine learning Sequence Modeling
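
To show why a diagonal SSM such as S4D counts as the "simple" option in the item above, here is a rough sketch of its recurrence. It assumes a zero-order-hold discretization and a single input channel; the function name `diagonal_ssm` and the toy parameters are illustrative and not taken from the paper.

```python
import numpy as np


def diagonal_ssm(u, a, b, c, dt):
    """Run a diagonal state space model over a 1-D input sequence u.

    a, b, c are complex vectors of equal length (one entry per state).
    Because the state matrix is diagonal, every update is elementwise.
    """
    a_bar = np.exp(dt * a)          # zero-order-hold discretization
    b_bar = (a_bar - 1.0) / a * b
    x = np.zeros_like(a)
    outputs = []
    for u_k in u:
        x = a_bar * x + b_bar * u_k          # elementwise state update
        outputs.append((c * x).sum().real)   # real part as the output
    return np.array(outputs)


a = np.array([-1.0 + 0.0j])          # one stable mode
b = np.ones(1, dtype=complex)
c = np.ones(1, dtype=complex)
y = diagonal_ssm(np.ones(200), a, b, c, dt=0.1)
# For a constant input of 1 the output settles near -1/a = 1.
```

Real S4D layers use many such modes per channel and usually compute the same result as a convolution instead of a Python loop; the loop is kept here only for readability.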

RESEARCH · arXiv CS.LG · 12d ago

Comparative Analysis of Liquid Neural Networks and LSTM for Sequential Pattern Recognition: Robustness, Efficiency, and Clinical Utility

Liquid Neural Networks (LNNs) model hidden state evolution as a continuous differential equation, addressing the limitations of discrete-time RNNs and LSTMs in capturing fluid temporal dynamics. This paper benchmarks LNNs against LSTMs across four sequential modalities, revealing LNNs' superior parameter efficiency and robustness, especially in native temporal domains and clinical environments.

neural networks Clinical AI deep learning machine learning
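
The "hidden state as a continuous differential equation" idea in the item above can be made concrete with a small sketch. It assumes the liquid time-constant form dx/dt = -(1/tau + f) * x + f * A with a sigmoid gate f, integrated with a plain Euler step; the paper may use a different LNN variant or solver, and every name below is illustrative.

```python
import numpy as np


def ltc_euler_step(x, u, w_rec, w_in, bias, tau, a, dt):
    """One explicit Euler step of a liquid time-constant cell (sketch)."""
    # The gate depends on both state and input, so the effective
    # time constant of each unit changes with the data.
    gate = 1.0 / (1.0 + np.exp(-(w_rec @ x + w_in @ u + bias)))
    dx = -(1.0 / tau + gate) * x + gate * a
    return x + dt * dx


rng = np.random.default_rng(0)
n_hidden, n_in = 4, 2
w_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
w_in = rng.normal(scale=0.5, size=(n_hidden, n_in))
bias = np.zeros(n_hidden)
tau = np.ones(n_hidden)
a = np.ones(n_hidden)

x = np.zeros(n_hidden)
for u in rng.normal(size=(50, n_in)):
    x = ltc_euler_step(x, u, w_rec, w_in, bias, tau, a, dt=0.05)
```

An LSTM, by contrast, applies one fixed update per time step, which is the discrete-time limitation the summary refers to.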

DOC · DEV.to AI · 4/16/2026

Understanding Transformers Part 8: Shared Weights in Self-Attention

The article explains that Transformers apply the same query, key, and value weights to every input word, so all positions can be computed in parallel. This weight sharing is what makes the self-attention mechanism efficient.

neural networks Self-Attention deep learning Parallel Computing
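
As a rough illustration of the weight sharing described in the item above, the sketch below applies one set of query, key, and value matrices to sequences of different lengths. It is a minimal single-head example with random weights; the names and sizes are assumptions and are not taken from the article.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention; x has shape (num_tokens, d_model)."""
    # One matrix product projects every token with the same weights.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
d_model, d_head = 8, 4
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

short = rng.normal(size=(3, d_model))    # 3-token input
long = rng.normal(size=(10, d_model))    # 10-token input
out_short = self_attention(short, w_q, w_k, w_v)
out_long = self_attention(long, w_q, w_k, w_v)
```

The parameter count depends only on `d_model` and `d_head`, never on the sequence length, which is why the same layer handles both inputs.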

ARTICLE · DEV.to AI · 4/10/2026

Neural Machine Translation and Sequence-to-sequence Models: A Tutorial

This tutorial covers neural machine translation and sequence-to-sequence models, exploring their fundamentals and essential applications in the field of artificial intelligence.

Neural Machine Translation deep learning Sequence-to-sequence Models NLP

RESEARCH · DEV.to AI · 4/10/2026

LongLive: Real-time Interactive Long Video Generation

This item covers LongLive, a system for real-time, interactive generation of long videos. The technology focuses on producing extended video sequences dynamically.

deep learning interactive AI video generation real-time AI

RESEARCH · DEV.to AI · 28d ago

Deep Time Series Models: A Comprehensive Survey and Benchmark

This paper offers a comprehensive survey and benchmark of deep learning models applied to time series data. It systematically reviews various architectures and their performance across different tasks and datasets.

Survey deep learning machine learning Benchmarking

RESEARCH · DEV.to AI · 4/27/2026

Review of Deep Learning

This content is an in-depth review of Deep Learning, exploring its fundamentals and advancements. It offers a comprehensive analysis of the techniques and applications within this field of artificial intelligence.

review deep learning AI

RESEARCH · DEV.to AI · 4/25/2026

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

This research aims to improve reconstruction fidelity using JumpReLU sparse autoencoders.

deep learning autoencoders machine learning

RESEARCH · DEV.to AI · 24d ago

Improving Deep Pancreas Segmentation in CT and MRI Images via Recurrent Neural Contextual Learning and Direct Loss Function

This paper proposes an innovative method to enhance pancreas segmentation in CT and MRI images. It utilizes recurrent neural contextual learning and a direct loss function to optimize accuracy.

CT deep learning pancreas segmentation MRI

RESEARCH · DEV.to AI · 4/28/2026

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

This work presents a unified CNN architecture for real-time spatiotemporal action localization. It focuses on improving efficiency and accuracy in detecting activities within videos.

CNN deep learning computer vision Action Recognition

ARTICLE · DEV.to AI · 4/15/2026

A Modern Take on the Bias-Variance Tradeoff in Neural Networks

This article offers a modern perspective on the classical bias-variance tradeoff, re-evaluating its application and relevance in the context of contemporary neural networks. It explores how this fundamental concept manifests and impacts performance in deep learning models.

neural networks model performance deep learning machine learning

RESEARCH · DEV.to AI · 4/19/2026

Camera identification with deep convolutional networks

This research explores the use of deep convolutional networks for camera identification. It examines how these models can differentiate between various cameras.

deep learning computer vision AI