deep learning

263 items

RESEARCH · arXiv CS.LG · 8d ago

DAStatFormer: A Hybrid Multibranch Transformer with Statistical Feature Integration for DAS-Based Pattern Recognitions

DAStatFormer is a hybrid multibranch Transformer proposed to overcome the challenges of high dimensionality and complex spatio-temporal patterns in Distributed Acoustic Sensing (DAS). It integrates compact statistical features from multiple domains, significantly reducing data size and enhancing event classification.

deep learning Machine Learning pattern recognition distributed acoustic sensing
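To show how compact statistical features can shrink a DAS recording, here is a sketch that reduces each fiber channel's time series to six numbers: mean, standard deviation, skewness, kurtosis, RMS, and a frequency-domain spectral centroid. The feature set and the name `statistical_features` are hypothetical. The summary does not say which features DAStatFormer actually uses.

```python
import numpy as np

def statistical_features(das, fs):
    """Hypothetical per-channel features for one DAS window.

    das: array of shape (n_channels, n_samples); fs: sampling rate in Hz.
    Returns an array of shape (n_channels, 6).
    """
    mu = das.mean(axis=1, keepdims=True)
    sd = das.std(axis=1, keepdims=True) + 1e-12
    z = (das - mu) / sd
    skew = (z ** 3).mean(axis=1)                  # time-domain shape statistics
    kurt = (z ** 4).mean(axis=1)
    rms = np.sqrt((das ** 2).mean(axis=1))
    mag = np.abs(np.fft.rfft(das, axis=1))        # frequency-domain statistic
    freqs = np.fft.rfftfreq(das.shape[1], d=1.0 / fs)
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)
    return np.column_stack([mu[:, 0], sd[:, 0], skew, kurt, rms, centroid])

fs = 1000
t = np.arange(2000) / fs
das = np.tile(np.sin(2 * np.pi * 50 * t), (100, 1))   # 100 channels, 2 s each
feats = statistical_features(das, fs)
print(das.size, feats.size)  # 200000 600
```

In this toy case, a 100-channel, 2-second window shrinks from 200,000 raw samples to 600 feature values. A classifier would then operate on the smaller array.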

RESEARCH · arXiv CS.LG · 6d ago

Self-Distilled Policy Gradient

This paper introduces Self-Distilled Policy Gradient (SDPG), a novel framework that enhances sparse-reward reinforcement learning through on-policy self-distillation. SDPG integrates group-relative verifier advantages, exact full-vocabulary self-distillation, and KL regularization, demonstrating improved stability and performance over existing baselines.

language models deep learning reinforcement learning Policy Gradient
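Two of the ingredients named in the SDPG summary can be sketched on their own. The first is a group-relative advantage, shown here in the common GRPO style: rewards are standardized within a group of completions sampled for the same prompt. The second is a KL term summed over the full vocabulary rather than only over sampled tokens. The exact normalization, the KL direction, and how SDPG combines these terms are assumptions, not taken from the paper.

```python
import math
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize verifier rewards within one group.

    Assumption: SDPG's exact normalization may differ from this sketch.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def full_vocab_kl(p, q):
    """KL(p || q) summed over every token in the vocabulary (not a sampled subset)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Four sampled completions for one prompt, scored 1 (verified) or 0 by a verifier.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
```

If every completion in a group receives the same reward, all advantages are zero. This is one plausible reason an additional dense signal, such as a distillation term, can help in sparse-reward settings.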

ARTICLE · DEV.to AI · 4/22/2026

Why LoRA? Understanding the representative PEFT

LoRA (Low-Rank Adaptation) is introduced as the leading parameter-efficient fine-tuning (PEFT) method, enabling efficient adaptation of massive LLMs such as Llama 3 without extensive hardware resources. The post promises to cover LoRA's mathematical intuition, the concept of "intrinsic dimension," and its impact for AI engineers.

LLMs deep learning Fine-tuning PEFT
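The core LoRA idea fits in a few lines of NumPy: freeze the pretrained weight W and learn a low-rank update B·A scaled by alpha/r. The shapes and hyperparameters below are arbitrary illustrative choices, not values from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 128, 8, 16

W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero-initialized

def lora_forward(x):
    # x: (batch, d_in). Frozen path plus the scaled low-rank update B @ A.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((4, d_in))
assert np.allclose(lora_forward(x), x @ W.T)  # B == 0, so training starts from W exactly

full = W.size            # parameters a full fine-tune would update
lora = A.size + B.size   # parameters LoRA trains instead
print(full, lora)  # 8192 1536
```

Because B starts at zero, the adapted model initially behaves exactly like the frozen one. Only r·(d_in + d_out) parameters are trained, instead of d_in·d_out.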

RESEARCH · Together AI Blog · 4/15/2026

Parcae: Doing more with fewer parameters using stable looped models

Parcae is a stable looped language model that matches the quality of a Transformer twice its size, using fewer parameters. It introduces the first scaling laws for looping, demonstrating that increasing recurrence is a compute-efficient path to better performance.

language models deep learning efficiency model optimization
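A looped model reuses the same block several times, so depth (and compute) can grow while the parameter count stays fixed. The sketch below is a hypothetical illustration of that weight-tying idea only. The summary does not describe Parcae's actual block or the stabilization that makes its looping work.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
W = rng.standard_normal((d, d)) / np.sqrt(d)   # weights of ONE shared block

def block(h):
    # Hypothetical residual block; Parcae's real block and its
    # stabilization scheme are not described in the summary.
    return h + 0.1 * np.tanh(h @ W)

def looped_forward(x, n_loops):
    h = x
    for _ in range(n_loops):      # the same W is reused on every pass
        h = block(h)
    return h

x = rng.standard_normal((2, d))
shallow = looped_forward(x, 2)
deep = looped_forward(x, 8)       # 4x the compute, identical parameter count
print(W.size)  # 256
```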

ARTICLE · DEV.to AI · 4/11/2026

Deep Learning on FPGAs: Past, Present, and Future

This article reviews the evolution of Deep Learning implementation on FPGAs, covering its historical development, current state, and future directions. It also highlights the critical importance of hardware acceleration for the advancement of artificial intelligence.

Hardware Acceleration FPGAs deep learning Machine Learning

ARTICLE · DEV.to AI · 5/1/2026

I Rebuilt Karpathy's NanoChat in JAX. Here's What XLA Gets Right and What It Gets Dead Wrong.

The author describes porting Andrej Karpathy's NanoChat from PyTorch to JAX/Flax NNX, achieving fast training on a single GPU along with TPU compatibility. The post details XLA's advantage in eliminating Python overhead and highlights its limitations with advanced features and debugging.

deep learning XLA JAX PyTorch

RESEARCH · DEV.to AI · 4/20/2026

Audio Spectrogram Representations for Processing with Convolutional Neural Networks

This content explores audio spectrogram representations for processing with Convolutional Neural Networks (CNNs). It focuses on how these time-frequency images of sound can be used efficiently as CNN inputs for a range of audio tasks.

deep learning audio processing Convolutional Neural Networks spectrograms
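A basic magnitude spectrogram is a short-time Fourier transform: slice the signal into overlapping windowed frames and take the FFT of each frame. The sketch below uses NumPy. The frame size, hop length, and log compression are common choices, not ones taken from the article.

```python
import numpy as np

def magnitude_spectrogram(signal, n_fft=256, hop=128):
    """Short-time Fourier transform magnitude, shape (freq_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))  # (frames, n_fft // 2 + 1)
    return spec.T

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t)            # 1 s of a 1 kHz tone
S = np.log1p(magnitude_spectrogram(x))      # log compression of magnitudes
print(S.shape)  # (129, 61)
```

The result is a 2-D array that a CNN can treat like a single-channel image, for example after reshaping to `S[None, None]` for a (batch, channel, height, width) layout. Here every frame peaks at bin 1000 · 256 / 8000 = 32.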

ARTICLE · DEV.to AI · 4/25/2026

The hidden engine behind the AI Revolution: The Transformer

The true engine behind the AI revolution is not ChatGPT but the Transformer architecture, introduced in the "Attention Is All You Need" paper. By processing every token of a sequence in parallel, which suits GPUs well, it fundamentally changed how machines understand language.

AI history deep learning Transformer NLP
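The parallelism comes from scaled dot-product attention, softmax(QKᵀ / √d_k)·V. It relates every position to every other position in a single matrix product rather than through a step-by-step recurrence. Below is a single-head NumPy sketch without masking or learned projections.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed for all query positions at once."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d_k, d_v = 5, 8, 4
Q = rng.standard_normal((n, d_k))
K = rng.standard_normal((n, d_k))
V = rng.standard_normal((n, d_v))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (5, 4)
```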

RESEARCH · DEV.to AI · 5/2/2026

Accelerating CNN inference on FPGAs: A Survey

This survey paper examines various techniques and methods for accelerating Convolutional Neural Network (CNN) inference specifically on Field-Programmable Gate Arrays (FPGAs). It provides an overview of existing research and architectural approaches to improve the performance and efficiency of CNN deployments on hardware.

Hardware Acceleration deep learning FPGA computer vision

RESEARCH · DEV.to AI · 12d ago

Graph-MLP: Node Classification without Message Passing in Graph

Graph-MLP introduces an approach to node classification that departs from the message-passing mechanism of traditional graph neural networks. Instead of propagating features along edges, it trains an MLP whose node embeddings are shaped by a neighbour contrastive loss, aiming to improve efficiency and performance.

deep learning Graph Neural Networks Machine Learning Graph-MLP
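Here is a simplified neighbour-contrastive loss in the spirit of Graph-MLP. Embeddings of connected nodes are pulled together relative to all other nodes, so the graph is only needed during training. This sketch uses the plain adjacency matrix as neighbour weights and a fixed temperature `tau`. Both are assumptions, and the paper's exact weighting may differ.

```python
import numpy as np

def neighbour_contrastive_loss(Z, adj, tau=0.5):
    """Z: (n_nodes, dim) MLP embeddings; adj: (n_nodes, n_nodes) 0/1, no self-loops."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # cosine similarity
    sim = np.exp(Zn @ Zn.T / tau)
    np.fill_diagonal(sim, 0.0)                          # exclude self-pairs
    pos = (sim * adj).sum(axis=1)                       # mass on graph neighbours
    total = sim.sum(axis=1)                             # mass on all other nodes
    mask = adj.sum(axis=1) > 0                          # skip isolated nodes
    return -np.mean(np.log(pos[mask] / total[mask]))

adj = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], float)
Z_aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
Z_crossed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
print(neighbour_contrastive_loss(Z_aligned, adj) < neighbour_contrastive_loss(Z_crossed, adj))  # True
```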

RESEARCH · DEV.to AI · 4/26/2026

Squared Earth Mover's Distance-based Loss for Training Deep Neural Networks

This content introduces a novel loss function for training deep neural networks, based on the Squared Earth Mover's Distance. It aims to enhance the effectiveness of deep learning models by providing a more robust measure for comparing probability distributions.

neural networks deep learning Machine Learning loss functions
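For distributions over ordered classes with unit spacing, a squared EMD can be computed as the sum of squared differences between the two cumulative distribution functions. The sketch below assumes that formulation. The ordinal age-bin framing is an illustrative assumption, not a detail from the summary.

```python
from itertools import accumulate

def squared_emd(pred, target):
    """Squared EMD between two distributions over the same ordered classes.

    Assumes unit spacing between classes: sum of squared CDF differences.
    """
    cdf_p = list(accumulate(pred))
    cdf_t = list(accumulate(target))
    return sum((a - b) ** 2 for a, b in zip(cdf_p, cdf_t))

target = [0.0, 1.0, 0.0, 0.0, 0.0]   # true class = 1 (e.g. an age bin)
near = [0.0, 0.0, 1.0, 0.0, 0.0]     # off by one class
far = [0.0, 0.0, 0.0, 0.0, 1.0]      # off by three classes
print(squared_emd(near, target), squared_emd(far, target))  # 1.0 3.0
```

Cross-entropy would score both wrong predictions identically, because each puts zero mass on the true class. The CDF-based loss instead penalizes the prediction that lands further away.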

ARTICLE · DEV.to AI · 6d ago

Gemma 4 12B: Google's Encoder-Free Multimodal AI

Gemma 4 12B is Google DeepMind's open-weight, encoder-free multimodal model, processing text, images, and video in a single unified architecture. With 12 billion parameters, it excels in benchmarks and offers an efficient footprint suitable for developers and researchers.

multimodal AI deep learning Gemma 4 12B Google DeepMind

RESEARCH · arXiv CS.LG · 4/13/2026

Silhouette Loss: Differentiable Global Structure Learning for Deep Representations

This paper introduces Soft Silhouette Loss, a novel differentiable objective for deep learning, inspired by the classical silhouette coefficient. It aims to learn discriminative representations by enforcing intra-class compactness and inter-class separation more efficiently than existing metric learning approaches.

Classification metric learning deep learning loss functions
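The classical silhouette coefficient compares, for each point, its mean distance to its own class (a) with its mean distance to the nearest other class (b): s = (b − a) / max(a, b). The sketch below computes that hard version and uses 1 − mean(s) as a loss. The paper's Soft Silhouette Loss is a differentiable relaxation, and its exact form is not reproduced here. The hard `min` over classes is the kind of step such a relaxation would need to smooth.

```python
import numpy as np

def silhouette_scores(X, labels):
    """Classical silhouette per point; assumes every class has >= 2 points."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    classes = np.unique(labels)
    s = np.empty(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                                   # exclude the point itself
        a = D[i, same].mean()                             # intra-class compactness
        b = min(D[i, labels == c].mean() for c in classes if c != labels[i])
        s[i] = (b - a) / max(a, b)                        # in [-1, 1]
    return s

def silhouette_loss(X, labels):
    return 1.0 - silhouette_scores(X, labels).mean()

X = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 0.0], [10.0, 0.1]])
good = np.array([0, 0, 1, 1])
mixed = np.array([0, 1, 0, 1])
print(silhouette_loss(X, good) < silhouette_loss(X, mixed))  # True
```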

RESEARCH · DEV.to AI · 5/6/2026

Micro-Batch Training with Batch-Channel Normalization and Weight Standardization

This content explores advanced techniques for optimizing neural network training, specifically focusing on micro-batch processing. It details the application of batch-channel normalization and weight standardization to enhance model performance and stability in scenarios with small batch sizes.

neural networks batch-normalization Optimization deep learning
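Weight standardization normalizes each output channel of a convolution kernel to zero mean and unit variance. Its statistics come from the weights rather than the activations, so they do not degrade as the batch size shrinks. The sketch covers weight standardization only. Batch-channel normalization is not shown, and the `eps` value is an arbitrary choice.

```python
import numpy as np

def weight_standardize(W, eps=1e-5):
    """Standardize a conv kernel per output channel.

    W has shape (out_channels, in_channels, kh, kw). Statistics are taken
    over the weights themselves, so they are independent of the batch size.
    """
    mean = W.mean(axis=(1, 2, 3), keepdims=True)
    var = W.var(axis=(1, 2, 3), keepdims=True)
    return (W - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
W = 3.0 + 2.0 * rng.standard_normal((16, 8, 3, 3))   # deliberately shifted and scaled
W_hat = weight_standardize(W)
```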

RESEARCH · arXiv CS.LG · 4/20/2026

Lightweight Geometric Adaptation for Training Physics-Informed Neural Networks

Physics-Informed Neural Networks (PINNs) often suffer from slow convergence and instability due to complex loss landscapes. This paper proposes a lightweight, curvature-aware optimization framework that augments existing first-order optimizers to improve convergence speed, training stability, and solution accuracy on partial differential equations (PDEs).

Optimization deep learning Physics-Informed Neural Networks Machine Learning

RESEARCH · arXiv CS.AI · 4/25/2026

HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

HypEHR is a compact Lorentzian model utilizing hyperbolic geometry to address Electronic Health Record (EHR) question answering, overcoming cost and hierarchical structure challenges of LLM-based methods. It is pretrained for next-visit diagnosis prediction and alignment with medical ontologies, achieving LLM-comparable performance with significantly fewer parameters.

Question Answering deep learning healthcare AI EHR
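In the Lorentz (hyperboloid) model with curvature −1, points satisfy ⟨x, x⟩_L = −1 under the Minkowski inner product, and distance is arccosh(−⟨x, y⟩_L). Volume grows exponentially with radius in this geometry, which is why it suits tree-like hierarchies such as medical ontologies. The sketch below maps Euclidean vectors onto the hyperboloid through the origin's exponential map. How HypEHR actually parameterizes its embeddings is not stated in the summary.

```python
import numpy as np

def lorentz_inner(x, y):
    # Minkowski inner product: -x0*y0 + sum_i xi*yi
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def exp_map_origin(v):
    """Map a Euclidean vector v onto the hyperboloid (curvature -1) via the origin."""
    n = np.linalg.norm(v)
    if n == 0:
        return np.concatenate(([1.0], np.zeros_like(v)))
    return np.concatenate(([np.cosh(n)], np.sinh(n) * v / n))

def lorentz_distance(x, y):
    # Clamp guards against arguments slightly below 1 caused by rounding.
    return np.arccosh(max(-lorentz_inner(x, y), 1.0))

origin = exp_map_origin(np.zeros(2))
p = exp_map_origin(np.array([0.3, 0.4]))   # tangent vector of norm 0.5
print(round(lorentz_distance(origin, p), 6))  # 0.5
```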

RESEARCH · arXiv CS.LG · 4/9/2026

A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting on A Novel Bangladeshi Market Price Dataset

This paper presents AgriPriceBD, a new daily dataset of agricultural commodity prices from Bangladesh, extracted with the help of an LLM. It evaluates seven forecasting approaches, including classical models and deep learning architectures, with the aim of supporting income stabilization and food security.

agricultural price forecasting deep learning Machine Learning food security

RESEARCH · arXiv CS.LG · 27d ago

CAWI: Copula-Aligned Weight Initialization for Randomized Neural Networks

CAWI proposes a new weight initialization framework for Randomized Neural Networks (RdNNs) that addresses the limitation of conventional random initialization ignoring inter-feature dependence. It uses a data-fitted copula to ensure frozen projections respect empirical dependence, improving conditioning and predictive performance.

neural networks deep learning Machine Learning data science
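A Gaussian copula captures inter-feature dependence independently of each feature's marginal distribution. Rank-transform each column to uniform pseudo-observations, map them to normal scores, and take their correlation matrix R. The sketch then draws frozen projection weights from N(0, R) through a Cholesky factor. That last step, and the names `gaussian_copula_corr` and `copula_init`, are hypothetical. CAWI's actual construction may differ.

```python
import numpy as np
from statistics import NormalDist

def gaussian_copula_corr(X):
    """Correlation of normal scores: the Gaussian-copula dependence of X's columns."""
    n, _ = X.shape
    ranks = X.argsort(axis=0).argsort(axis=0) + 1     # ranks 1..n per column
    U = ranks / (n + 1)                               # pseudo-observations in (0, 1)
    Z = np.vectorize(NormalDist().inv_cdf)(U)         # normal scores
    return np.corrcoef(Z, rowvar=False)

def copula_init(X, n_hidden, rng, jitter=1e-6):
    """Hypothetical init: frozen weight columns correlated across inputs like R."""
    R = gaussian_copula_corr(X)
    L = np.linalg.cholesky(R + jitter * np.eye(R.shape[0]))
    return L @ rng.standard_normal((R.shape[0], n_hidden))

rng = np.random.default_rng(0)
a = rng.standard_normal(500)
# Column 1 is a monotone transform of column 0; column 2 is independent.
X = np.column_stack([a, np.exp(a), rng.standard_normal(500)])
R = gaussian_copula_corr(X)
W = copula_init(X, n_hidden=32, rng=rng)
print(W.shape)  # (3, 32)
```

Because it is rank-based, the copula treats `a` and `exp(a)` as perfectly dependent even though their linear (Pearson) correlation is well below 1.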

RESEARCH · arXiv CS.LG · 29d ago

Geometry-free prediction of inertial lift forces in microfluidic devices using deep learning

This paper presents a novel deep learning approach for geometry-free prediction of inertial lift forces in microfluidic devices, eliminating explicit geometric parameters. The trained neural network model generalizes to unseen channel geometries while performing comparably to existing models.

neural networks deep learning microfluidics inertial lift forces

RESEARCH · arXiv CS.LG · 8d ago

Automatically Differentiable Nonlinear Tensor Networks (ADNTNs) for Exponential Compression of Deep Neural Networks

This paper introduces Automatically Differentiable Nonlinear Tensor Networks (ADNTNs), a family of structured weight generators for exponential compression of Deep Neural Networks. It extends low-rank adaptation and tensor factorization by building large weight tensors through a hierarchy of small cores and nonlinear activations.

deep learning Automatic Differentiation Machine Learning Neural Network Compression
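To see why building a weight tensor from a hierarchy of small cores can compress exponentially, consider a hypothetical generator. It folds k cores of size m×m together with Kronecker products and applies a nonlinearity after each level. That yields an m^k × m^k matrix from only k·m² parameters. This is not the ADNTN construction itself, only an illustration of the "small cores plus nonlinear activations" idea. In a real model, an autodiff framework would backpropagate through the generator into the cores.

```python
import numpy as np
from functools import reduce

def generate_weight(cores, act=np.tanh):
    # Hypothetical generator: combine small cores level by level with
    # Kronecker products, applying a nonlinearity after each level.
    return reduce(lambda acc, core: act(np.kron(acc, core)), cores[1:], cores[0])

rng = np.random.default_rng(0)
cores = [rng.standard_normal((8, 8)) for _ in range(3)]
W = generate_weight(cores)
print(W.shape, sum(c.size for c in cores), W.size)  # (512, 512) 192 262144
```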