deep learning

263 items

ARTICLE · DEV.to AI · 27d ago

Lambda — Deep Dive

Lambda is a specialized AI infrastructure provider focused on GPU compute and machine learning tooling, a niche it has carved out in the AI hardware landscape. Unlike generalist hyperscalers, the company concentrates on helping its customers move from prototypes to large production workloads.

29
RESEARCH · arXiv CS.LG · 4/28/2026

Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing

This work addresses the large memory footprint of Key-Value (KV) caching in transformer language models by optimizing along the depth dimension. It introduces a method for cross-layer cache sharing, reports that a layer's cache can be dropped without information loss, and suggests a training approach based on random cross-layer attention.

29
RESEARCH · arXiv CS.LG · 4/28/2026

The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K--V Asymmetry

This systematic study of singular value spectra during transformer pretraining reveals three key phenomena: transient compression waves that propagate through layers, persistent spectral gradients, and a Q/K–V functional asymmetry in which query/key projections drive depth-dependent dynamics while value/output projections compress uniformly.

29
RESEARCH · arXiv CS.LG · 27d ago

Interpretable EEG Microstate Discovery via Variational Deep Embedding: A Systematic Architecture Search with Multi-Quadrant Evaluation

This paper introduces the Convolutional Variational Deep Embedding (Conv-VaDE) model for EEG microstate analysis. It enhances interpretability by jointly learning topographic reconstruction and probabilistic soft clustering, enabling generative decoding of cluster prototypes into verifiable scalp topographies.

29
RESEARCH · arXiv CS.LG · 20d ago

Simply Stabilizing the Loop via Fully Looped Transformer

Looped Transformers improve model performance by iteratively reusing blocks without increasing parameter count, but they suffer from training instability at higher loop iterations. The paper attributes this instability to gradient oscillation and residual explosion, and proposes the Fully Looped Transformer, which introduces a Fully Looped Architecture and Attention Injection.

29
DOC · DEV.to AI · 4d ago

This post describes the Global API service, which offers access to 184 AI models at competitive prices, including GPT-4o and DeepSeek V4 Flash at $0.25/M. It highlights a 99.9% SLA, 50 free requests per minute, and credits that never expire, alongside Pro Channel options for advanced needs.

28