
Diffusion Models

41 items

RESEARCH · arXiv CS.LG · 20h ago

Enabling KV Caching of Shared Prefix for Diffusion Language Models

The paper introduces "bicache", which it presents as the first KV caching technique for shared prefixes in diffusion language models (DLMs). Existing LLM caching methods fail for DLMs because of their bidirectional attention. The approach aims to unlock high-throughput DLM serving by exploiting the observation that shared-prefix KVs are stable in shallow layers.

54
RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/10/2026

National University of Singapore Presents "DMax": A New Paradigm For Diffusion Language Models (dLLMs) Enabling Aggressive Parallel Decoding.

DMax is a new paradigm for efficient diffusion language models (dLLMs) that mitigates error accumulation in parallel decoding. It enables aggressive parallelism by reformulating decoding as a progressive self-refinement process and introducing a unified training strategy.

44
RESEARCH · arXiv CS.LG · 1d ago

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

Diffusion Large Language Models (dLLMs) face a "stability lag" due to irreversible token commitment, a problem exacerbated by Post-Training Quantization (PTQ) errors. FAIR-Calib proposes a two-stage PTQ framework that uses a position prior and layer-wise calibration to protect fragile frontier states, enhancing quantization for dLLMs.

40
ARTICLE · DEV.to AI · 4/22/2026

The Unfinished Frame

The author explores the beauty and honesty of pausing diffusion models mid-render, finding these unfinished frames more revealing than polished final images. These stages, where AI models are still "thinking" and negotiating features from their training data, are described as a "confession" rather than a "statement."

34
RESEARCH · arXiv CS.LG · 4/22/2026

Discrete Tilt Matching

Discrete Tilt Matching (DTM) is a likelihood-free method for fine-tuning masked diffusion large language models (dLLMs) that addresses the intractability of sequence-level marginal likelihoods in RL. It recasts fine-tuning as state-level matching, using a weighted cross-entropy objective with control variates for stability, and reports strong results on tasks such as Sudoku and Countdown.

30
RESEARCH · arXiv CS.CL · 4/13/2026

Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models

This paper reveals a critical vulnerability in diffusion-based language models (dLLMs): their safety alignment relies on monotonic denoising schedules and can be easily bypassed. By re-masking refusal tokens and injecting an affirmative prefix, the researchers achieved high attack success rates against prominent dLLMs, exposing a structural flaw.

29
RESEARCH · arXiv CS.LG · 19d ago

Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine

This paper provides a theoretical explanation for the efficiency of diffusion models in learning the score function for high-dimensional data supported on low-dimensional manifolds. It identifies a "collapse-and-refine" mechanism driven by the geometry of the score function, where the denoising map projects onto the data manifold and refines the intrinsic density.

29
RESEARCH · arXiv CS.LG · 27d ago

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

This paper investigates the limitations of uniform interventions in discrete diffusion language models (DLMs) and demonstrates that they degrade controlled generation quality. The authors find that different attributes commit at distinct stages of the denoising process, and they propose an adaptive scheduler to concentrate interventions efficiently.

28
RESEARCH · arXiv CS.CL · 12d ago

From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

FLUID is a new framework designed to efficiently adapt Autoregressive (AR) backbones to the diffusion paradigm for parallel text generation. It enables initialization from GPT-style models and introduces an entropy-driven mechanism called Elastic Horizons, achieving state-of-the-art performance with significantly reduced training costs.

28
RESEARCH · arXiv CS.LG · 4/6/2026

Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models

This work explores model scheduling to accelerate masked diffusion language models (MDLMs), replacing the full model with a smaller one at certain denoising steps. The study shows that the early and late steps are more robust to this substitution, allowing a reduction of up to 17% in FLOPs with minimal degradation in generative perplexity.

28
RESEARCH · arXiv CS.CL · 15d ago

Learnability-Informed Fine-Tuning of Diffusion Language Models

This research introduces LIFT, a learnability-informed fine-tuning algorithm designed to enhance the reasoning capabilities of diffusion language models. LIFT addresses the shortcomings of standard SFT by adapting which tokens are learned to their difficulty and the context available at each diffusion time step, and it reports improved performance over existing baselines.

28