Diffusion Models


RESEARCH · arXiv CS.AI · 1d ago

DiBS: Diffusion-Informed Branch Selection

The paper introduces DiBS, an approach that uses a diffusion model to guide branch selection when solving Sudoku, a constraint satisfaction problem. The diffusion model steers the branch ordering of a symbolic solver, which keeps the search complete while mitigating long-tail search behavior.
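The key property described above, that a learned model only reorders branches while the solver still tries every value, can be illustrated with a plain backtracking Sudoku solver. This is a toy sketch, not the DiBS method: `score_fn` is a hypothetical stand-in for the diffusion model's value preferences, and the fewest-candidates (MRV) cell choice is an assumption added only to keep the search fast.

```python
from typing import Callable, List, Optional

Grid = List[List[int]]  # 9x9 grid, 0 = empty cell
# Hypothetical stand-in for a learned model: a preference score for placing
# `value` at (row, col). Higher scores are tried first.
ScoreFn = Callable[[Grid, int, int, int], float]


def candidates(grid: Grid, r: int, c: int) -> List[int]:
    """Values 1-9 not already used in the cell's row, column, or 3x3 box."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [v for v in range(1, 10) if v not in used]


def solve(grid: Grid, score_fn: ScoreFn) -> Optional[Grid]:
    """Complete backtracking search; score_fn only changes the branch ORDER.

    Mutates `grid` in place and returns it when solved, or None if no
    solution exists.
    """
    # Choose the empty cell with the fewest legal values (MRV heuristic).
    best = None
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                cands = candidates(grid, r, c)
                if best is None or len(cands) < len(best[2]):
                    best = (r, c, cands)
    if best is None:
        return grid  # no empty cells left: solved
    r, c, cands = best
    # Guided ordering: every candidate is still explored on failure,
    # so the search stays complete even if the scores are poor.
    for v in sorted(cands, key=lambda v: -score_fn(grid, r, c, v)):
        grid[r][c] = v
        if solve(grid, score_fn) is not None:
            return grid
        grid[r][c] = 0  # undo this branch and try the next one
    return None
```

With a constant `score_fn` such as `lambda g, r, c, v: 0.0` this reduces to ordinary backtracking; a better scorer can only change how quickly a solution is found, not whether one is found.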

branch selection · Diffusion Models · constraint satisfaction · Sudoku

RESEARCH · arXiv CS.CL · 1d ago

Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation

This paper introduces the On-Policy Diffusion Language Model (OPDLM), which converts autoregressive language models (ARLMs) into diffusion language models (DLMs). It uses On-Policy Distillation (OPD) to address problems such as knowledge loss and train-inference mismatch.

Diffusion Models · language models · AI models · machine learning

RESEARCH · arXiv CS.LG · 20h ago

Enabling KV Caching of Shared Prefix for Diffusion Language Models

The paper introduces "bicache", which the authors present as the first KV caching technique for shared prefixes in diffusion language models (DLMs). Existing LLM caching methods fail for DLMs because of their bidirectional attention; bicache builds on the observation that shared-prefix KV states remain stable in shallow layers, with the goal of enabling high-throughput DLM serving.

Diffusion Models · KV Caching · Performance optimization · High-throughput serving

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/10/2026

National University of Singapore Presents "DMax": A New Paradigm For Diffusion Language Models (dLLMs) Enabling Aggressive Parallel Decoding.

DMax is a new paradigm for efficient diffusion language models (dLLMs) that mitigates error accumulation in parallel decoding. It enables aggressive parallelism by reformulating decoding as a progressive self-refinement process and introducing a unified training strategy.

Diffusion Models · Parallel Decoding · natural language processing · AI

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/21/2026

Building my own Diffusion Language Model from scratch was easier than I thought [P]

The author built a diffusion language model from scratch, without AI-generated code, to better understand its underlying concepts. They trained the 7.5M-parameter model on the Tiny Shakespeare dataset and shared the code on GitHub.

Diffusion Models · language models · personal-project · machine learning

RESEARCH · arXiv CS.LG · 1d ago

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

Diffusion Large Language Models (dLLMs) suffer from a "stability lag" caused by irreversible token commitment, a problem that Post-Training Quantization (PTQ) errors make worse. FAIR-Calib proposes a two-stage PTQ framework that uses a position prior and layer-wise calibration to protect fragile frontier states, improving the quality of quantized dLLMs.

Diffusion Models · post-training quantization · quantization · AI calibration

ARTICLE · DEV.to AI · 4/22/2026

The Unfinished Frame

The author explores the beauty and honesty of pausing diffusion models mid-render, finding these unfinished frames more revealing than polished final images. These stages, where AI models are still "thinking" and negotiating features from their training data, are described as a "confession" rather than a "statement."

Diffusion Models · creative process · AI art · AI philosophy

RESEARCH · arXiv CS.CL · 4/22/2026

Remask, Don't Replace: Token-to-Mask Refinement in Masked Diffusion Language Models

This paper proposes Token-to-Mask (T2M) remasking, a refinement technique for masked diffusion language models such as LLaDA2.1. Rather than overwriting suspect tokens directly with other tokens, as Token-to-Token (T2T) editing does, T2M resets them to the mask state so they can be re-predicted more accurately.
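The core idea of remasking instead of replacing can be shown in a few lines. This is a toy sketch under stated assumptions: "suspect" is taken to mean a confidence below a fixed threshold (the paper's actual criterion may differ), and `MASK_ID` is a hypothetical integer id for the mask token.

```python
from typing import List, Sequence

MASK_ID = -1  # hypothetical mask-token id used only in this sketch


def remask_low_confidence(
    tokens: Sequence[int], confidences: Sequence[float], threshold: float
) -> List[int]:
    """Toy Token-to-Mask (T2M) refinement step.

    Tokens whose confidence falls below `threshold` (an assumed suspect
    criterion) are reset to MASK_ID, so a later denoising step re-predicts
    them with full context, instead of being overwritten in place with a
    different token as in Token-to-Token (T2T) editing.
    """
    if len(tokens) != len(confidences):
        raise ValueError("tokens and confidences must have the same length")
    return [MASK_ID if conf < threshold else tok for tok, conf in zip(tokens, confidences)]
```

For example, `remask_low_confidence([11, 42, 7, 99], [0.95, 0.30, 0.88, 0.10], 0.5)` returns `[11, -1, 7, -1]`: the two low-confidence positions go back to the mask state rather than being swapped for guessed replacements.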

Diffusion Models · language models · error correction · natural language processing

RESEARCH · arXiv CS.LG · 4/22/2026

Discrete Tilt Matching

Discrete Tilt Matching (DTM) is a novel likelihood-free method for fine-tuning masked diffusion large language models (dLLMs), addressing the intractability of sequence-level marginal likelihoods in RL. It recasts fine-tuning as state-level matching, using a weighted cross-entropy objective with control variates for stability, and achieves strong results on various tasks like Sudoku and Countdown.

Diffusion Models · LLMs · reinforcement learning · machine learning

RESEARCH · arXiv CS.CL · 4/13/2026

Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models

This paper reveals a critical vulnerability in diffusion-based language models (dLLMs) where their safety alignment, based on monotonic denoising schedules, can be easily bypassed. By re-masking refusal tokens and injecting an affirmative prefix, researchers achieved high attack success rates against prominent dLLMs, exposing a structural flaw.

Diffusion Models · language models · vulnerability · Exploitation

RESEARCH · arXiv CS.LG · 19d ago

Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine

This paper provides a theoretical explanation for the efficiency of diffusion models in learning the score function for high-dimensional data supported on low-dimensional manifolds. It identifies a "collapse-and-refine" mechanism driven by the geometry of the score function, where the denoising map projects onto the data manifold and refines the intrinsic density.

Diffusion Models · Theoretical AI · machine learning · Manifold Learning

ARTICLE · DEV.to AI · 4/23/2026

From DALL-E to gpt-image-2: The Architectural Bet That Finally Fixed AI Text

The article argues that OpenAI's new gpt-image-2 model has fundamentally solved the long-standing problem of AI models failing to render text and complex layouts accurately within images. It presents this architectural pivot as a significant advance that calls for re-evaluating workflows built around diffusion models.

Diffusion Models · AI image generation · AI architecture · GPT

RESEARCH · arXiv CS.LG · 4/14/2026

The Diffusion-Attention Connection

This research unifies Transformers, diffusion maps, and magnetic Laplacians as different regimes of a single Markov geometry built from pre-softmax query-key scores. It defines a QK "bidivergence" that connects attention and diffusion, and organizes their dynamics using products of experts and Schrödinger bridges.

Diffusion Models · Deep Learning Theory · Markov Geometry · attention mechanisms

RESEARCH · DEV.to AI · 5/10/2026

Diffusion models approach AR quality and improve inference speed

Diffusion language models are now achieving significant throughput gains and narrowing the gap with autoregressive decoders in inference speed. New Introspective Diffusion Language Models (I-DLM) address prior issues of introspective consistency and inefficient sampling loops, improving both quality and latency.

inference speed · Diffusion Models · language models · machine learning

RESEARCH · arXiv CS.LG · 21d ago

Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra

This research systematically optimizes real-time diffusion model inference on Apple M3 Ultra, exploring various techniques like CoreML conversion, quantization, and model distillation. The study achieved 22.7 FPS for 512x512 img2img transformation by combining CoreML conversion of SDXS-512 with a 3-thread camera pipeline.

Diffusion Models · Optimization · apple-silicon · image generation

ARTICLE · DEV.to AI · 4/17/2026

Why Every AI Image Generator Fails at Text (And One That Finally Doesn't)

This article explores why AI image generators like Stable Diffusion and Midjourney consistently fail at rendering text correctly, explaining the issue stems from how diffusion models learn visual patterns. However, it hints at the existence of one model that has finally overcome this common limitation.

Diffusion Models · AI image generation · AI limitations

RESEARCH · arXiv CS.LG · 27d ago

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

This paper shows that uniform interventions in discrete diffusion language models (DLMs) degrade the quality of controlled generation. The authors find that different attributes commit at distinct stages of the denoising process and propose an adaptive scheduler that concentrates interventions at the stages where they are most effective.

Diffusion Models · language models · Controlled Generation · text generation

RESEARCH · arXiv CS.CL · 12d ago

From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

FLUID is a new framework designed to efficiently adapt Autoregressive (AR) backbones to the diffusion paradigm for parallel text generation. It enables initialization from GPT-style models and introduces an entropy-driven mechanism called Elastic Horizons, achieving state-of-the-art performance with significantly reduced training costs.

Diffusion Models · text generation · large language models · Autoregressive Models

RESEARCH · arXiv CS.LG · 4/6/2026

Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models

This work explores model scheduling to accelerate Masked Diffusion Language Models (MDLMs) by replacing the full model with a smaller one at certain denoising steps. The results show that the early and final steps are the most robust to this substitution, allowing up to a 17% reduction in FLOPs with minimal degradation in generative perplexity.
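The FLOPs trade-off behind model scheduling is simple arithmetic. The sketch below is illustrative only: `edge_steps` (how many early and final steps use the small model) and `small_cost` (the small model's per-step FLOPs as a fraction of the full model's) are hypothetical parameters, not the paper's configuration, and the numbers it produces are not the reported 17%.

```python
from typing import List


def model_schedule(num_steps: int, edge_steps: int) -> List[str]:
    """Hypothetical schedule: use the small model for the first and last
    `edge_steps` denoising steps (the ones reported as most robust to
    substitution) and the full model for the steps in between."""
    return [
        "small" if t < edge_steps or t >= num_steps - edge_steps else "full"
        for t in range(num_steps)
    ]


def relative_flops(schedule: List[str], small_cost: float) -> float:
    """Total compute relative to running the full model at every step.

    `small_cost` is the small model's per-step FLOPs as a fraction of the
    full model's (an assumed, illustrative value).
    """
    cost = sum(small_cost if m == "small" else 1.0 for m in schedule)
    return cost / len(schedule)
```

With 10 steps, `edge_steps=2`, and `small_cost=0.5`, four steps run at half cost, so `relative_flops` gives 0.8, i.e. a 20% saving under these made-up numbers. The actual saving depends on the size ratio of the two models and on how many steps tolerate the swap.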

Diffusion Models · language models · Computational Efficiency · denoising

RESEARCH · arXiv CS.CL · 15d ago

Learnability-Informed Fine-Tuning of Diffusion Language Models

This research introduces LIFT, a learnability-informed fine-tuning algorithm designed to enhance the reasoning capabilities of diffusion language models. LIFT addresses the shortcomings of standard SFT by adaptively learning tokens based on their difficulty and available context during different diffusion time steps, showing improved performance over existing baselines.

Diffusion Models · learning · machine learning · natural language processing