Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models

arXiv cs.CL · April 13, 2026

This paper reveals a critical vulnerability in diffusion-based language models (dLLMs): their safety alignment depends on a monotonic denoising schedule, which assumes that tokens are never revisited once unmasked, and it can be bypassed easily. By re-masking refusal tokens and injecting an affirmative prefix, the researchers achieved high attack success rates against prominent dLLMs, which points to a structural flaw in this approach to alignment.

Diffusion Models · Language Models · Vulnerability · Exploitation · Safety Alignment