RESEARCH · arXiv cs.CL · 4/13/2026
Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models
This paper reveals a critical vulnerability in diffusion-based language models (dLLMs). Their safety alignment relies on a monotonic denoising schedule, in which a token is never revisited once it has been unmasked, and this assumption can be easily bypassed. By re-masking refusal tokens and injecting an affirmative prefix, the researchers achieved high attack success rates against prominent dLLMs, exposing a structural flaw.
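A toy sketch may help illustrate why irreversibility matters. It is not the paper's method and uses no real model. It assumes a simplified sampler that unmasks positions left to right (real dLLMs often choose positions by confidence). The names `denoise_step`, `remask`, `inject_prefix`, and `toy_predict` are hypothetical, and `toy_predict` is a stand-in that "refuses" unless the sequence starts with an affirmative token. Under the normal schedule, committed refusal tokens are never reconsidered. Resetting them to the mask token and writing a prefix changes the context the remaining steps condition on.

```python
from typing import Callable, Iterable, List

MASK = "[MASK]"
Predictor = Callable[[List[str], int], str]


def denoise_step(seq: List[str], predict: Predictor, k: int) -> List[str]:
    """Unmask up to k masked positions (left to right in this toy).

    Positions that already hold a token are never revisited, which mirrors
    the monotonic, irreversible schedule described above.
    """
    out = list(seq)
    masked = [i for i, tok in enumerate(out) if tok == MASK]
    for i in masked[:k]:
        out[i] = predict(out, i)
    return out


def run(seq: List[str], predict: Predictor, k: int) -> List[str]:
    """Repeat denoising steps until no masked positions remain (k must be >= 1)."""
    while MASK in seq:
        seq = denoise_step(seq, predict, k)
    return seq


def remask(seq: List[str], positions: Iterable[int]) -> List[str]:
    """Reset the given positions to the mask token (the intervention)."""
    out = list(seq)
    for i in positions:
        out[i] = MASK
    return out


def inject_prefix(seq: List[str], prefix: List[str]) -> List[str]:
    """Overwrite the first len(prefix) positions with fixed prefix tokens."""
    out = list(seq)
    out[: len(prefix)] = prefix
    return out


def toy_predict(seq: List[str], i: int) -> str:
    """Hypothetical stand-in for a dLLM: refuses unless the sequence starts with 'Sure'."""
    return "ok" if seq[0] == "Sure" else "no"


if __name__ == "__main__":
    done = run([MASK] * 4, toy_predict, k=2)
    print(done)  # ['no', 'no', 'no', 'no']
    # A further step changes nothing: committed tokens are irreversible.
    print(denoise_step(done, toy_predict, k=2))  # ['no', 'no', 'no', 'no']
    # Re-mask the refusal tokens, inject a prefix, and resume denoising.
    attacked = run(inject_prefix(remask(done, range(4)), ["Sure"]), toy_predict, k=2)
    print(attacked)  # ['Sure', 'ok', 'ok', 'ok']
```

In this toy, the refusal is not "undone" by the sampler itself. It changes only because an external intervention reopened positions that the schedule treats as final.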