RESEARCH

Enabling KV Caching of Shared Prefix for Diffusion Language Models

arXiv cs.LG · June 9, 2026

The paper introduces "bicache", which it presents as the first KV caching technique for shared prefixes in diffusion language models (DLMs). Prefix caching methods built for autoregressive LLMs fail here because DLMs use bidirectional attention: prefix tokens attend to the tokens that follow them, so the prefix KVs are no longer independent of the rest of the sequence. The approach aims to unlock high-throughput DLM serving by building on the observation that the KVs of a shared prefix remain stable in shallow layers.
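The failure of standard prefix caching under bidirectional attention can be illustrated with a toy example. The sketch below is not the paper's method: it is a single attention layer with identity projections, and every name in it (attention, prefix_outputs, the sample vectors) is hypothetical. It only checks whether the layer's outputs at the prefix positions, which a following layer would project into keys and values, change when the suffix changes.

```python
import math


def attention(tokens, causal):
    """Toy single-head self-attention with identity projections (Q = K = V = tokens)."""
    dim = len(tokens[0])
    outputs = []
    for i, query in enumerate(tokens):
        # Causal: position i sees positions 0..i. Bidirectional: it sees every position.
        visible = tokens[: i + 1] if causal else tokens
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, visible)) / total
            for d in range(dim)
        ])
    return outputs


def prefix_outputs(prefix, suffix, causal):
    """Run attention over prefix + suffix and keep only the prefix positions."""
    return attention(prefix + suffix, causal)[: len(prefix)]


# Hypothetical 2-dimensional token embeddings: one shared prefix, two different suffixes.
prefix = [[1.0, 0.0], [0.0, 1.0]]
suffix_a = [[1.0, 1.0]]
suffix_b = [[-1.0, 2.0]]

causal_same = prefix_outputs(prefix, suffix_a, True) == prefix_outputs(prefix, suffix_b, True)
bidir_same = prefix_outputs(prefix, suffix_a, False) == prefix_outputs(prefix, suffix_b, False)

print("causal prefix outputs identical:", causal_same)
print("bidirectional prefix outputs identical:", bidir_same)
```

With the causal mask the prefix outputs are identical for both suffixes, which is what makes prefix KV reuse safe in autoregressive LLMs. With bidirectional attention they differ, so KVs computed from them in later layers cannot simply be shared across requests.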

Diffusion Models · KV Caching · Performance optimization · High-throughput serving · LLM
