b9274 addresses MTP VRAM leak
Reddit r/LocalLLaMA · May 21, 2026
The b9274 update fixes a VRAM leak affecting MTP (Multi-Token Prediction) models, in which GPU-allocated resources were not freed across sleep/resume cycles. The fix explicitly resets the speculative decoder, draft context, and draft model resources in the destroy() function, preventing the out-of-memory errors the leak could cause.
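The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual b9274 code: GpuResource, ServerContext, load(), g_vram_in_use, and vram_after_cycles() are hypothetical names, and a plain counter stands in for real VRAM. Only destroy() and the three kinds of draft resources come from the summary.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for VRAM usage: a counter instead of real GPU memory.
static std::size_t g_vram_in_use = 0;

// Hypothetical GPU-backed resource that accounts for its size while alive.
struct GpuResource {
    std::size_t bytes;
    explicit GpuResource(std::size_t n) : bytes(n) { g_vram_in_use += bytes; }
    ~GpuResource() { g_vram_in_use -= bytes; }
    GpuResource(const GpuResource &) = delete;
    GpuResource & operator=(const GpuResource &) = delete;
};

// Hypothetical long-lived context that survives sleep/resume cycles.
struct ServerContext {
    std::unique_ptr<GpuResource> main_model;
    std::unique_ptr<GpuResource> draft_model;
    std::unique_ptr<GpuResource> draft_ctx;
    std::unique_ptr<GpuResource> spec_decoder;

    // Called on resume: allocate everything again (sizes are arbitrary).
    void load() {
        main_model.reset(new GpuResource(1000));
        draft_model.reset(new GpuResource(200));
        draft_ctx.reset(new GpuResource(50));
        spec_decoder.reset(new GpuResource(10));
    }

    // Called on sleep: release every GPU resource explicitly.
    // Omitting the three draft-related resets would leave their memory
    // allocated for as long as the context object itself stays alive.
    void destroy() {
        spec_decoder.reset();
        draft_ctx.reset();
        draft_model.reset();
        main_model.reset();
    }
};

// Runs several sleep/resume cycles and reports what is still allocated.
std::size_t vram_after_cycles(int cycles) {
    ServerContext ctx;
    for (int i = 0; i < cycles; ++i) {
        ctx.load();
        ctx.destroy();
    }
    return g_vram_in_use;
}
```

In this sketch the context object outlives each sleep, so its destructor never runs between cycles; releasing memory therefore depends entirely on destroy() resetting every member, which is the situation the summary describes.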