Reddit r/LocalLLaMA · 19d ago
Latest b9274 Addresses MTP VRAM leak
The b9274 update fixes a VRAM leak in MTP (Multi-Token Prediction) models, where GPU-allocated resources were not freed across sleep/resume cycles. The fix explicitly resets the speculative decoder, draft context, and draft model in the destroy() function, so repeated cycles no longer accumulate VRAM until an out-of-memory error occurs.
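To make the failure mode concrete, here is a minimal C++ sketch of the pattern described above. It is not the actual b9274 patch: `Handle`, `Session`, `load()`, the `free_draft` switch, and `leaked_after_cycles()` are all hypothetical names invented for illustration, and a plain counter stands in for VRAM usage. The assumption is that each resume re-creates the draft-side objects, so anything destroy() skips stays allocated.

```cpp
#include <cassert>

// Hypothetical stand-in for a GPU-backed object. The counter mimics VRAM in use.
static int g_live_handles = 0;

struct Handle {
    Handle()  { ++g_live_handles; }
    ~Handle() { --g_live_handles; }
};

// Hypothetical session that owns a main model plus the three draft-side
// resources named in the post (speculative decoder, draft context, draft model).
struct Session {
    Handle * model       = nullptr;
    Handle * spec        = nullptr;
    Handle * draft_ctx   = nullptr;
    Handle * draft_model = nullptr;

    // Called on every resume: allocates everything from scratch.
    void load() {
        model       = new Handle;
        draft_model = new Handle;
        draft_ctx   = new Handle;
        spec        = new Handle;
    }

    // Called on every sleep. With free_draft == false this models the leak:
    // the draft-side pointers are left alone and get overwritten by the next
    // load(). With free_draft == true they are reset explicitly.
    void destroy(bool free_draft) {
        if (free_draft) {
            delete spec;        spec        = nullptr;
            delete draft_ctx;   draft_ctx   = nullptr;
            delete draft_model; draft_model = nullptr;
        }
        delete model; model = nullptr;
    }
};

// Runs `cycles` sleep/resume rounds and returns how many handles were leaked.
int leaked_after_cycles(int cycles, bool free_draft) {
    assert(cycles >= 0);
    const int before = g_live_handles;
    Session s;
    for (int i = 0; i < cycles; ++i) {
        s.load();
        s.destroy(free_draft);
    }
    return g_live_handles - before;
}
```

In this toy model, `leaked_after_cycles(n, false)` grows by three handles per cycle, while `leaked_after_cycles(n, true)` stays at zero. That mirrors the reported behavior: usage creeps up with each sleep/resume until allocation fails.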
