MTP

2 items

DOC · ↑ trending · Reddit r/LocalLLaMA · 19d ago

Latest b9274 Addresses MTP VRAM leak

The b9274 update fixes a VRAM leak affecting MTP (Multi-Token Prediction) models, in which GPU-allocated resources were not freed across sleep/resume cycles. The fix explicitly resets the speculative decoder, draft context, and draft model resources in the destroy() function, preventing the out-of-memory errors the leak could cause.

server MTP VRAM memory leak
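
As a rough illustration of the b9274 fix summarized above, the sketch below shows the general pattern of releasing every owned resource in destroy() so that repeated sleep/resume cycles do not accumulate allocations. It is not the actual llama.cpp code: FakeVramResource, ServerContext, the member names, the byte sizes, and vram_after_cycles are all hypothetical, and a simple byte counter stands in for real GPU memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a GPU-backed resource. It only tracks a byte
// count, so the effect of releasing it can be observed without a GPU.
struct FakeVramResource {
    // Total simulated VRAM currently held by all live instances.
    static std::size_t & live_bytes() {
        static std::size_t total = 0;
        return total;
    }

    std::size_t bytes;

    explicit FakeVramResource(std::size_t n) : bytes(n) { live_bytes() += n; }
    ~FakeVramResource() { live_bytes() -= bytes; }

    FakeVramResource(const FakeVramResource &) = delete;
    FakeVramResource & operator=(const FakeVramResource &) = delete;
};

// Hypothetical server context. Member names are illustrative only and are
// not taken from the llama.cpp source.
struct ServerContext {
    std::unique_ptr<FakeVramResource> model;
    std::unique_ptr<FakeVramResource> draft_model;
    std::unique_ptr<FakeVramResource> draft_ctx;
    std::unique_ptr<FakeVramResource> speculative;

    // Simulates resuming: allocate the main model plus the draft-side state.
    void load() {
        model       = std::make_unique<FakeVramResource>(4096);
        draft_model = std::make_unique<FakeVramResource>(1024);
        draft_ctx   = std::make_unique<FakeVramResource>(256);
        speculative = std::make_unique<FakeVramResource>(64);
    }

    // Simulates sleeping: explicitly reset every owned resource, dependents
    // first (decoder state, then draft context, then the draft model).
    void destroy() {
        speculative.reset();
        draft_ctx.reset();
        draft_model.reset();
        model.reset();
    }
};

// Runs a number of load/destroy cycles and reports the simulated VRAM
// still held afterwards.
std::size_t vram_after_cycles(int cycles) {
    ServerContext ctx;
    for (int i = 0; i < cycles; ++i) {
        ctx.load();
        ctx.destroy();
    }
    return FakeVramResource::live_bytes();
}
```

In this sketch, calling vram_after_cycles with any cycle count leaves the counter at zero because destroy() resets all four members. Dropping one of the reset() calls from a design that holds raw handles instead of owning pointers is the kind of omission that would let usage grow with each cycle.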

NEWS · ↑ trending · Reddit r/LocalLLaMA · 5/4/2026

Llama.cpp MTP support now in beta!

MTP support in llama.cpp is now in beta, initially covering Qwen3.5 MTP, and may be merged soon. Together with maturing tensor-parallel support, it is expected to close performance gaps with vLLM, particularly in token generation speed.

AI models Qwen3.5 MTP llama.cpp