Three Months of Speed-Up Experiments on a 3090 Ti: Autoregressive DFlash MTP for Qwen3.6-27B

DEV.to AI·May 18, 2026

This article describes a three-month effort to speed up decoding for the Qwen3.6-27B model on an RTX 3090 Ti GPU. It reports decode speed moving from 43 tokens per second to a range of 39-49 tokens per second, using MTP, a new speculative decoding technique, within llama.cpp.

LLM optimization · llama.cpp · Qwen3.6-27B · GPU performance · Speculative Decoding
