Three Months of Speed-Up Experiments on a 3090 Ti: Autoregressive DFlash MTP for Qwen3.6-27B
DEV.to AI · May 18, 2026
This article describes a three-month experiment to optimize decode performance for the Qwen3.6-27B model on an RTX 3090 Ti GPU. It reports decode speed moving from 43 to 39-49 tokens per second using multi-token prediction (MTP), a new speculative decoding technique, within llama.cpp.