DFlash speculative decoding on Apple Silicon: 85 tok/s, 3.3x on Qwen3.5-9B (MLX, M5 Max)
A native DFlash implementation on MLX for Apple Silicon significantly accelerates token generation in Qwen models. The speculative decoding technique achieves speedups of up to 3.3x (85 tok/s on Qwen3.5-9B on an M5 Max) while maintaining identical output quality.
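To illustrate why speculative decoding can keep the output unchanged, here is a minimal sketch of generic greedy draft-and-verify decoding. It is not the DFlash implementation and uses no MLX calls: `toy_target`, `toy_draft`, and `speculative_generate` are hypothetical names, and the "models" are plain Python functions standing in for a large target model and a cheap drafter. In a real system the verification step would be a single batched forward pass of the target model over all drafted positions, which is where the speedup would come from.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def toy_target(ctx: List[Token]) -> Token:
    # Hypothetical stand-in for the large (slow) target model.
    return (sum(ctx) * 31 + len(ctx)) % 50


def toy_draft(ctx: List[Token]) -> Token:
    # Hypothetical stand-in for the cheap drafter: it agrees with the
    # target except when the context length is a multiple of 5.
    t = toy_target(ctx)
    return (t + 1) % 50 if len(ctx) % 5 == 0 else t


def greedy_generate(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_generate(
    target: NextToken, draft: NextToken, prompt: List[Token], max_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    # Returns the generated sequence and the number of verification rounds.
    out = list(prompt)
    goal = len(prompt) + max_new
    rounds = 0
    while len(out) < goal:
        # 1. The drafter proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks the proposal (one batched pass in practice).
        rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch, stop
                break
            accepted.append(tok)
        else:
            # Every drafted token matched, so the target adds one more.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], rounds


prompt = [1, 2, 3]
baseline = greedy_generate(toy_target, prompt, max_new=20)
spec, rounds = speculative_generate(toy_target, toy_draft, prompt, max_new=20, k=4)
print(spec == baseline, rounds)
```

Because every emitted token is either confirmed or replaced by the target's own greedy choice, the speculative output matches plain greedy decoding token for token; only the number of target rounds changes. The actual speedup depends on how often the drafter is right and on the cost of drafting, neither of which this toy models.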