DFlash speculative decoding on Apple Silicon: 85 tok/s, 3.3x on Qwen3.5-9B (MLX, M5 Max)
This post describes a native MLX implementation of DFlash for Apple Silicon that significantly accelerates token generation in Qwen models. The speculative decoding technique achieves speedups of up to 3.3x (85 tok/s on Qwen3.5-9B on an M5 Max) while maintaining identical output quality.
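To illustrate why speculative decoding can leave the output unchanged, here is a minimal sketch of the generic draft-and-verify loop under greedy decoding. It is not the DFlash implementation and does not use MLX: `target`, `draft`, `greedy_decode`, `speculative_decode`, and the block size `k` are hypothetical names, and the two "models" are toy functions that map a token list to the next token. In a real system the verification step would be a single batched forward pass of the target model, which is where the speedup comes from; the loop below only shows the acceptance logic.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to its next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Draft proposes k tokens; target keeps the longest matching prefix."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The cheap draft model proposes a block of k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))

        # 2. The target checks each proposed token in order.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token and stop.
                accepted.append(expected)
                break
        else:
            # Whole block accepted: the target supplies one extra token.
            accepted.append(target(seq + accepted))

        seq.extend(accepted)
    return seq[:goal]


def toy_target(seq: List[int]) -> int:
    # Hypothetical deterministic stand-in for the large model.
    return (sum(seq) * 31 + len(seq)) % 50


def toy_draft(seq: List[int]) -> int:
    # Hypothetical stand-in for the draft model: wrong on every fifth position.
    tok = toy_target(seq)
    return (tok + 1) % 50 if len(seq) % 5 == 0 else tok


if __name__ == "__main__":
    prompt = [3, 1, 4]
    print(speculative_decode(toy_target, toy_draft, prompt, n_new=20))
```

Every token appended to the sequence is the one the target itself would have chosen for that prefix, so the result matches plain greedy decoding regardless of how often the draft is wrong; draft quality only affects how many tokens are accepted per verification step.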