Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090

Reddit r/LocalLLaMA · April 27, 2026
Luce DFlash is a GGUF port of DFlash speculative decoding for Qwen3.6-27B that reaches nearly 2x throughput on a single RTX 3090. It ships as a standalone C++/CUDA stack under the MIT license, aimed at running the model faster on consumer-grade hardware.