
Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090

Reddit r/LocalLLaMA·April 27, 2026

Luce DFlash is a GGUF port of DFlash speculative decoding for Qwen3.6-27B that reaches up to 2x throughput on a single RTX 3090. It is a standalone C++/CUDA stack, released as open source under the MIT license, aimed at faster LLM inference on consumer-grade hardware.

Open Source · Optimization · Performance · Speculative Decoding · LLM
