
Article · Reddit r/LocalLLaMA · 4/18/2026

RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context; the `--n-cpu-moe` flag is the most important part.

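The post describes how to tune Qwen3.6-35B-A3B on consumer hardware (an RTX 5070 Ti and a Ryzen 9800X3D), reaching 79 t/s with a 128K context. The key finding is the correct use of llama.cpp's `--n-cpu-moe N` flag, which significantly outperforms the commonly used `--cpu-moe`. Where `--cpu-moe` keeps the MoE expert weights of every layer in system RAM, `--n-cpu-moe N` keeps only the experts of the first N layers on the CPU, so the remaining experts can use the free VRAM.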
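To make the `--n-cpu-moe N` trade-off concrete, here is a minimal Python sketch that picks the smallest N whose GPU footprint fits a VRAM budget and then builds a llama-server command line. It assumes VRAM use grows linearly with the number of layers whose experts stay on the GPU and ignores compute buffers. Every number in `ModelSizes`, the helper names, and the path `model.gguf` are hypothetical placeholders, not figures or commands from the post; only the llama.cpp flags `-m`, `-ngl`, `-c` and `--n-cpu-moe` are real.

```python
from dataclasses import dataclass


@dataclass
class ModelSizes:
    # Hypothetical, illustrative sizes in GiB; NOT measured values for any real model.
    n_layers: int                # transformer layers that contain MoE experts
    expert_gib_per_layer: float  # expert FFN weights per layer
    dense_gib: float             # attention, embeddings and other non-expert weights
    kv_cache_gib: float          # KV cache at the chosen context length


def vram_needed(sizes: ModelSizes, n_cpu_moe: int) -> float:
    """Estimated VRAM (GiB) when the experts of the first n_cpu_moe layers
    stay in system RAM and everything else is offloaded to the GPU.
    Simplification: ignores compute buffers and per-layer size differences."""
    gpu_expert_layers = sizes.n_layers - n_cpu_moe
    return (sizes.dense_gib + sizes.kv_cache_gib
            + gpu_expert_layers * sizes.expert_gib_per_layer)


def smallest_n_cpu_moe(sizes: ModelSizes, vram_budget_gib: float) -> int:
    """Smallest N that fits the budget, i.e. the most experts kept on the GPU.
    N == n_layers keeps every layer's experts on the CPU, like --cpu-moe."""
    for n in range(sizes.n_layers + 1):
        if vram_needed(sizes, n) <= vram_budget_gib:
            return n
    raise ValueError("non-expert weights plus KV cache exceed the VRAM budget")


def llama_server_args(model_path: str, n_cpu_moe: int, ctx_size: int) -> list[str]:
    """argv for llama.cpp's llama-server: -ngl 99 offloads (effectively) all
    layers, then --n-cpu-moe keeps the expert weights of the first N layers
    on the CPU."""
    return ["llama-server", "-m", model_path, "-ngl", "99",
            "-c", str(ctx_size), "--n-cpu-moe", str(n_cpu_moe)]


if __name__ == "__main__":
    # Placeholder numbers; a 15 GiB budget leaves some headroom on a 16 GB card.
    sizes = ModelSizes(n_layers=40, expert_gib_per_layer=0.5,
                       dense_gib=2.0, kv_cache_gib=3.0)
    n = smallest_n_cpu_moe(sizes, vram_budget_gib=15.0)
    print(n)  # → 20
    print(" ".join(llama_server_args("model.gguf", n, 131072)))
    # → llama-server -m model.gguf -ngl 99 -c 131072 --n-cpu-moe 20
```

Raising N moves more expert layers back to system RAM and frees VRAM (for example for a longer context) at the cost of speed; setting N to the full layer count behaves like `--cpu-moe`. In practice the right N is found by measuring, not by an estimate like this one.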
