RESEARCH↑ trending42

Qwen 3.6 35B A3B, RTX 5090 32GB, 187t/s, Q5 K S, 120K Context Size, Thinking Mode Off, Temp 0.1

Reddit r/LocalLLaMA·April 16, 2026

The content details the performance of the Qwen 3.6 35B A3B model, achieving 187 tokens per second on an RTX 5090 32GB GPU. It highlights support for a 120K context size, using Q5 K S quantization and a temperature of 0.1.

inference AI hardware benchmark performance LLM

Read original ↗