GPU VRAM — AI articles, news & research

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 19d ago

110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp

The author reports 110 tok/s on 12GB of VRAM running the Qwen3.6 35B A3B model with ik_llama.cpp, and notes a significant speed boost. That throughput exceeded what regular llama.cpp delivered after its MTP PR was merged.

GPU VRAM · LLM optimization · llama.cpp · Benchmarking