DOC↑ trending42

Qwen3.5-35B running well on RTX4060 Ti 16GB at 60 tok/s

Reddit r/LocalLLaMA·April 15, 2026

The author shares a successful optimization for running the Qwen3.5-35B-A3B-UD-Q4_K_L model on an RTX 4060 Ti 16GB using llama.cpp, achieving 40-60 tokens/s with 64k context. The post provides the detailed `models.ini` configuration and server start command to replicate this performance.

Hardware Acceleration AI Model Optimization llama.cpp local inference Qwen

Read original ↗