← heapsort-ai

Qwen

46 items

ARTICLE↑ trendingReddit r/LocalLLaMA·4/17/2026

Qwen 3.6 is the first local model that actually feels worth the effort for me

The author finds Qwen 3.6 to be the first local model genuinely worth the effort, unlike previous experiences with models that were either too weak or required excessive tweaking. Running on a 5090 + 4090 setup, the Q8 model provides 260k context and 170 tokens/second, proving effective for coding tasks like UI XML and embedded C++.

46
CASE↑ trendingReddit r/LocalLLaMA·4/17/2026

Qwen3.6 is incredible with OpenCode!

The user praises Qwen3.6 OpenCode as an "incredible" local model for complex coding tasks, highlighting its effectiveness in implementing RLS across a multi-language codebase. While not perfect, its ability to iterate on compiler errors makes it a viable alternative to models like Claude Code for daily use.

44
CASE↑ trendingReddit r/LocalLLaMA·4/17/2026

Qwen3.6. This is it.

A user recounts their experience with the Qwen3.6 model, which successfully built and tested a tower defense game, demonstrating the ability to identify and fix its own bugs. The AI confirmed builds using screenshots, astonishing the user with its advanced capabilities.

Qwen3.6. This is it.
43
DOC↑ trendingReddit r/LocalLLaMA·5/6/2026

2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints

This content details how to achieve 2.5x faster inference with Qwen 3.6 27B using MTP support in llama.cpp, enabling 28 tok/s on an M2 Max. It provides converted GGUF files for download, suitable for local agentic coding with 262k context on 48GB.

43
DOC↑ trendingReddit r/LocalLLaMA·4/11/2026

Run Qwen3.5-397B-A13B with vLLM and 8xR9700

This document details the optimized execution of the Qwen3.5-397B-A17B-MXFP4 model using vLLM on RDNA4 GPUs, such as 8xR9700. It provides a Dockerfile with Triton patches and instructions for downloading the model and launching the inference container.

42
DOC↑ trendingReddit r/LocalLLaMA·5/6/2026

Get faster qwen 3.6 27b

The content details how to achieve faster performance with the Qwen 3.6 27B model using llama.cpp on a 3090 GPU. It includes steps to apply a specific commit and `llama-server` setup commands to reach 50 t/s with 100k context.

42
ARTICLE↑ trendingReddit r/LocalLLaMA·4/17/2026

Qwen 3.6 35B crushes Gemma 4 26B on my tests

The author conducted a personal benchmark where Qwen 3.6 35B significantly outperformed Gemma 4 26B across tests evaluating agentic capabilities, coding, image-to-text synthesis, instruction following, and reasoning. Qwen fixed more issues, showed fewer regressions, and completed the tasks in less time, indicating superior overall performance.

42
DOC↑ trendingReddit r/LocalLLaMA·27d ago

llama.cpp docker images to run MTP models

This content describes the creation of Docker images for `llama.cpp` to simplify running MTP models, following numerous improvements and bug fixes. It also notes that Unsloth has released new MTP models for Qwen 3.6, making previous versions obsolete.

41