
GGUF

16 items

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/18/2026

Qwen3.6-35B-A3B-Uncensored-Wasserstein-GGUF

A user found and fixed significant tensor drift in the `ssm_conv1d` layers of quantized Qwen3.6-35B GGUF models, and proposes the Wasserstein metric as a better detector of numerical instability than Kullback-Leibler divergence. The fix targets the recurrent state-transition layers responsible for long-context memory and is available in a shared model.

44
DOC · ↑ trending · Reddit r/LocalLLaMA · 5/6/2026

2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints

How to get 2.5x faster inference from Qwen 3.6 27B using MTP support in llama.cpp, reaching 28 tok/s on an M2 Max. The post links converted GGUF files for download, aimed at local agentic coding with 262k context on 48GB.

43
DOC · ↑ trending · Reddit r/LocalLLaMA · 5/6/2026

Get faster qwen 3.6 27b

How to get faster performance from Qwen 3.6 27B with llama.cpp on a 3090 GPU. The post covers applying a specific commit and the `llama-server` setup commands needed to reach 50 t/s with 100k context.

42
RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/14/2026

Updated Qwen3.5-9B Quantization Comparison

Compares GGUF quantizations of Qwen3.5-9B by KL divergence (KLD) against the BF16 baseline to measure faithfulness. Lower KLD means less information loss, giving users a data-driven basis for choosing the most faithful quantized file.
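As a rough illustration of the metric (not the post's actual methodology, which this summary does not describe), KLD between a baseline and a quantized model can be sketched for a single token position. The logit values and the helper names `softmax` and `kl_divergence` are made up for the example; a real comparison would average over many token positions of a test corpus.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; p is the baseline, q the quantized model."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits over a 3-token vocabulary.
baseline_logits = [2.0, 1.0, 0.1]   # stand-in for the BF16 model
quant_a_logits = [2.0, 1.0, 0.1]    # a quant that reproduces the baseline exactly
quant_b_logits = [1.5, 1.2, 0.4]    # a quant whose outputs have drifted

p = softmax(baseline_logits)
kld_a = kl_divergence(p, softmax(quant_a_logits))
kld_b = kl_divergence(p, softmax(quant_b_logits))

# The quant with the lower KLD is the more faithful one.
print(f"quant A: {kld_a:.6f} nats, quant B: {kld_b:.6f} nats")
```

A quant that matches the baseline distribution exactly scores zero, and any deviation scores above zero, which is why lower values are read as less information loss.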

42
ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/8/2026

Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF

The author found and fixed a training bug in the Qwen3.5-35B-A3B model and released a fixed version, an improved system prompt, a chat template with tool-calling support, and recommended settings for LM Studio. The fix addresses the context loss and repetition that occurred in long conversations with the previous version of the model.

42
NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/12/2026

mtmd: qwen3 audio support (qwen3-omni and qwen3-asr)

Qwen3 now supports audio input through its `qwen3-omni-moe` (multimodal, with vision and audio input) and `qwen3-asr` (automatic speech recognition) variants. GGUF models for Qwen3-Omni (30B variants) and Qwen3-ASR (1.7B and 0.6B) are available on Hugging Face for community use.

42
NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/22/2026

unsloth Qwen3.6-27B-GGUF

The long-awaited unsloth GGUF files for Qwen3.6-27B are now available.

33
NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/8/2026

kepler-452b. GGUF when?

The title asks when a GGUF version of 'kepler-452b' will be available, suggesting a discussion about a GGUF release of an AI model. The entry is a simple community post with links to more details.

18