performance

95 items

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 5/1/2026

nvidia/Gemma-4-26B-A4B-NVFP4

A post reports that the Gemma-4-26B-A4B-NVFP4 model runs on an NVIDIA 5090 GPU using 18.8GB of VRAM with a 50k context. It also compares benchmark scores for the NVFP4 version against full precision on GPQA, AIME, and MMLU Pro.

42
NEWS · ↑ trending · Reddit r/LocalLLaMA · 5/4/2026

Llama.cpp MTP support now in beta!

Llama.cpp's MTP support is now in beta, initially covering Qwen3.5 MTP, and may be merged soon. Together with maturing tensor-parallel support, it is expected to narrow the performance gap with vLLM, particularly in token generation speed.

42
ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/22/2026

Is a high-end private local LLM setup worth it?

The user questions the worth of a high-end local LLM setup, citing high costs, setup difficulties, and perceived performance gaps compared to cloud services like Claude and GPT. They are willing to invest in powerful hardware but want to know if it can truly match the speed and intelligence of top commercial models.

41
RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/19/2026

QWEN3.6 + ik_llama is fast af

A user reported running Qwen3.6 with ik_llama at over 50 tokens/second with a 200k context window on a machine with 16GB VRAM and 32GB RAM, a notable result for hardware of that size.

38