ggml

4 items

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 18d ago

[llama.cpp] Asymmetric KV q8/q4 cache: current caveats and discussion in GGML repo

A caveat in llama.cpp: with an asymmetric q8/q4 KV cache, processing can fall back to the CPU on CUDA builds. A GitHub discussion in the GGML repo describes a fix, which is to compile with support for that specific KV cache quantization combination, and reports substantial memory savings at a precision loss of only 1.3%.

llama.cpp · GPU optimization · quantization · KV cache

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/9/2026

ggml: backend-agnostic tensor parallelism by JohannesGaessler · Pull Request #19378 · ggml-org/llama.cpp

JohannesGaessler's pull request adding backend-agnostic tensor parallelism to ggml-org/llama.cpp has been approved by Gerganov. This is an important development for the efficiency and scalability of AI model inference.

llama.cpp · tensor parallelism · machine learning · AI

DOC · DEV.to AI · 18d ago

Running Flux Schnell (12B) + LLM on an Old AMD RX 580 (8GB) via Native Vulkan: Complete Architecture Guide [2026]

This technical guide shows how to run LLMs and Stable Diffusion models on an old AMD RX 580 GPU in 2026, working around AI software limitations on that hardware. It details using native Vulkan with the ggml engine for efficient inference and makes the case that older hardware remains viable.

Vulkan · hardware · ggml · AI inference

NEWS · Hugging Face Blog · 2/20/2026

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

GGML and llama.cpp have joined Hugging Face to ensure the continued progress of local AI. The collaboration aims to strengthen the development of accessible and efficient AI solutions.

AI inference · Local AI · Hugging Face · open-source AI