
VRAM

10 items

DOC · ↑ trending · Reddit r/LocalLLaMA · 19d ago

Latest b9274 Addresses MTP VRAM leak

The b9274 update fixes a VRAM leak in MTP (Multi-Token Prediction) models, where GPU-allocated resources were not freed across sleep/resume cycles. The fix explicitly resets the speculative decoder, draft context, and draft model resources in the destroy() function to prevent out-of-memory errors.

47
ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/12/2026

KIV: 1M token context window on a RTX 4070 (12GB VRAM), no retraining, drop-in HuggingFace cache replacement - Works with any model that uses DynamicCache [P]

KIV (K-Indexed V Materialization) is a middleware layer that replaces the standard HuggingFace KV cache with a tiered retrieval system, moving old data to system RAM. This enables 1M token context windows on an RTX 4070 (12GB VRAM) with only 12MB VRAM overhead and good performance.

42
ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/9/2026

16 GB VRAM users, what model do we like best now?

A user with 16 GB of VRAM shares a positive experience running the Qwen 3.5 27b model at IQ3 quants on an RTX 4080, reporting good speed and context length. The post discusses the challenges of optimizing local AI models within that VRAM budget, weighing quality against speed across different quantization levels.

41
ARTICLE · DEV.to AI · 4/23/2026

I Built a Local AI VRAM Calculator & GPU Planner (Beta)

The author has launched a new beta tool called "Local AI VRAM Calculator & GPU Planner" to help determine GPU and VRAM requirements for running local LLMs. This tool aims to make hardware tradeoffs visible for different workloads and quantization levels before committing to components.
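The kind of tradeoff such a planner makes visible can be approximated by hand. The sketch below is not the tool's method; it is a back-of-envelope estimate that assumes weight memory is roughly parameter count times bits per weight. The function names, the 2 GiB overhead allowance for KV cache and runtime buffers, and the 3.5 bits-per-weight figure are all illustrative assumptions.

```python
def estimate_weight_vram_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """Rough check: weights plus an assumed fixed overhead must fit in VRAM.

    overhead_gib is a placeholder for KV cache, activations, and runtime
    buffers; real usage depends on context length and backend.
    """
    needed = estimate_weight_vram_gib(params_billion, bits_per_weight) + overhead_gib
    return needed <= vram_gib


if __name__ == "__main__":
    # Hypothetical comparison: a 27B model on a 16 GiB card at two bit widths.
    for bits in (3.5, 8.0):
        size = estimate_weight_vram_gib(27, bits)
        print(f"{bits} bpw: {size:.1f} GiB weights, fits={fits(27, bits, 16)}")
```

An estimate like this only bounds the weights; long contexts can add several GiB of KV cache on top, so it should be read as a lower bound rather than a sizing guarantee.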

39
NEWS · ↑ trending · Reddit r/LocalLLaMA · 5/4/2026

Ryzen AI Max+ 495 (Gorgon Halo) with 192GB VRAM!

Leaks indicate that the AMD Ryzen AI Max+ PRO 495 (Gorgon Halo) APU might ship with 192GB of VRAM, a promising sign for local AI. Despite potentially high costs due to the storage crisis, future parts such as Medusa Halo in 2027 are speculated to reach 256GB.

38
ARTICLE · DEV.to AI · 4/10/2026

i generated AI video on a GTX 1660. here's what it actually takes.

The article covers FramePack F1, an innovative tool that generates video from a single image using only 6 GB of VRAM, making it accessible on common GPUs such as the GTX 1660. It describes a pipeline architecture with five components, emphasizing the solution's practicality and local usability for real projects.

23