GPU

46 items

DOC · ↑ trending · Reddit r/LocalLLaMA · 4/11/2026

Run Qwen3.5-397B-A17B with vLLM and 8xR9700

This guide covers running the Qwen3.5-397B-A17B-MXFP4 model with vLLM on RDNA4 GPUs, for example 8x R9700. It provides a Dockerfile with Triton patches, plus instructions for downloading the model and launching the inference container.

42
RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 5/1/2026

nvidia/Gemma-4-26B-A4B-NVFP4

The post reports on running the Gemma-4-26B-A4B-NVFP4 model on an NVIDIA RTX 5090, citing 18.8 GB of VRAM usage and support for a 50k context. It also compares benchmark scores for the NVFP4 version against full precision on metrics such as GPQA, AIME, and MMLU Pro.

42
RESEARCH · ↑ trending · Reddit r/MachineLearning · 5/3/2026

torch-nvenc-compress: GPU NVENC silicon as a PCIe bandwidth multiplier — PCA + pure-ctypes Video Codec SDK wrapper. Parallel-path overlap measured at 67% of theoretical max on a real GEMM + encode workload. [P]

This project introduces "torch-nvenc-compress", a Python library that uses the GPU's NVENC/NVDEC hardware to compress LLM activations and KV cache, with the aim of easing PCIe bandwidth bottlenecks between consumer GPUs in multi-GPU setups. The author measures parallel-path overlap at 67% of the theoretical maximum on a real GEMM + encode workload.

42
ARTICLE · DEV.to AI · 4/23/2026

I Built a Local AI VRAM Calculator & GPU Planner (Beta)

The author has launched a new beta tool called "Local AI VRAM Calculator & GPU Planner" to help determine GPU and VRAM requirements for running local LLMs. This tool aims to make hardware tradeoffs visible for different workloads and quantization levels before committing to components.

39
NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/12/2026

Weekend project with Intel B70s

A user is building a high-end system around Intel Arc B70 GPUs and a Gigabyte B850 AI Top motherboard. The goal is to test the Gemma 4 model in legal RAG applications, using a Hermes agent.

38