← heapsort-ai

NVENC

1 items

RESEARCH↑ trendingReddit r/MachineLearning·5/3/2026

torch-nvenc-compress: GPU NVENC silicon as a PCIe bandwidth multiplier — PCA + pure-ctypes Video Codec SDK wrapper. Parallel-path overlap measured at 67% of theoretical max on a real GEMM + encode workload. [P]

This project introduces the Python library "torch-nvenc-compress," which leverages the GPU's NVENC/NVDEC hardware to compress LLM activations and KV cache, aiming to overcome PCIe bandwidth bottlenecks in multi-GPU setups. It measures a parallel-path overlap at 67% of theoretical max, improving communication between consumer GPUs.

torch-nvenc-compress: GPU NVENC silicon as a PCIe bandwidth multiplier — PCA + pure-ctypes Video Codec SDK wrapper. Parallel-path overlap measured at 67% of theoretical max on a real GEMM + encode workload. [P]
42