Unweight: how we compressed an LLM 22% without sacrificing quality

Reddit r/LocalLLaMA · April 19, 2026

Cloudflare developed Unweight, a lossless compression system that shrinks LLM weights by 15-22% to overcome GPU memory bandwidth bottlenecks during inference. It achieves this by using Huffman coding to compress the predictable exponent bytes of BF16 weights, preserving bit-exact outputs.
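Here is a minimal Python sketch of the exponent-coding idea. It is not Cloudflare's code. It assumes each BF16 word is split into its 8-bit exponent field and one byte holding the sign bit plus the 7 mantissa bits. Only the exponent stream is Huffman-coded, and the sign+mantissa byte is stored raw. The helper names (`split_bf16`, `compress`, `decompress`, and so on), the synthetic Gaussian "weights", and the `'0'`/`'1'` string bitstream are illustrative assumptions. A practical system would pack bits into bytes and need a fast decoder.

```python
import heapq
import random
import struct
from collections import Counter


def float_to_bf16(x: float) -> int:
    """Keep the upper 16 bits of a float32 (BF16 by truncation, i.e. round toward zero)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16


def split_bf16(word: int) -> tuple[int, int]:
    """Split a BF16 word into (8-bit exponent, byte of sign bit + 7-bit mantissa)."""
    sign = (word >> 15) & 0x1
    exponent = (word >> 7) & 0xFF
    mantissa = word & 0x7F
    return exponent, (sign << 7) | mantissa


def join_bf16(exponent: int, sign_mantissa: int) -> int:
    """Inverse of split_bf16: reassemble the original 16-bit word."""
    sign = (sign_mantissa >> 7) & 0x1
    mantissa = sign_mantissa & 0x7F
    return (sign << 15) | (exponent << 7) | mantissa


def build_huffman_code(freqs: Counter) -> dict[int, str]:
    """Build a prefix-free Huffman code {symbol: bitstring} from symbol frequencies."""
    if len(freqs) == 1:
        (only,) = freqs
        return {only: "0"}
    # Heap entries: (weight, unique tiebreak, {symbol: code suffix built so far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def compress(words):
    """Huffman-code the exponent stream; keep sign+mantissa bytes uncompressed."""
    exps, sign_mantissas = zip(*(split_bf16(w) for w in words))
    code = build_huffman_code(Counter(exps))
    bitstream = "".join(code[e] for e in exps)
    return code, bitstream, bytes(sign_mantissas)


def decompress(code, bitstream, sign_mantissas):
    """Walk the prefix-free bitstream, then rejoin each exponent with its raw byte."""
    decode = {v: k for k, v in code.items()}
    exps, buf = [], ""
    for bit in bitstream:
        buf += bit
        if buf in decode:
            exps.append(decode[buf])
            buf = ""
    return [join_bf16(e, sm) for e, sm in zip(exps, sign_mantissas)]


# Usage on synthetic weights drawn from N(0, 0.02) (an assumption, not real model data)
random.seed(0)
weights = [float_to_bf16(random.gauss(0.0, 0.02)) for _ in range(10_000)]
code, bits, raw = compress(weights)
restored = decompress(code, bits, raw)
assert restored == weights  # bit-exact round trip
compressed_bits = len(bits) + 8 * len(raw)
print(f"compressed/original: {compressed_bits / (16 * len(weights)):.3f}")
```

Because the Huffman code is prefix-free and the raw byte keeps every sign and mantissa bit, decoding reproduces each 16-bit word exactly. The printed ratio ignores the size of the code table.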
