Reddit r/LocalLLaMA · 4/19/2026
Unweight: how we compressed an LLM 22% without sacrificing quality
Cloudflare developed Unweight, a lossless compression system that shrinks LLM weights by 15-22% to overcome GPU memory bandwidth bottlenecks during inference. It achieves this by using Huffman coding to compress the predictable exponent bytes of BF16 weights, preserving bit-exact outputs.
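
The idea in the summary can be illustrated with a small sketch. This is not Cloudflare's implementation: the helper names (`bf16_bits`, `split_bf16`, `join_bf16`, `huffman_code`) are hypothetical, the weights are synthetic Gaussian samples rather than real model weights, float32 values are truncated (not rounded) to BF16, and bitstrings are held as Python strings for readability. It only shows that the 8-bit exponent field of BF16 values can be Huffman-coded and restored bit-exactly while the sign and mantissa bits are stored as-is.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_bits(x: float) -> int:
    """Top 16 bits of the float32 encoding (truncation, an assumption made for simplicity)."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def split_bf16(bits: int) -> tuple[int, int]:
    """Split a BF16 pattern into (8-bit exponent, 8-bit sign+mantissa)."""
    exponent = (bits >> 7) & 0xFF
    sign_mantissa = ((bits >> 15) << 7) | (bits & 0x7F)
    return exponent, sign_mantissa


def join_bf16(exponent: int, sign_mantissa: int) -> int:
    """Inverse of split_bf16."""
    return ((sign_mantissa >> 7) << 15) | (exponent << 7) | (sign_mantissa & 0x7F)


def huffman_code(freqs: Counter) -> dict[int, str]:
    """Build a prefix code {symbol: bitstring} from symbol counts."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # The integer in the middle is a unique tiebreaker so dicts are never compared.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols: list[int], code: dict[int, str]) -> str:
    return "".join(code[s] for s in symbols)


def decode(bitstring: str, code: dict[int, str]) -> list[int]:
    inverse = {bits: sym for sym, bits in code.items()}
    out, current = [], ""
    for bit in bitstring:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return out


random.seed(0)
# Synthetic stand-in for model weights (assumption: small, zero-centered values).
weights = [bf16_bits(random.gauss(0.0, 0.02)) for _ in range(20000)]
parts = [split_bf16(w) for w in weights]
exponents = [e for e, _ in parts]

code = huffman_code(Counter(exponents))
packed = encode(exponents, code)

# Rebuild every value from the decoded exponent plus the untouched sign/mantissa byte.
restored = [join_bf16(e, sm) for e, (_, sm) in zip(decode(packed, code), parts)]

compressed_bits = len(packed) + 8 * len(weights)  # ignores the size of the code table
ratio = compressed_bits / (16 * len(weights))
print(f"bit-exact: {restored == weights}, compressed/original size: {ratio:.3f}")
```

The ratio printed for this synthetic data should not be read as a reproduction of the 15-22% figure above, which is reported for real models; the sketch also ignores the cost of storing the code table and of decoding on the GPU.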
