
Unweight: how we compressed an LLM 22% without sacrificing quality

Reddit r/LocalLLaMA · April 19, 2026

Cloudflare developed Unweight, a lossless compression system that shrinks LLM weights by 15-22% to overcome GPU memory bandwidth bottlenecks during inference. It achieves this by using Huffman coding to compress the predictable exponent bytes of BF16 weights, preserving bit-exact outputs.
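To make the idea concrete, here is a minimal sketch of Huffman-coding the exponent field of BF16 values and checking that the round trip is bit-exact. It is not Cloudflare's implementation: the weights are synthetic Gaussian samples, the float-to-BF16 conversion uses simple truncation, and every helper name (`float_to_bf16_bits`, `split_bf16`, `join_bf16`, `build_huffman_code`, `encode`, `decode`) is hypothetical. The savings it prints depend entirely on the synthetic distribution and say nothing about the 15-22% figure above.

```python
import heapq
import random
import struct
from collections import Counter


def float_to_bf16_bits(x: float) -> int:
    """Return the upper 16 bits of the float32 encoding of x.

    Assumption: plain truncation; real converters usually round to nearest.
    """
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def split_bf16(bits: int) -> tuple[int, int]:
    """Split a BF16 pattern into (8-bit exponent, sign bit + 7-bit mantissa)."""
    exponent = (bits >> 7) & 0xFF
    sign_mantissa = ((bits >> 15) << 7) | (bits & 0x7F)
    return exponent, sign_mantissa


def join_bf16(exponent: int, sign_mantissa: int) -> int:
    """Inverse of split_bf16."""
    sign = sign_mantissa >> 7
    mantissa = sign_mantissa & 0x7F
    return (sign << 15) | (exponent << 7) | mantissa


def build_huffman_code(symbols) -> dict:
    """Build a prefix-free code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, code: dict) -> str:
    """Concatenate the code words for each symbol (bit string for clarity)."""
    return "".join(code[s] for s in symbols)


def decode(bitstring: str, code: dict) -> list:
    """Decode a bit string produced by encode() using the same code table."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bitstring:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Synthetic stand-in for model weights (assumption: small Gaussian values).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]
bf16 = [float_to_bf16_bits(w) for w in weights]
exponents, rest = zip(*(split_bf16(b) for b in bf16))

code = build_huffman_code(exponents)
packed = encode(exponents, code)

# Lossless check: decoding and re-joining reproduces every original bit pattern.
restored = [join_bf16(e, r) for e, r in zip(decode(packed, code), rest)]
assert restored == bf16

raw_bits = 16 * len(bf16)
compressed_bits = len(packed) + 8 * len(bf16)  # code table overhead ignored
print(f"raw: {raw_bits} bits, compressed: {compressed_bits} bits")
```

The sign and mantissa bits are stored untouched because they look close to random; only the exponent, which clusters around a few values for typical weight magnitudes, is entropy-coded. A practical system would pack bits into bytes and decode on the GPU rather than use Python strings.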
