
FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels

arXiv cs.LG · April 24, 2026

FairyFuse is an inference system for CPU-only platforms that runs large language models without multiplications. Because its weights are ternary ({-1, 0, +1}), each floating-point multiplication becomes a conditional addition or subtraction. The compact weight encoding offers up to 16x weight compression, which eases the memory bandwidth bottleneck.
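
To make the idea concrete, here is a minimal sketch of a matrix-vector product over ternary weights, together with a simple 2-bit packing scheme. It is not FairyFuse's kernel: the function names (`ternary_matvec`, `pack_ternary`, `unpack_ternary`) and the bit encoding are assumptions chosen for illustration, and a real fused kernel would operate on packed weights with SIMD instructions rather than Python loops. The packing also shows one way a 16x figure could arise: 2 bits per weight against a 32-bit float baseline is 32 / 2 = 16, though the summary does not state which baseline or encoding the paper uses.

```python
# Illustrative 2-bit encoding (an assumption, not the paper's format).
_ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}
_DECODE = {code: value for value, code in _ENCODE.items()}


def ternary_matvec(weights, x):
    """Compute y = W @ x where every entry of W is -1, 0, or +1.

    No multiplications are performed: each weight selects whether the
    activation is added, subtracted, or skipped.
    """
    y = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing
        y.append(acc)
    return y


def pack_ternary(values):
    """Pack ternary values into bytes, four weights per byte (2 bits each)."""
    out = bytearray((len(values) + 3) // 4)
    for i, w in enumerate(values):
        out[i // 4] |= _ENCODE[w] << (2 * (i % 4))
    return bytes(out)


def unpack_ternary(packed, n):
    """Recover the first n ternary values from a packed byte string."""
    return [_DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]


if __name__ == "__main__":
    W = [[1, 0, -1], [-1, -1, 1]]
    x = [2.0, 3.0, 5.0]
    print(ternary_matvec(W, x))

    row = [1, 0, -1, 1, -1]
    packed = pack_ternary(row)
    print(len(packed), unpack_ternary(packed, len(row)) == row)
```

In this sketch five weights occupy two bytes, where 32-bit floats would need twenty. Fewer bytes read per weight is where the memory bandwidth saving comes from.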

inference CPU optimization quantization performance LLM
