FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels

arXiv cs.LG · April 24, 2026

FairyFuse is a new inference system for CPU-only platforms that runs large language models without multiplications. Its weights are ternary ({-1, 0, +1}), so each floating-point multiplication becomes a conditional addition (weight +1), a subtraction (weight -1), or a skip (weight 0). The compact ternary representation also compresses weights by up to 16x, which eases the memory-bandwidth bottleneck that limits LLM inference on CPUs.
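
The core idea, that a ternary weight turns each multiply into an add, a subtract, or a skip, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not FairyFuse's fused kernel: the 2-bit encoding table and the names `pack_ternary`, `unpack_ternary`, and `ternary_matvec` are hypothetical, and the FP32 baseline used for the compression ratio is an assumption. Packing four ternary weights per byte spends 2 bits per weight versus 32 bits for an FP32 weight, which is one way to arrive at a 16x ratio.

```python
import struct
from typing import List

# Hypothetical 2-bit code per ternary value (an assumption, not FairyFuse's format).
ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}
DECODE = {code: w for w, code in ENCODE.items()}


def pack_ternary(weights: List[int]) -> bytes:
    """Pack ternary weights four per byte, 2 bits each, low bits first."""
    out = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        out[i // 4] |= ENCODE[w] << (2 * (i % 4))
    return bytes(out)


def unpack_ternary(packed: bytes, n: int) -> List[int]:
    """Recover the first n ternary weights from a packed buffer."""
    return [DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]


def ternary_matvec(rows: List[List[int]], x: List[float]) -> List[float]:
    """Compute y = W @ x for ternary W using only additions and subtractions."""
    y = []
    for row in rows:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi  # +1: add the activation
            elif w == -1:
                acc -= xi  # -1: subtract the activation
            # 0: skip, the term contributes nothing
        y.append(acc)
    return y


if __name__ == "__main__":
    W = [[1, 0, -1, 1], [-1, -1, 0, 0]]
    x = [0.5, 2.0, 1.5, -1.0]
    print(ternary_matvec(W, x))  # [-2.0, -2.5]

    flat = [w for row in W for w in row]
    fp32 = struct.pack(f"<{len(flat)}f", *map(float, flat))  # 4 bytes per weight
    packed = pack_ternary(flat)  # 2 bits per weight
    print(len(fp32), len(packed), len(fp32) // len(packed))  # 32 2 16
```

A production CPU kernel would vectorize the add/subtract selection and typically apply a per-tensor or per-group scale after accumulation; the loop above only shows why no multiplication is needed in the inner product itself.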
