Compressed-Sensing-Guided, Inference-Aware Structured Reduction for Large Language Models
arXiv cs.CL · April 17, 2026
This paper proposes a unified compressed-sensing-guided framework for dynamic LLM execution, addressing the massive parameter counts, memory use, and decoding latency of large language models. It integrates model and prompt compression by using random measurement operators and sparse recovery to estimate task-conditioned and token-adaptive support sets.