RESEARCH · arXiv CS.CL · 4/17/2026
Compressed-Sensing-Guided, Inference-Aware Structured Reduction for Large Language Models
This paper proposes a unified, compressed-sensing-guided framework for dynamic LLM execution, aimed at the large parameter counts, memory footprint, and decoding latency of large language models. It integrates model compression and prompt compression: random measurement operators and sparse recovery are used to estimate task-conditioned and token-adaptive support sets.
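To make the idea of "random measurements plus sparse recovery" concrete, here is a minimal generic sketch, not the paper's method. It assumes a hypothetical importance vector `x` over `n` units (for example heads, channels, or prompt tokens) in which only `k` entries are nonzero, takes `m < n` random Gaussian measurements, and estimates the support with Orthogonal Matching Pursuit. The summary does not say which measurement operators or recovery algorithm the paper uses, so the function `omp` and the sizes `n`, `m`, `k` are all illustrative assumptions.

```python
import numpy as np


def omp(A, y, k):
    """Orthogonal Matching Pursuit: estimate a k-sparse x such that y ≈ A @ x.

    Generic textbook routine, used here only as an illustration.
    Returns (sorted support indices, dense estimate of x).
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Score each column by its correlation with the current residual.
        scores = np.abs(A.T @ residual)
        if support:
            scores[support] = -np.inf  # never select the same index twice
        support.append(int(np.argmax(scores)))
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return sorted(support), x_hat


# Hypothetical setup: n candidate units, k of which actually matter.
rng = np.random.default_rng(0)
n, m, k = 64, 48, 3
true_support = sorted(rng.choice(n, size=k, replace=False).tolist())
x = np.zeros(n)
x[true_support] = 1.0

# Random Gaussian measurement operator with unit-norm columns (an assumption).
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)

y = A @ x  # m noiseless measurements of the n-dimensional vector
est_support, x_hat = omp(A, y, k)
print("true support:     ", true_support)
print("estimated support:", est_support)
```

In a dynamic-execution setting, the estimated support would determine which units or tokens are kept for a given task or decoding step. How the paper defines, measures, and uses these support sets is not stated in this summary.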