
Compressed-Sensing-Guided, Inference-Aware Structured Reduction for Large Language Models

arXiv cs.CL · April 17, 2026

This paper proposes a unified, compressed-sensing-guided framework for dynamic LLM execution, targeting the large parameter counts, memory footprint, and decoding latency of large language models. It integrates model compression and prompt compression by using random measurement operators and sparse recovery to estimate task-conditioned and token-adaptive support sets.
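To make the idea of estimating a support set from random measurements concrete, here is a minimal, generic compressed-sensing sketch. It is not the paper's method: the Gaussian measurement matrix, the use of orthogonal matching pursuit (OMP) as the sparse-recovery step, the function name `omp_support`, and all sizes are assumptions chosen only for illustration.

```python
import numpy as np

def omp_support(A, y, k):
    """Orthogonal matching pursuit: greedily estimate k support indices."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick the same index twice
        support.append(int(np.argmax(correlations)))
        # Least-squares fit on the chosen columns, then update the residual.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    return sorted(support)

# Hypothetical sizes: n candidate units, m random measurements, k active units.
rng = np.random.default_rng(0)
n, m, k = 256, 128, 5

# Build a k-sparse signal whose nonzero positions form the "true" support set.
true_support = sorted(rng.choice(n, size=k, replace=False).tolist())
x = np.zeros(n)
x[true_support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)

# Random Gaussian measurement operator and the m < n compressed measurements.
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x

estimated_support = omp_support(A, y, k)
print("true support:     ", true_support)
print("estimated support:", estimated_support)
```

In this toy setting the support is recovered from fewer measurements than unknowns (m < n) because the signal is sparse. A framework like the one summarized above would presumably apply the same principle to model components or prompt tokens rather than to a synthetic vector.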

Model Compression · LLM optimization · sparse recovery · compressed sensing
