
Model Compression

8 items

RESEARCH · arXiv CS.CL · 4/17/2026

Compressed-Sensing-Guided, Inference-Aware Structured Reduction for Large Language Models

This paper proposes a unified compressed-sensing-guided framework for dynamic LLM execution, addressing the massive parameter counts, memory use, and decoding latency of large language models. It integrates model and prompt compression by using random measurement operators and sparse recovery to estimate task-conditioned and token-adaptive support sets.

31
RESEARCH · arXiv CS.LG · 22d ago

Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels

This study investigates the impact of post-training quantization on the quality of large language models (LLMs) and finds that compression can cause bias to emerge. At 3-bit precision, quantization caused 6-21% of previously unbiased items to develop new stereotypical behaviors in models such as Qwen2.5-7B, Mistral-7B, and Phi-3.5-mini. The effect follows a clear dose-response pattern across precision levels.

27