
Parameter Efficiency Is Not Memory Efficiency: Rethinking Fine-Tuning for On-Device LLM Adaptation

arXiv cs.LG · April 28, 2026

This paper challenges the assumption that Parameter-Efficient Fine-Tuning (PEFT) is also memory-efficient for on-device LLM adaptation, showing that existing PEFT methods can still run into out-of-memory errors. It introduces LARS (Low-memory Activation-Rank Subspace), a framework that decouples memory consumption from sequence length by constraining the activation subspace, reducing the memory footprint by 33.54% on average.
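As a back-of-envelope illustration of why a small trainable-parameter count does not bound training memory, the sketch below compares the size of LoRA-style adapter weights with the activations kept for the backward pass. It is not LARS and not the paper's measurement method. The function names, the model dimensions, and the per-layer count of saved tensors are hypothetical assumptions chosen only to show the scaling, and attention-score buffers and optimizer state are ignored.

```python
# Rough estimate only: all constants below are illustrative assumptions,
# not figures from the paper.

def lora_trainable_bytes(hidden, rank, n_layers,
                         adapted_per_layer=2, bytes_per_param=2):
    """Bytes for LoRA adapter weights (hypothetical helper).

    Assumes each adapted weight is hidden x hidden and receives two
    low-rank factors: A (rank x hidden) and B (hidden x rank).
    """
    params = n_layers * adapted_per_layer * 2 * hidden * rank
    return params * bytes_per_param


def saved_activation_bytes(batch, seq_len, hidden, n_layers,
                           tensors_per_layer=8, bytes_per_elem=2):
    """Bytes for activations kept for backward (hypothetical helper).

    Assumes each layer keeps `tensors_per_layer` tensors of shape
    (batch, seq_len, hidden); the real count depends on the architecture
    and framework.
    """
    elems = batch * seq_len * hidden * n_layers * tensors_per_layer
    return elems * bytes_per_elem


if __name__ == "__main__":
    hidden, rank, n_layers = 2048, 8, 24  # assumed small on-device model
    adapter_mib = lora_trainable_bytes(hidden, rank, n_layers) / 2**20
    print(f"adapter weights: {adapter_mib:.1f} MiB (independent of sequence length)")
    for seq_len in (512, 1024, 2048, 4096):
        act_mib = saved_activation_bytes(1, seq_len, hidden, n_layers) / 2**20
        print(f"seq_len={seq_len:5d}: saved activations ~ {act_mib:.0f} MiB")
```

Under these assumptions the adapter weights stay fixed while the activation term grows linearly with sequence length, which is the gap between parameter efficiency and memory efficiency that the summary describes.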

Memory Optimization · On-device AI · Fine-tuning · PEFT · LLM
