RESEARCH · arXiv CS.AI · 20h ago
OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs
OmniMem is a memory-efficient streaming framework for audio-visual LLMs. It targets a core bottleneck of long-video inference: the number of video tokens, and with it the KV cache, keeps growing as the stream gets longer. OmniMem combines modality-aware memory allocation with perturbation-aware memory selection to preserve the most informative KV states, improving both compression and long-range understanding.
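To make the two ideas concrete, here is a minimal Python sketch of budgeted KV selection. It is an illustration under stated assumptions, not OmniMem's actual algorithm: the summary does not say how budgets or scores are computed, so the function names `allocate_budget` and `select_kv`, the modality weights, and the per-entry scores are all hypothetical. Each score is assumed to stand in for how strongly the model's output would change if that KV entry were perturbed.

```python
from typing import Dict, List, Tuple

# (position in the stream, modality name, importance score)
# The score is a hypothetical stand-in for a perturbation-sensitivity measure.
Entry = Tuple[int, str, float]


def allocate_budget(total: int, weights: Dict[str, float]) -> Dict[str, int]:
    """Split a total KV budget across modalities in proportion to their weights.

    The weights are assumed inputs; the source does not describe how
    OmniMem derives its per-modality allocation.
    """
    norm = sum(weights.values())
    budgets = {m: int(total * w / norm) for m, w in weights.items()}
    # Rounding down can leave slots unused; give them to the heaviest modality.
    leftover = total - sum(budgets.values())
    heaviest = max(weights, key=lambda m: weights[m])
    budgets[heaviest] += leftover
    return budgets


def select_kv(entries: List[Entry], budgets: Dict[str, int]) -> List[int]:
    """Keep the highest-scoring entries of each modality, up to its budget.

    Returns the kept positions in stream order.
    """
    kept: List[int] = []
    for modality, budget in budgets.items():
        pool = [e for e in entries if e[1] == modality]
        pool.sort(key=lambda e: e[2], reverse=True)
        kept.extend(pos for pos, _, _ in pool[:budget])
    return sorted(kept)


if __name__ == "__main__":
    budgets = allocate_budget(4, {"video": 3.0, "audio": 1.0})
    cache = [
        (0, "video", 0.9), (1, "video", 0.1), (2, "audio", 0.5),
        (3, "video", 0.7), (4, "audio", 0.2), (5, "video", 0.4),
        (6, "video", 0.8),
    ]
    print(budgets)
    print(select_kv(cache, budgets))
```

In a streaming setting, a selection step like this would run each time the cache exceeds its budget, so memory stays bounded as the video grows.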