
OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs

arXiv cs.AI · June 9, 2026

OmniMem is a memory-efficient streaming framework for audio-visual LLMs. It targets a core limitation of long-video inference: the number of video tokens and the size of the KV cache keep growing as the stream gets longer. It combines modality-aware memory allocation with perturbation-aware memory selection to preserve informative KV states, improving both compression and long-range understanding.
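As a rough illustration of the two ideas named above, the following is a minimal sketch of budgeted KV-cache selection, not OmniMem's actual method. The summary does not say how budgets or perturbation scores are computed, so this sketch assumes a simple proportional split across modalities and treats the per-entry importance scores as given inputs. All names (`allocate_budget`, `select_entries`, `compress`) and the example weights are hypothetical.

```python
from typing import Dict, List, Sequence


def allocate_budget(total_slots: int, weights: Dict[str, float]) -> Dict[str, int]:
    """Split a fixed KV-cache budget across modalities in proportion to weights.

    Assumption: a proportional split stands in for modality-aware allocation.
    """
    total_weight = sum(weights.values())
    budget = {m: int(total_slots * w / total_weight) for m, w in weights.items()}
    # Give any slots lost to rounding down to the highest-weight modality.
    leftover = total_slots - sum(budget.values())
    budget[max(weights, key=weights.get)] += leftover
    return budget


def select_entries(scores: Sequence[float], keep: int) -> List[int]:
    """Return indices of the `keep` highest-scoring entries, in original order.

    Assumption: `scores` are placeholder importance values; how a
    perturbation-aware score would be derived is not specified in the summary.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def compress(
    scores_by_modality: Dict[str, Sequence[float]],
    total_slots: int,
    weights: Dict[str, float],
) -> Dict[str, List[int]]:
    """Pick which cached entries to retain for each modality under one budget."""
    budget = allocate_budget(total_slots, weights)
    return {m: select_entries(s, budget[m]) for m, s in scores_by_modality.items()}


if __name__ == "__main__":
    kept = compress(
        {"video": [0.1, 0.9, 0.4, 0.7, 0.2], "audio": [0.5, 0.3, 0.8]},
        total_slots=4,
        weights={"video": 3.0, "audio": 1.0},
    )
    print(kept)
```

Keeping the retained indices in their original order preserves the temporal order of the stream, which matters when the surviving KV entries are fed back to the model.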

LLMs Audio-Visual AI deep learning Streaming Memory Compression
