
OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs

arXiv CS.AI · June 9, 2026

OmniMem is a memory-efficient streaming framework for audio-visual LLMs. It targets a core limitation of long-video inference: the number of video tokens, and the KV cache that stores them, keeps growing as the video gets longer. OmniMem combines modality-aware memory allocation with perturbation-aware memory selection to preserve informative KV states, improving both compression and long-range understanding.
