jina-embeddings-v5-omni: Geometry-preserving Embeddings via Locked Aligned Towers

arXiv cs.CL · May 12, 2026

This work introduces GELATO (Geometry-preserving Embeddings via Locked Aligned Towers), an approach to multimodal embedding models that extends VLM-style architectures. The resulting jina-embeddings-v5-omni suite efficiently encodes text, image, audio, and video into a single semantic embedding space by freezing the backbone text models and training only the connecting components.
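
The idea of freezing the text backbone and training only the connecting components can be illustrated with a toy sketch. This is not the paper's implementation: the fixed projection `TEXT_TOWER`, the `connector` matrix, the random paired data, and the mean-squared-error alignment objective are all assumptions made for illustration, since the summary does not state the training objective. The sketch only shows that gradient updates touch the connector while the text tower stays unchanged.

```python
import random

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Hypothetical frozen text tower: a fixed 4-d -> 2-d projection, never updated.
TEXT_TOWER = rand_matrix(2, 4)
TEXT_TOWER_SNAPSHOT = [row[:] for row in TEXT_TOWER]

# Hypothetical trainable connector: maps 3-d non-text features into the 2-d text space.
connector = [[0.0] * 3 for _ in range(2)]

# Toy paired data: (text input, features from a frozen non-text encoder).
pairs = [
    ([rng.uniform(-1.0, 1.0) for _ in range(4)],
     [rng.uniform(-1.0, 1.0) for _ in range(3)])
    for _ in range(8)
]

def alignment_loss(conn):
    """Mean squared distance between connector outputs and frozen text embeddings."""
    total = 0.0
    for text, feats in pairs:
        target = matvec(TEXT_TOWER, text)  # frozen tower output, used as the target
        pred = matvec(conn, feats)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / len(pairs)

def train_step(conn, lr=0.05):
    """One gradient-descent step on the connector only (assumed MSE objective)."""
    grad = [[0.0] * 3 for _ in range(2)]
    for text, feats in pairs:
        target = matvec(TEXT_TOWER, text)
        pred = matvec(conn, feats)
        for i in range(2):
            for j in range(3):
                grad[i][j] += 2.0 * (pred[i] - target[i]) * feats[j] / len(pairs)
    for i in range(2):
        for j in range(3):
            conn[i][j] -= lr * grad[i][j]

initial_loss = alignment_loss(connector)
for _ in range(200):
    train_step(connector)
final_loss = alignment_loss(connector)

print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
print("text tower unchanged:", TEXT_TOWER == TEXT_TOWER_SNAPSHOT)
```

Because the text tower is never written to, the geometry of the text embedding space is identical before and after training; only the mapping from the other modality into that space changes.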

embedding models · multimodal AI · deep learning · machine learning · AI research
