RESEARCH30

Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning

arXiv CS.CL·June 5, 2026

This paper introduces a hybrid pre-training objective for text encoders, combining a JEPA-style latent-space prediction loss with a standard Masked Language Modelling (MLM) objective. This new approach aims to encourage representations anchored to deeper semantic structure rather than just surface-form token identity, showing significantly more uniform embeddings.

language models deep learning self-supervised learning machine learning Natural Language Processing

Read original ↗