RESEARCH · 30
Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning
arXiv cs.CL · June 5, 2026
This paper introduces a hybrid pre-training objective for text encoders that combines a JEPA-style latent-space prediction loss with a standard masked language modelling (MLM) objective. The approach aims to anchor representations to deeper semantic structure rather than to surface-form token identity alone, and the paper reports significantly more uniform embeddings.
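As a rough illustration of what a joint objective of this kind could look like, the sketch below adds a cross-entropy MLM term to a latent-space prediction term. The summary does not say how the paper combines the two losses, so the weighted sum, the use of mean squared error for the latent term, the `weight` parameter, and all function names are assumptions made for illustration. In JEPA-style setups the target latents usually come from a separate target encoder, which is assumed here to have been computed elsewhere.

```python
import math


def mlm_loss(logits, targets):
    """Mean cross-entropy over masked positions.

    logits: one list of vocabulary scores per masked position.
    targets: the true token id for each masked position.
    """
    total = 0.0
    for scores, target_id in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normaliser.
        peak = max(scores)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_z - scores[target_id]
    return total / len(targets)


def latent_loss(predicted, target_latents):
    """Mean squared error between predicted and target latent vectors.

    MSE is an assumption; the paper may use a different distance.
    """
    total = 0.0
    count = 0
    for pred_vec, tgt_vec in zip(predicted, target_latents):
        for p, t in zip(pred_vec, tgt_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


def joint_loss(logits, targets, predicted, target_latents, weight=1.0):
    """Hypothetical combination: MLM term plus a weighted latent term."""
    return mlm_loss(logits, targets) + weight * latent_loss(predicted, target_latents)


if __name__ == "__main__":
    # One masked position, two-token vocabulary, two-dimensional latents.
    loss = joint_loss(
        logits=[[0.0, 0.0]],
        targets=[0],
        predicted=[[1.0, 2.0]],
        target_latents=[[1.0, 0.0]],
        weight=0.5,
    )
    print(loss)
```

The `weight` argument controls how strongly the latent term pulls against token-level reconstruction; setting it to zero recovers plain MLM in this sketch.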