RESEARCH
CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations
arXiv CS.CL · May 27, 2026
This work introduces CroCo, a method for cross-lingual contrastive preference tuning on an LLM's own generated responses. It transfers effectively across 14 languages without language-specific preference annotations. A reward model trained on English yields useful rankings across most languages, and tuning on those rankings improves existing models and prevents catastrophic forgetting, provided the data is on-policy.