
CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

arXiv cs.CL · May 27, 2026

This work introduces CroCo, a method for cross-lingual contrastive preference tuning on an LLM's own self-generated responses. It demonstrates effective transfer across 14 languages without language-specific preference annotations. An English-trained reward model yields useful rankings in most of these languages; tuning on those rankings improves existing models and avoids catastrophic forgetting, provided the data is on-policy.
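The summary does not say how ranked self-generations are turned into training pairs, so the following is only a minimal sketch of one common approach: score several on-policy samples for the same prompt and keep the best and worst as a chosen/rejected pair. The function name `build_preference_pair`, the `score_fn` callable (standing in for the English-trained reward model), and the `min_margin` filter are all hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Optional


def build_preference_pair(
    prompt: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    min_margin: float = 0.0,
) -> Optional[Dict[str, str]]:
    """Form one chosen/rejected pair from on-policy samples for a prompt.

    `score_fn(prompt, response)` is an assumed stand-in for a reward model.
    Returns None when there are fewer than two candidates or when the
    best and worst scores differ by no more than `min_margin`.
    """
    if len(candidates) < 2:
        return None

    # Score each candidate once, then pick the extremes.
    scored = [(score_fn(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    worst_score, worst = min(scored, key=lambda pair: pair[0])

    # Skip prompts where the reward model cannot separate the samples.
    if best_score - worst_score <= min_margin:
        return None

    return {"prompt": prompt, "chosen": best, "rejected": worst}
```

In this sketch the prompt and candidates can be in any language while the scorer is trained only on English, which is the transfer setting the summary describes. The resulting pairs could then feed a contrastive preference objective of the reader's choice.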
