Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning
This work introduces PolyFact, a multilingual factual question-answering (QA) dataset, to address cross-lingual factual inconsistency in large language models (LLMs). It finds that reinforcement learning with Group Relative Policy Optimization (GRPO) consistently improves cross-lingual factual recall and generalization compared with supervised fine-tuning.
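
To make the GRPO step concrete, the following is a minimal sketch of its group-relative advantage: several answers are sampled for the same prompt, and each reward is normalized by the mean and standard deviation of its group. The `consistency_reward` function is a hypothetical illustration, not the reward used in this work; it assumes the reward is the fraction of languages in which the model's answer matches the gold answer. The function names, the `eps` parameter, and the choice of population standard deviation are likewise assumptions.

```python
from statistics import mean, pstdev


def consistency_reward(answers_by_language, gold):
    """Hypothetical reward: fraction of languages whose answer matches gold.

    Matching is a case-insensitive exact match after trimming whitespace,
    which is an assumption made for this sketch only.
    """
    if not answers_by_language:
        raise ValueError("need at least one language")
    target = gold.strip().lower()
    hits = sum(
        1 for answer in answers_by_language.values()
        if answer.strip().lower() == target
    )
    return hits / len(answers_by_language)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (GRPO-style advantage).

    Each advantage is (reward - group mean) / (group std + eps).
    Population standard deviation is used here; that choice is an assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, each scored across three languages.
group = [
    {"en": "Paris", "fr": "Paris", "de": "Paris"},
    {"en": "Paris", "fr": "Lyon", "de": "Paris"},
    {"en": "Lyon", "fr": "Lyon", "de": "Berlin"},
    {"en": "Paris", "fr": "paris ", "de": "Berlin"},
]
rewards = [consistency_reward(sample, "Paris") for sample in group]
advantages = group_relative_advantages(rewards)
```

Completions that answer correctly in more languages than the group average receive a positive advantage, and the rest receive a negative one, so the policy update favors answers that are consistent across languages.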