
Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time

arXiv CS.CL · May 18, 2026

This paper introduces OP-Mix, an algorithm for efficient data mixing across the entire lifecycle of language model training. It addresses the challenge of combining diverse data sources during pretraining, continual learning, and adaptation, and proposes a unified solution based on online decision-making.
