pretraining

5 items

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 27d ago

Efficient pretraining with token superposition by Nous Research

Nous Research presents token superposition, a technique for making pretraining more efficient.

AI models · pretraining · machine learning

RESEARCH · Hugging Face Blog · 5d ago

Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining

Covers the generation of synthetic question-and-answer pairs for pretraining Nemotron models, with the aim of improving model performance through synthetic training data.

synthetic data · AI models · pretraining · Q&A generation

RESEARCH · Hugging Face Blog · 5/8/2026

EMO: Pretraining mixture of experts for emergent modularity

EMO proposes a pretraining approach for Mixture of Experts (MoE) models, aiming to achieve emergent modularity. This method focuses on developing specialized components within the model during the pretraining phase.

Emergent Modularity · AI models · pretraining · machine learning

RESEARCH · arXiv CS.CL · 4/21/2026

Data Mixing for Large Language Models Pretraining: A Survey and Outlook

This paper provides a comprehensive survey on data mixing for Large Language Model (LLM) pretraining, a crucial factor for training efficiency and downstream generalization. It formalizes data mixture optimization as a bilevel problem and introduces a fine-grained taxonomy for existing methods.

data optimization · pretraining · machine learning · large language models
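
For readers unfamiliar with the term, a bilevel formulation of data mixing can be sketched roughly as follows. This is a generic form, not necessarily the notation the survey uses; the symbols are assumptions made for illustration: K data domains, mixture weights α on the probability simplex, per-domain training losses L_i, and a held-out objective L_val.

```latex
\begin{aligned}
\min_{\alpha \in \Delta^{K-1}} \quad
  & \mathcal{L}_{\mathrm{val}}\bigl(\theta^{*}(\alpha)\bigr) \\
\text{s.t.} \quad
  & \theta^{*}(\alpha) \in \arg\min_{\theta} \sum_{i=1}^{K} \alpha_i \, \mathcal{L}_i(\theta)
\end{aligned}
```

The inner problem trains the model on data mixed according to α; the outer problem picks the α whose trained model does best on the held-out objective.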

RESEARCH · arXiv CS.LG · 8d ago

Unicorn: Scaling High-Dimensional Time Series Forecasting via Universal Correlation Modeling

Unicorn is a new framework for scalable, high-dimensional time series forecasting, bridging the gap between channel-independent and channel-dependent models. It leverages a latent prototype codebook to learn universal correlation patterns, significantly outperforming state-of-the-art architectures, especially in few-shot transfer scenarios.

forecasting · pretraining · deep learning · machine learning