
POLARIS: Guiding Small Models to Write Long Stories

arXiv CS.CL · June 4, 2026

POLARIS is a new GRPO recipe for training small models that combines LLM-judge rewards with human-reference injection. It significantly improves their ability to write long, high-quality stories, making a 9B model competitive with much larger frontier models.
