RESEARCH
POLARIS: Guiding Small Models to Write Long Stories
arXiv CS.CL · June 4, 2026
POLARIS is a new GRPO recipe that uses an LLM judge for rewards and human-reference injection to train small models. It significantly improves their ability to write long, high-quality stories, making a 9B model competitive with much larger frontier models.
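GRPO scores each sampled completion relative to the other completions drawn for the same prompt. The following is a minimal sketch of that group-relative step only, assuming the LLM judge returns one scalar score per story. The function name, the example scores, and the `eps` constant are illustrative assumptions, not details from the POLARIS paper, and human-reference injection is not modeled here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar per completion sampled for a single prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical judge scores for four stories sampled from one prompt.
judge_scores = [6.0, 7.5, 5.0, 7.5]
advantages = group_relative_advantages(judge_scores)
```

Stories scored above the group mean receive a positive advantage and are reinforced; those below the mean receive a negative one. If every story in a group gets the same score, all advantages are zero and that group contributes no learning signal.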