
POLARIS: Guiding Small Models to Write Long Stories

arXiv cs.CL · June 4, 2026

POLARIS is a new GRPO (Group Relative Policy Optimization) recipe for training small models. It combines two ingredients: rewards scored by an LLM judge, and human-reference injection. The recipe significantly improves small models' ability to write long, high-quality stories, making a 9B model competitive with much larger frontier models.
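For readers unfamiliar with GRPO, the sketch below shows its core step: rewards for several samples drawn from the same prompt are standardized within the group, so each story is scored relative to its siblings. This is a minimal illustration of generic GRPO, not the POLARIS implementation. The function name, the judge scores, and the 1-10 scale are made up for the example, and human-reference injection is not modeled because the summary does not say how it works.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of samples for the same prompt.

    Uses the population standard deviation; real implementations differ on
    this detail, and eps guards against a zero-variance group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical LLM-judge scores for four stories written from one prompt.
judge_scores = [6.0, 8.0, 4.0, 6.0]
advantages = group_relative_advantages(judge_scores)

for score, adv in zip(judge_scores, advantages):
    print(f"judge score {score:.1f} -> advantage {adv:+.3f}")
```

Stories scored above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged. A group in which every story gets the same score yields all-zero advantages and therefore no learning signal.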

story generation, AI training, machine learning, creative writing, LLM
