Research · arXiv cs.CL · 5d ago
POLARIS: Guiding Small Models to Write Long Stories
POLARIS is a new GRPO (Group Relative Policy Optimization) training recipe for small models. It uses an LLM judge to supply rewards and adds human-reference injection. The recipe substantially improves the models' ability to write long, high-quality stories, making a 9B-parameter model competitive with much larger frontier models.
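For readers new to GRPO, here is a minimal sketch of its core step: each sampled completion's reward is normalized against the other completions drawn for the same prompt. This is the standard GRPO advantage, not POLARIS's confirmed implementation. The judge scores are hypothetical, the function name `group_relative_advantages` is illustrative, and the human-reference injection step is not modeled. The sketch uses the population standard deviation; some implementations use the sample standard deviation instead.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: (r - group mean) / (group std + eps).

    `rewards` holds one score per completion sampled for the same prompt.
    Uses the population std; eps avoids division by zero when all
    rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical LLM-judge scores (0-10) for four stories sampled from one prompt.
judge_scores = [6.0, 8.0, 4.0, 6.0]
advantages = group_relative_advantages(judge_scores)
print([round(a, 3) for a in advantages])
```

Stories the judge rates above the group average get positive advantages and are reinforced, while those below it are discouraged. This is why GRPO needs no separate learned value model.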