POLARIS: Guiding Small Models to Write Long Stories
POLARIS improves long-form creative writing in small models using LLM-as-a-judge rewards and human-reference injection.
Small models often struggle to stay coherent as creative writing grows longer. POLARIS addresses this in two ways: it uses a frontier LLM as a judge whose structured quality feedback serves as the reward, and it injects human-written stories as reference anchors during Group Relative Policy Optimization (GRPO) training. Together, these help smaller models maintain quality over longer outputs.
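The summary above does not say exactly how the human references enter GRPO. The following is a minimal sketch under one assumption: the human story is scored by the same judge and added to each group's reward statistics, so that model samples are normalized against a human anchor. Only the model samples receive advantages. `judge`, `advantages_with_reference`, and the toy word-count scorer are hypothetical stand-ins, not the POLARIS implementation; a real setup would call the frontier LLM judge.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: center each reward on the group mean and scale by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]  # eps avoids division by zero


def advantages_with_reference(
    samples: Sequence[str],
    reference: str,
    judge: Callable[[str], float],
) -> list[float]:
    """Hypothetical reference injection: the human story joins the group statistics
    but is not a policy sample, so its own advantage is discarded."""
    rewards = [judge(s) for s in samples]  # judge scores for the model's stories
    ref_reward = judge(reference)          # the same judge scores the human anchor
    advs = group_relative_advantages(rewards + [ref_reward])
    return advs[:-1]                       # keep advantages for model samples only


if __name__ == "__main__":
    # Toy placeholder judge (assumption): longer text scores higher.
    toy_judge = lambda text: float(len(text.split()))
    samples = ["a", "a b", "a b c"]
    print(advantages_with_reference(samples, "a b c d e", toy_judge))
```

One effect of this design: a strong human reference raises the group mean, so a model sample that would look best among its peers earns a smaller advantage unless it approaches the human anchor's score.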