PROWL: Prioritized Regret-Driven Optimization for World Model Learning

arXiv CS.LG · May 20, 2026

PROWL introduces a KL-constrained adversarial curriculum where a policy exposes high-error trajectories of a diffusion-based world model. This method improves model robustness by focusing on rare, interaction-critical transitions, converting failures into a stable, near-distribution training signal without drifting into out-of-distribution exploitation.
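
To make the idea of a KL-constrained focus on high-error trajectories concrete, here is a minimal sketch. It is not PROWL's algorithm: it assumes a finite set of candidate trajectories, each with a scalar world-model error, and a reference sampling distribution. Under those assumptions, maximizing expected error minus beta times KL(p || ref) has the standard closed form p_i ∝ ref_i · exp(err_i / beta). The names `kl_regularized_weights`, `errors`, `ref_probs`, and `beta` are hypothetical and chosen for illustration only.

```python
import math


def kl_regularized_weights(errors, ref_probs, beta):
    """Sampling weights that maximize sum(p * err) - beta * KL(p || ref).

    Illustrative assumption, not the paper's method: the closed-form
    solution is p_i proportional to ref_i * exp(err_i / beta).
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if len(errors) != len(ref_probs):
        raise ValueError("errors and ref_probs must have the same length")
    # Subtract the max error before exponentiating for numerical stability.
    max_err = max(errors)
    unnorm = [q * math.exp((e - max_err) / beta)
              for e, q in zip(errors, ref_probs)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    errors = [0.1, 0.2, 2.0]      # hypothetical per-trajectory model errors
    ref = [1 / 3, 1 / 3, 1 / 3]   # hypothetical reference distribution
    for beta in (0.5, 1.0, 10.0):
        w = kl_regularized_weights(errors, ref, beta)
        print(beta, [round(x, 3) for x in w], round(kl_divergence(w, ref), 4))
```

A small `beta` concentrates sampling on the highest-error trajectory, while a large `beta` keeps the weights close to the reference distribution. That is the trade-off the summary describes between exposing failures and staying near-distribution.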
