PROWL: Prioritized Regret-Driven Optimization for World Model Learning
arXiv CS.LG · May 20, 2026
PROWL introduces a KL-constrained adversarial curriculum in which a policy exposes high-error trajectories of a diffusion-based world model. By focusing training on rare, interaction-critical transitions, the method improves model robustness: failures are converted into a stable, near-distribution training signal instead of drifting into out-of-distribution exploitation.
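To make the idea of KL-constrained prioritization concrete, here is a minimal sketch. It is not taken from the paper. It assumes a simplified setting in which a fixed pool of trajectories is reweighted by world-model error, whereas the summary describes a learned policy. It uses the standard closed-form solution of maximizing expected error minus a KL penalty, q_i proportional to p_i * exp(error_i / beta). The names `kl_constrained_weights`, `kl_divergence`, `errors`, `base_probs`, and `beta` are hypothetical.

```python
import math


def kl_constrained_weights(errors, base_probs, beta):
    """Reweight trajectories toward high model error under a KL penalty.

    Solves max_q sum_i q_i * errors[i] - beta * KL(q || base_probs),
    whose closed-form solution is q_i proportional to
    base_probs[i] * exp(errors[i] / beta).
    This is an illustrative assumption, not PROWL's actual objective.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if len(errors) != len(base_probs):
        raise ValueError("errors and base_probs must have the same length")
    # Subtract the maximum error before exponentiating for numerical stability.
    max_err = max(errors)
    unnorm = [p * math.exp((e - max_err) / beta)
              for e, p in zip(errors, base_probs)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def kl_divergence(q, p):
    """KL(q || p) for discrete distributions given as lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


# Hypothetical per-trajectory world-model errors; index 2 is a rare failure.
errors = [0.1, 0.2, 2.0, 0.15]
base_probs = [0.25, 0.25, 0.25, 0.25]

# Large beta: strong KL penalty, weights stay close to the base distribution.
tight = kl_constrained_weights(errors, base_probs, beta=10.0)
# Small beta: weak KL penalty, weights concentrate on the high-error case.
loose = kl_constrained_weights(errors, base_probs, beta=0.5)

print("tight:", tight, "KL:", kl_divergence(tight, base_probs))
print("loose:", loose, "KL:", kl_divergence(loose, base_probs))
```

In this sketch, `beta` controls the trade-off the summary describes. A larger value keeps sampling near the base distribution, while a smaller value chases high-error trajectories more aggressively and moves further from it.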