PROWL: Prioritized Regret-Driven Optimization for World Model Learning

arXiv cs.LG · May 20, 2026

PROWL introduces a KL-constrained adversarial curriculum in which a policy is trained to expose trajectories where a diffusion-based world model has high prediction error. Concentrating training on these rare, interaction-critical transitions improves the model's robustness. The KL constraint turns the failures into a stable, near-distribution training signal and keeps the policy from drifting into out-of-distribution exploitation.
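As a rough intuition for how a KL constraint balances "seek high error" against "stay near the data", here is a toy sketch. It is not PROWL's algorithm. Assumptions: the adversary is a distribution over a finite set of candidate transitions, each with a known world-model error; the constraint is handled in its penalty (Lagrangian) form, maximizing expected error minus beta times KL to a reference policy, which has the standard closed-form solution pi(a) proportional to pi_ref(a) * exp(error(a) / beta). All function names, `beta`, and the sampling step are hypothetical.

```python
import math
import random


def kl_tilted_adversary(ref_probs, errors, beta):
    """Hypothetical: closed-form maximizer of E_pi[error] - beta * KL(pi || pi_ref).

    The solution tilts the reference distribution toward high-error transitions:
    pi(a) is proportional to pi_ref(a) * exp(error(a) / beta).
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Subtract the max error before exponentiating for numerical stability;
    # the shift cancels out after normalization.
    m = max(errors)
    weights = [p * math.exp((e - m) / beta) for p, e in zip(ref_probs, errors)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def objective(pi, ref_probs, errors, beta):
    """KL-penalized adversarial objective: expected error minus beta * KL to the reference."""
    expected_error = sum(p * e for p, e in zip(pi, errors))
    return expected_error - beta * kl_divergence(pi, ref_probs)


def sample_training_batch(transitions, probs, k, seed=0):
    """Hypothetical curriculum step: draw transitions for world-model training
    in proportion to the adversary's probabilities."""
    rng = random.Random(seed)
    return rng.choices(transitions, weights=probs, k=k)


if __name__ == "__main__":
    ref = [0.25, 0.25, 0.25, 0.25]   # reference policy over 4 candidate transitions
    err = [0.1, 0.2, 0.3, 2.0]       # toy per-transition world-model errors
    for beta in (10.0, 0.5, 0.1):
        pi = kl_tilted_adversary(ref, err, beta)
        print(beta, [round(p, 3) for p in pi], round(objective(pi, ref, err, beta), 4))
    print(sample_training_batch(["t0", "t1", "t2", "t3"], kl_tilted_adversary(ref, err, 0.5), k=8))
```

A large `beta` keeps the tilted distribution close to the reference. A small `beta` collapses it onto the single highest-error transition, the kind of unconstrained exploitation a KL term is meant to limit.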

reinforcement learning · model learning · security · world models · adversarial training
