
Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

arXiv CS.LG · April 22, 2026

Curiosity-Critic introduces an intrinsic reward for world model training that is based on improvement in cumulative prediction error rather than on the current transition alone. A learned critic estimates an asymptotic error baseline, which separates epistemic (reducible) error from aleatoric (irreducible) error and directs exploration toward learnable transitions.
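The role of the baseline can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes, purely for illustration, that the reward is the gap between the current prediction error and the critic's estimated asymptotic error, clipped at zero. The function name `intrinsic_reward`, the transition labels, and all numbers are hypothetical, and the critic's output is supplied by hand rather than learned.

```python
def intrinsic_reward(current_error: float, asymptotic_error: float) -> float:
    """Reducible part of the prediction error, clipped at zero.

    Assumption for illustration only: the summary does not give the
    exact reward, so this uses the simple gap (current - asymptotic).
    """
    return max(current_error - asymptotic_error, 0.0)


# Hypothetical values: the world model currently predicts both
# transitions equally badly, but the (hand-supplied) critic estimate
# says only the first one can still be improved by further training.
transitions = {
    "learnable": {"current_error": 0.75, "asymptotic_error": 0.25},
    "noisy": {"current_error": 0.75, "asymptotic_error": 0.75},
}

rewards = {name: intrinsic_reward(**t) for name, t in transitions.items()}

# Exploration would favor the transition with the larger reducible error.
best = max(rewards, key=rewards.get)

print(rewards)
print(best)
```

Under these assumptions a reward based on raw prediction error would rate both transitions equally, while subtracting the baseline leaves a positive reward only for the transition whose error is still reducible.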
