RESEARCH · arXiv CS.LG · 4/22/2026
Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training
Curiosity-Critic introduces an intrinsic reward for world model training that is based on improvement in cumulative prediction error rather than on the current transition alone. A learned critic estimates an asymptotic error baseline, which separates epistemic from aleatoric error and directs exploration toward learnable transitions.
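The idea of subtracting an asymptotic error baseline can be illustrated with a minimal sketch. This is not the paper's formula: the function name `intrinsic_reward`, the clipping at zero, and the toy numbers are assumptions made for illustration, and the baseline values stand in for what a learned critic might output.

```python
def intrinsic_reward(current_error: float, baseline_error: float) -> float:
    """Hypothetical reward: the part of the prediction error above the baseline.

    current_error:  the world model's prediction error on a transition.
    baseline_error: an estimate of the error that would remain after training
                    (assumed here to come from a learned critic).
    The clip at zero is an assumption of this sketch, not taken from the source.
    """
    return max(current_error - baseline_error, 0.0)


# Toy comparison (made-up numbers): both transitions have the same current error.
# Learnable transition: the baseline is low, so most of the error is reducible.
learnable = intrinsic_reward(current_error=1.0, baseline_error=0.25)

# Noisy transition: the baseline equals the current error, so nothing is learnable.
noisy = intrinsic_reward(current_error=1.0, baseline_error=1.0)

print(learnable, noisy)
```

Under these assumptions, a plain prediction-error reward would score both transitions equally, whereas the baseline-corrected reward favors the learnable one.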