
Regularized Centered Emphatic Temporal Difference Learning

arXiv cs.AI · May 7, 2026

This paper introduces Regularized Emphatic Temporal-Difference Learning (RETD) to address the trade-off among stability, projection geometry, and variance in off-policy temporal-difference learning. The method regularizes the auxiliary centering recursion so that the key matrix of emphatic TD (ETD) remains positive definite, and the paper proves that the method converges.
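The positive-definiteness property the summary mentions can be illustrated on standard ETD(0), not on RETD itself. With linear features Φ and unit interest, the expected key matrix is A = Φᵀ M (I − γP_π) Φ, where M = diag(m) and the emphasis weights solve m = d_μ + γ P_πᵀ m. Off-policy TD(0) instead uses A = Φᵀ D_μ (I − γP_π) Φ. The sketch below computes both matrices on a small two-state MDP and checks each for positive definiteness. It omits the paper's regularized centering recursion, whose exact form is not given here, and the MDP, feature values, and all function names are hypothetical choices made for illustration.

```python
# Sketch: expected key matrices of off-policy TD(0) and standard ETD(0)
# (unit interest, lambda = 0) on a hypothetical two-state MDP.
# This is NOT the paper's RETD method; its centering recursion is omitted.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def diag(v):
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]

def emphasis_weights(d_mu, p_pi, gamma, iters=2000):
    """Solve m = d_mu + gamma * P_pi^T m by fixed-point iteration.
    With lambda = 0 the emphasis weights equal the follow-on weights; the
    map is an l1 contraction because P_pi is row-stochastic and gamma < 1."""
    n = len(d_mu)
    m = [0.0] * n
    for _ in range(iters):
        m = [d_mu[s] + gamma * sum(p_pi[sb][s] * m[sb] for sb in range(n))
             for s in range(n)]
    return m

def key_matrix(phi, weights, p_pi, gamma):
    """A = Phi^T diag(weights) (I - gamma * P_pi) Phi."""
    n = len(weights)
    i_minus = [[(1.0 if s == t else 0.0) - gamma * p_pi[s][t] for t in range(n)]
               for s in range(n)]
    return matmul(transpose(phi), matmul(diag(weights), matmul(i_minus, phi)))

def is_positive_definite(a):
    """x^T A x > 0 for all x != 0 iff the symmetric part (A + A^T) / 2
    has a Cholesky factorization with strictly positive pivots."""
    k = len(a)
    s = [[0.5 * (a[i][j] + a[j][i]) for j in range(k)] for i in range(k)]
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            acc = s[i][j] - sum(low[i][r] * low[j][r] for r in range(j))
            if i == j:
                if acc <= 0.0:
                    return False
                low[i][i] = acc ** 0.5
            else:
                low[i][j] = acc / low[j][j]
    return True

# Hypothetical example: one feature with phi(1) = 1 and phi(2) = 2.
# The target policy always moves to state 2; the behavior policy moves to
# either state with probability 1/2, so its stationary distribution is uniform.
gamma = 0.9
phi = [[1.0], [2.0]]
p_pi = [[0.0, 1.0], [0.0, 1.0]]
d_mu = [0.5, 0.5]

a_td = key_matrix(phi, d_mu, p_pi, gamma)  # off-policy TD(0)
a_etd = key_matrix(phi, emphasis_weights(d_mu, p_pi, gamma), p_pi, gamma)

print("TD(0):  A =", a_td, "PD:", is_positive_definite(a_td))
print("ETD(0): A =", a_etd, "PD:", is_positive_definite(a_etd))
```

With γ = 0.9 the off-policy TD key matrix is the scalar 2.5 − 3γ = −0.2, so it is not positive definite and the expected TD update can diverge. The emphasis weights come out to m = (0.5, 9.5), and they give A = 3.4 for ETD. The large weight on state 2 is the variance cost that makes the stability, projection, and variance trade-off worth studying.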

reinforcement learning · learning · temporal-difference learning · algorithm · off-policy learning
