temporal-difference learning

3 items

RESEARCH · arXiv CS.AI · 12d ago

Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

This paper introduces STHTD-MP, a behavior-induced Mirror-Prox temporal-difference method for faster off-policy prediction. It replaces the covariance metric with the symmetric part of the behavior-policy Bellman matrix, providing a more informative update geometry.

Off-Policy Prediction · reinforcement learning · learning · temporal-difference learning

RESEARCH · arXiv CS.AI · 5/7/2026

Regularized Centered Emphatic Temporal Difference Learning

This paper introduces Regularized Emphatic Temporal-Difference Learning (RETD) to address the stability, projection geometry, and variance trade-off in off-policy temporal-difference learning. It proposes a method that regularizes the auxiliary centering recursion to maintain the positive-definiteness of the ETD key matrix and proves its convergence.

reinforcement learning · learning · temporal-difference learning · algorithm

RESEARCH · arXiv CS.AI · 12d ago

Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction

This paper introduces behavior-aware auxiliary corrections for off-policy temporal-difference prediction, aiming to stabilize TD learning with function approximation. It replaces the TDC auxiliary matrix with the behavior Bellman matrix to develop BA-TDC and BA-TDRC, providing a model for auxiliary-geometry design in neural-network value approximation.

neural networks · reinforcement learning · learning · temporal-difference learning