off-policy learning

2 items

RESEARCH · arXiv CS.AI · 5/7/2026

Regularized Centered Emphatic Temporal Difference Learning

This paper introduces Regularized Emphatic Temporal-Difference Learning (RETD) to address the trade-off among stability, projection geometry, and variance in off-policy temporal-difference learning. The method regularizes the auxiliary centering recursion so that the ETD key matrix remains positive definite, and the paper proves that it converges.

reinforcement learning · learning · temporal-difference learning · algorithm

RESEARCH · arXiv CS.AI · 12d ago

Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction

This paper introduces behavior-aware auxiliary corrections for off-policy temporal-difference prediction, aiming to stabilize TD learning with function approximation. It replaces the TDC auxiliary matrix with the behavior Bellman matrix to develop BA-TDC and BA-TDRC, providing a model for auxiliary-geometry design in neural-network value approximation.

neural networks · reinforcement learning · learning · temporal-difference learning