
Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction

arXiv cs.AI · May 29, 2026

This paper introduces behavior-aware auxiliary corrections for off-policy temporal-difference (TD) prediction, with the aim of stabilizing TD learning under function approximation. It replaces the auxiliary matrix in TDC (TD with gradient correction) with the behavior Bellman matrix, yielding two algorithms, BA-TDC and BA-TDRC, and offers a model for auxiliary-geometry design in neural-network value approximation.
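For context, the baseline that the paper modifies can be sketched as a single linear TDC update. This is a minimal sketch of standard off-policy TDC with an importance-sampling ratio, not of BA-TDC or BA-TDRC, whose exact updates are not given in this summary. The function name `tdc_step`, its argument names, and the default step sizes are assumptions chosen for illustration. In this form the auxiliary weights `w` are shaped by the feature outer product, which is presumably the "auxiliary matrix" the paper swaps for the behavior Bellman matrix.

```python
import numpy as np

def tdc_step(theta, w, phi, phi_next, reward, rho,
             gamma=0.9, alpha=0.1, beta=0.1):
    """One step of standard linear TDC (hypothetical helper, baseline only).

    theta    : primary value weights, v(s) ~= theta @ phi(s)
    w        : auxiliary (correction) weights
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    rho      : importance-sampling ratio pi(a|s) / b(a|s)
    """
    # TD error under the current value estimate.
    delta = reward + gamma * (theta @ phi_next) - theta @ phi
    # Auxiliary prediction of the expected TD error for this state.
    aux = w @ phi
    # Primary update: TD step plus the gradient-correction term.
    theta_new = theta + alpha * rho * (delta * phi - gamma * aux * phi_next)
    # Auxiliary update: regress the (rho-weighted) TD error onto the features.
    w_new = w + beta * (rho * delta - aux) * phi
    return theta_new, w_new, delta

# Usage: one transition with one-hot features and on-policy ratio rho = 1.
theta0 = np.zeros(2)
w0 = np.zeros(2)
theta1, w1, delta = tdc_step(theta0, w0,
                             phi=np.array([1.0, 0.0]),
                             phi_next=np.array([0.0, 1.0]),
                             reward=1.0, rho=1.0)
```

With all weights starting at zero, the correction term vanishes on the first step, so the update reduces to an ordinary TD(0) step. The auxiliary weights only start to influence `theta` once they have accumulated a nonzero estimate.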

neural networks · reinforcement learning · learning · temporal-difference learning · off-policy learning
