Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction
arXiv CS.AI · May 29, 2026
This paper introduces behavior-aware auxiliary corrections for off-policy temporal-difference (TD) prediction, with the aim of stabilizing TD learning under function approximation. By replacing the TDC auxiliary matrix with the behavior Bellman matrix, it derives two algorithms, BA-TDC and BA-TDRC, and offers a model for auxiliary-geometry design in neural-network value approximation.
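For orientation, here is a minimal sketch of the standard linear TDC update that the paper takes as its starting point. It is not the paper's BA-TDC or BA-TDRC: the summary does not give their update rules, so only the baseline is shown. The function name `tdc_step`, the step sizes, and the feature vectors are illustrative assumptions. In this baseline, the auxiliary weights `w` are driven by the feature covariance E[φφᵀ], which is presumably the "auxiliary matrix" the paper replaces.

```python
import numpy as np

def tdc_step(theta, w, phi, phi_next, reward, rho,
             gamma=0.9, alpha=0.05, beta=0.1):
    """One standard linear TDC update (baseline only, NOT the paper's BA-TDC).

    theta: primary value weights; w: auxiliary weights;
    phi, phi_next: feature vectors of the current and next state;
    rho: importance-sampling ratio pi(a|s) / b(a|s).
    All names and default step sizes are assumptions for illustration.
    """
    # TD error under the current linear value estimate.
    delta = reward + gamma * (theta @ phi_next) - (theta @ phi)
    # Auxiliary prediction of the expected TD error for these features.
    aux = phi @ w
    # Primary step: importance-weighted TD update plus gradient correction.
    theta_new = theta + alpha * rho * (delta * phi - gamma * aux * phi_next)
    # Auxiliary step: stochastic solve of E[phi phi^T] w = E[rho * delta * phi].
    w_new = w + beta * (rho * delta - aux) * phi
    return theta_new, w_new

# Usage: a single transition with one-hot (tabular) features.
theta, w = np.zeros(2), np.zeros(2)
theta, w = tdc_step(theta, w,
                    phi=np.array([1.0, 0.0]),
                    phi_next=np.array([0.0, 1.0]),
                    reward=1.0, rho=1.0)
```

Going by the summary, a behavior-aware variant would change only the matrix that governs the auxiliary step, leaving the primary update's structure intact; the exact form should be taken from the paper itself.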