
Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

arXiv CS.AI · May 29, 2026

This paper introduces STHTD-MP, a behavior-induced Mirror-Prox temporal-difference method for faster off-policy prediction. It replaces the covariance metric with the symmetric part of the behavior-policy Bellman matrix, providing a more informative update geometry.
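To make "the symmetric part of the behavior-policy Bellman matrix" concrete, here is a minimal sketch. The summary does not define the matrix, so the code assumes the standard linear TD form A_b = Φᵀ D_b (I − γ P_b) Φ, with the feature covariance C = Φᵀ D_b Φ as the metric being replaced. The 3-state chain, the features, the discount factor, and all helper names are hypothetical, and this is not the STHTD-MP update itself, only the two candidate metrics side by side.

```python
# Sketch under stated assumptions: A_b = Phi^T D_b (I - gamma P_b) Phi is the
# usual linear-TD matrix; the paper's exact definition may differ.

def matmul(X, Y):
    """Plain-list matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def symmetric_part(A):
    """Return (A + A^T) / 2 for a square matrix A."""
    n = len(A)
    return [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]

def quad_form(A, x):
    """Return x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical behavior policy: a deterministic 3-state cycle (non-reversible),
# whose stationary distribution is uniform.
gamma = 0.9
P_b = [[0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0],
       [1.0, 0.0, 0.0]]
d_b = [1.0 / 3.0] * 3
Phi = [[1.0, 0.0],
       [0.0, 1.0],
       [1.0, 1.0]]  # hypothetical 2-dimensional features

n = len(P_b)
D_b = [[d_b[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
I_minus_gP = [[(1.0 if i == j else 0.0) - gamma * P_b[i][j] for j in range(n)]
              for i in range(n)]

Phi_T_D = matmul(transpose(Phi), D_b)
A_b = matmul(matmul(Phi_T_D, I_minus_gP), Phi)  # assumed Bellman matrix
C = matmul(Phi_T_D, Phi)                        # covariance metric
H = symmetric_part(A_b)                         # symmetric-part metric

print("A_b =", A_b)
print("C   =", C)
print("H   =", H)
```

Because xᵀ A x = xᵀ sym(A) x for any square A, the symmetric part H carries the same quadratic form as A_b, including the dependence on γ and P_b, whereas C depends only on the features and the state distribution.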
