
RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking

arXiv CS.AI · May 13, 2026

RankQ is an offline-to-online reinforcement learning objective designed to improve sample efficiency by leveraging pre-collected datasets. It mitigates the effects of inaccurate critics and limited data coverage with a self-supervised multi-term ranking loss that enforces a structured ordering over actions and steers the Q-function toward higher-quality actions.
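
To make the idea of a ranking loss over Q-values concrete, here is a minimal sketch of a generic pairwise hinge ranking penalty. It is not RankQ's actual loss: the summary does not give the terms of the objective, so the function name `pairwise_ranking_loss`, the `margin` parameter, and the assumption that candidate actions arrive already ordered best-first are all illustrative.

```python
def pairwise_ranking_loss(q_values, margin=0.1):
    """Mean hinge penalty over all ordered pairs of Q-values.

    Assumption (not from the paper): q_values holds the critic's
    estimates for candidate actions, listed from most preferred to
    least preferred. A pair (i, j) with i ranked above j is penalized
    whenever q_values[i] does not exceed q_values[j] by at least margin.
    """
    n = len(q_values)
    penalties = []
    for i in range(n):
        for j in range(i + 1, n):
            gap = q_values[i] - q_values[j]
            penalties.append(max(0.0, margin - gap))
    # No pairs (zero or one action) means nothing to rank.
    return sum(penalties) / len(penalties) if penalties else 0.0


# Q-values that already respect the ordering incur no penalty.
well_ordered = pairwise_ranking_loss([3.0, 2.0, 1.0])

# A preferred action scored below a worse one is penalized.
mis_ordered = pairwise_ranking_loss([1.0, 2.0], margin=0.5)
```

Minimizing a penalty of this kind pushes the critic to score preferred actions above less preferred ones, which is the general mechanism the summary describes; how RankQ constructs its orderings and combines its multiple terms is specified in the original paper.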
