
RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking

arXiv CS.AI · May 13, 2026

RankQ is an offline-to-online reinforcement learning objective designed to enhance sample efficiency by leveraging pre-collected datasets. It mitigates issues with inaccurate critics and limited data coverage by using a self-supervised multi-term ranking loss, which enforces structured action ordering and directs the Q-function towards higher-quality actions.

Offline-to-Online Learning, Action Ranking, reinforcement learning, self-supervised learning, Q-learning
