
Q-learning


Research · arXiv CS.AI · 27d ago

RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking

RankQ is an offline-to-online reinforcement learning objective designed to enhance sample efficiency by leveraging pre-collected datasets. It mitigates issues with inaccurate critics and limited data coverage by using a self-supervised multi-term ranking loss, which enforces structured action ordering and directs the Q-function towards higher-quality actions.
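The summary does not give RankQ's actual loss terms or say how its action ordering is derived. As a rough illustration only, here is a generic pairwise margin (hinge) ranking loss over Q-values. It assumes the candidate actions for one state have already been sorted from best to worst by some self-supervised signal. The function name `pairwise_ranking_loss` and the `margin` parameter are hypothetical, and this is not the RankQ objective.

```python
from itertools import combinations
from typing import Sequence


def pairwise_ranking_loss(q_ranked: Sequence[float], margin: float = 0.1) -> float:
    """Hinge ranking loss over Q-values listed from highest- to lowest-ranked action.

    Hypothetical illustration, not RankQ's loss. Every pair (i, j) with i ranked
    above j contributes max(0, margin - (Q_i - Q_j)), so the loss is zero only
    when each action's Q-value beats every lower-ranked action's Q-value by at
    least `margin`. The result is averaged over all ordered pairs.
    """
    if len(q_ranked) < 2:
        return 0.0  # no pairs to compare
    # combinations yields (i, j) with i < j, i.e. action i is ranked above action j
    pairs = list(combinations(range(len(q_ranked)), 2))
    total = sum(max(0.0, margin - (q_ranked[i] - q_ranked[j])) for i, j in pairs)
    return total / len(pairs)


# Hypothetical ordering for one state: dataset action > lightly perturbed > heavily perturbed.
q_values = [2.0, 1.5, 1.4]
print(pairwise_ranking_loss(q_values, margin=0.2))
```

Penalizing every pair, not only adjacent ones, is one simple way to impose a full ordering. A real objective would compute these Q-values with a differentiable critic and add this term to the usual temporal-difference loss, but that is likewise an assumption here.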
