Q-learning

2 items

RESEARCH · arXiv CS.AI · 28d ago

MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs

MemQ combines TD($\lambda$) eligibility traces with memory Q-values, propagating credit backward through a provenance DAG so that updates account for dependencies between memories. The approach is reported to improve LLM agents' ability to accumulate and retrieve experience, with high success rates across several benchmarks.

Memory Systems · LLMs · machine learning · Q-learning
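
To make the MemQ summary above more concrete, here is a minimal sketch of trace-weighted credit propagation over a provenance DAG. It is not the paper's algorithm: the function name `propagate_credit`, the `parents` mapping, the choice to decay the trace by `gamma * lam` per provenance edge, and the rule of keeping the largest trace when several paths reach the same ancestor are all assumptions made for illustration.

```python
from collections import deque


def propagate_credit(q, parents, used, td_error, alpha=0.1, gamma=0.9, lam=0.8):
    """Spread one TD error over used memories and their ancestors.

    q        -- dict: memory id -> Q-value (updated in place)
    parents  -- dict: memory id -> list of memory ids it was derived from
                (hypothetical encoding of the provenance DAG)
    used     -- memory ids retrieved in the current step (trace = 1.0)
    td_error -- scalar TD error observed for the step
    Returns the eligibility trace assigned to every touched memory.
    """
    decay = gamma * lam  # assumed per-edge decay, mirroring TD(lambda)
    trace = {m: 1.0 for m in used}
    queue = deque(used)
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            candidate = trace[node] * decay
            # Keep the strongest trace if an ancestor is reachable by several paths.
            if candidate > trace.get(parent, 0.0):
                trace[parent] = candidate
                queue.append(parent)
    for memory, eligibility in trace.items():
        q[memory] = q.get(memory, 0.0) + alpha * td_error * eligibility
    return trace


if __name__ == "__main__":
    # Memory "c" was derived from "b", which was derived from "a".
    provenance = {"c": ["b"], "b": ["a"]}
    q_values = {}
    propagate_credit(q_values, provenance, ["c"], td_error=1.0,
                     alpha=0.5, gamma=1.0, lam=0.5)
    print(q_values)
```

In this sketch a memory that is further back in the provenance chain receives a geometrically smaller share of the update, which is the usual effect of an eligibility trace.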

RESEARCH · arXiv CS.AI · 27d ago

RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking

RankQ is an offline-to-online reinforcement learning objective designed to improve sample efficiency by leveraging pre-collected datasets. It mitigates inaccurate critics and limited data coverage with a self-supervised multi-term ranking loss that enforces a structured ordering over actions and steers the Q-function toward higher-quality actions.

Offline-to-Online Learning · Action Ranking · reinforcement learning · self-supervised learning
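
As a rough illustration of what one ranking term on Q-values can look like, the sketch below uses a standard pairwise hinge (margin) loss. This is a generic technique swapped in for illustration, not RankQ's actual multi-term objective, which the summary above does not spell out. The function name `pairwise_ranking_loss`, the `margin` parameter, and the idea that each pair holds a preferred and a less-preferred action for the same state are assumptions.

```python
def pairwise_ranking_loss(q_preferred, q_other, margin=1.0):
    """Mean hinge loss that pushes Q(s, a_preferred) above Q(s, a_other).

    q_preferred -- Q-values of the actions that should rank higher
    q_other     -- Q-values of the actions that should rank lower
    margin      -- required gap between the two (assumed hyperparameter)
    """
    if len(q_preferred) != len(q_other) or not q_preferred:
        raise ValueError("expected two non-empty sequences of equal length")
    losses = [
        # Zero loss once the preferred action leads by at least `margin`.
        max(0.0, margin - (q_hi - q_lo))
        for q_hi, q_lo in zip(q_preferred, q_other)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # First pair is already ordered with enough gap; second pair is inverted.
    print(pairwise_ranking_loss([2.0, 0.0], [0.0, 1.0], margin=1.0))
```

A loss of this shape only constrains the relative order of Q-values, so it can still give a training signal where the critic's absolute estimates are unreliable.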