← heapsort

RICE-PO: Turning Retrieval Interactions into Credit Signals for Reasoning Agents

arXiv CS.CL · May 27, 2026

RICE-PO is a novel critic-free policy optimization framework addressing the credit-assignment challenge in interactive language agents. It converts retrieval interactions into localized learning signals, evaluating executable actions and propagating credit to latent reasoning steps.
