RICE-PO: Turning Retrieval Interactions into Credit Signals for Reasoning Agents
arXiv cs.CL · May 27, 2026
RICE-PO is a critic-free policy optimization framework that addresses the credit-assignment problem in interactive language agents. It converts retrieval interactions into localized learning signals, evaluating executable actions and propagating credit to latent reasoning steps.