RESEARCH · arXiv CS.CL · 14d ago
RICE-PO: Turning Retrieval Interactions into Credit Signals for Reasoning Agents
RICE-PO is a critic-free policy optimization framework for the credit-assignment problem in interactive language agents. It converts retrieval interactions into localized learning signals: it evaluates executable actions and propagates the resulting credit to latent reasoning steps.