CoFi-PGMA: Counterfactual Policy Gradients under Filtered Feedback for Multi-Agent LLMs
arXiv cs.LG · April 28, 2026
CoFi-PGMA is a framework for training multi-agent LLM systems under filtered feedback, covering both routing and collaborative scenarios. It introduces a counterfactual per-agent training objective, based on each agent's marginal contribution, that corrects the learning signal.