self-play

5 items

RESEARCH · arXiv CS.LG · 4/8/2026

Territory Paint Wars: Diagnosing and Mitigating Failure Modes in Competitive Multi-Agent PPO

This work presents the Territory Paint Wars environment for investigating PPO failure modes in competitive multi-agent reinforcement learning. It identifies implementation flaws that cause poor performance and, once they are corrected, reveals a new competitive overfitting problem that harms generalization.

failure modes reinforcement learning self-play PPO

RESEARCH · arXiv CS.LG · 22d ago

When Actions Disappear: Adversarial Action Removal in Self-Play Reinforcement Learning

This research investigates adversarial action masking in self-play reinforcement learning, where an attacker selectively removes legal actions from a victim's action set. The study found that learned masking causes significantly more damage than random masking or perturbation baselines, highlighting action availability as a critical robustness surface in self-play RL.

reinforcement learning security self-play adversarial attacks

RESEARCH · arXiv CS.LG · 12d ago

Self-Play Reinforcement Learning under Imperfect Information in Big 2

This study develops a self-play reinforcement learning framework for the imperfect-information card game Big 2. It shows that PPO outperforms value-approximating agents and benefits from entropy regularization and current-policy self-play.

reinforcement learning learning self-play imperfect-information-games

RESEARCH · arXiv CS.AI · 4/21/2026

Heterogeneous Self-Play for Realistic Highway Traffic Simulation

PHASE (Policy for Heterogeneous Agent Self-play on Expressway) is a context-aware self-play framework for realistic highway traffic simulation. It targets three challenges: broad scenario coverage, controllable rare safety-critical situations, and credible multi-agent interactions. It also supports different vehicle profiles.

traffic management self-play Autonomous Vehicles AI

RESEARCH · arXiv CS.CL · 4/7/2026

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

This research addresses the loss of diversity in LLM co-evolution systems, where one model generates problems and another solves them, a loss that undermines autonomous curriculum learning. To counter it, the work introduces 'vocabulary dropout', a random mask that preserves diversity, which improves solver performance on mathematical reasoning.

mathematical reasoning diversity Co-evolution self-play