
Self-Play Reinforcement Learning under Imperfect Information in Big 2

arXiv cs.LG · May 29, 2026

This study develops a self-play reinforcement learning framework for Big 2, an imperfect-information card game. It reports that Proximal Policy Optimization (PPO) outperforms the value-approximating agents it is compared against, and that PPO benefits from entropy regularization and from self-play against the current policy.
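To make the entropy-regularization idea concrete, here is a minimal sketch of the standard clipped PPO objective with an entropy bonus. It is not taken from the paper: the function names, the `clip_eps` and `ent_coef` defaults, and the use of plain Python lists are all illustrative assumptions, and the value-function term is omitted for brevity.

```python
import math
from typing import Sequence


def categorical_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_loss(
    new_logps: Sequence[float],   # log pi_new(a|s) for each sampled action
    old_logps: Sequence[float],   # log pi_old(a|s) recorded at collection time
    advantages: Sequence[float],  # advantage estimate per sample
    entropies: Sequence[float],   # policy entropy at each sampled state
    clip_eps: float = 0.2,        # assumed clipping range, not from the paper
    ent_coef: float = 0.01,       # assumed entropy weight, not from the paper
) -> float:
    """Scalar loss to minimize: -(clipped surrogate + ent_coef * entropy)."""
    n = len(advantages)
    surrogate = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # importance ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        surrogate += min(ratio * adv, clipped * adv)
    mean_surrogate = surrogate / n
    mean_entropy = sum(entropies) / n
    # The entropy bonus rewards keeping the action distribution spread out.
    return -(mean_surrogate + ent_coef * mean_entropy)
```

In a self-play setting such as the one described, the samples would typically come from games in which every seat is played by a copy of the policy; with a larger `ent_coef`, the loss penalizes collapsing onto a single action too early.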

reinforcement learning self-play imperfect-information-games deep-rl
