Self-Play Reinforcement Learning under Imperfect Information in Big 2
arXiv CS.LG · May 29, 2026
This study develops a self-play reinforcement learning framework for the imperfect-information card game Big 2. It demonstrates that PPO outperforms other value-approximating agents and benefits from entropy regularization and current-policy self-play.