RESEARCH · arXiv CS.LG · 12d ago
Self-Play Reinforcement Learning under Imperfect Information in Big 2
This study develops a self-play reinforcement learning framework for the imperfect-information card game Big 2. It demonstrates that PPO outperforms other value-approximating agents and benefits from entropy regularization and current-policy self-play.