A Structural Threshold in Decision Capacity Governs Collapse in Self-Play Reinforcement Learning
This paper shows that a threshold in decision capacity governs collapse in self-play reinforcement learning agents under asymmetric rule perturbations. Eliminating all positive-reach contingent decisions causes rapid convergence to a deterministic exploitation attractor, while preserving even a single such decision prevents this collapse.
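As an illustration only, the threshold can be read as a count over a game tree. The sketch below assumes that a "contingent decision" is a decision node offering at least two legal actions, and that "positive-reach" means the node is reached with nonzero probability under the current policy. These readings, along with the names `Node`, `decision_capacity`, and the three toy games, are hypothetical and are not taken from the paper's formal definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    """A decision node in a toy game tree (hypothetical structure)."""
    name: str
    # Maps each legal action to (probability under the current policy, child).
    # A child of None marks a terminal outcome.
    actions: Dict[str, Tuple[float, Optional["Node"]]] = field(default_factory=dict)


def decision_capacity(root: Node) -> int:
    """Count nodes that have >= 2 legal actions and positive reach probability.

    Assumes the game is a tree, so each node is visited at most once.
    """
    count = 0
    stack = [(root, 1.0)]
    while stack:
        node, reach = stack.pop()
        if reach <= 0.0:
            continue  # never reached under the current policy; nothing below it counts
        if len(node.actions) >= 2:
            count += 1
        for prob, child in node.actions.values():
            if child is not None:
                stack.append((child, reach * prob))
    return count


# Unperturbed toy game: two reachable nodes, each with a real choice.
late_choice = Node("late", {"a": (0.5, None), "b": (0.5, None)})
open_game = Node("root", {"left": (1.0, late_choice), "right": (0.0, None)})

# The only deeper choice sits behind an action the policy never takes.
forced_leaf = Node("forced", {"a": (1.0, None)})
hidden_choice = Node("hidden", {"a": (0.5, None), "b": (0.5, None)})
unreached_game = Node("root", {"left": (1.0, forced_leaf), "right": (0.0, hidden_choice)})

# Perturbed toy game: every reachable node has exactly one legal action.
forced_game = Node("root", {"left": (1.0, Node("late", {"a": (1.0, None)}))})

print(decision_capacity(open_game))
print(decision_capacity(unreached_game))
print(decision_capacity(forced_game))
```

Under these assumptions, the collapse regime described above corresponds to a count of zero, and the protected regime to any count of one or more. The sketch only measures the count; it does not simulate training or the attractor itself.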