RESEARCH

Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games

arXiv cs.AI · May 7, 2026

Agent Island is a new multiagent simulation environment for language models. It serves as a dynamic benchmark designed to resist saturation and contamination. Models such as openai/gpt-5.5 are ranked by their performance in games involving cooperation, conflict, and persuasion.

language models · benchmarking · multiagent games · AI · multiagent systems
