RESEARCH

Evaluating Large Language Models in a Complex Hidden Role Game

arXiv cs.CL · May 25, 2026

This research quantifies the deceptive potential of large language models (LLMs) in the social deduction game Secret Hitler, introducing new metrics and an open-source framework. The study benchmarks LLMs against rule-based algorithms and games played by humans. It reveals a gap between conversational ability and strategic depth, and shows that reasoning-enhancement techniques can worsen performance when models play fascist roles.

Game AI · Benchmarking · deception · large language models · AI safety
