Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
arXiv CS.AI · May 14, 2026
This paper introduces BenchJack, an automated system designed to audit AI agent benchmarks for "reward hacking," where agents maximize scores without performing the intended task. It derives a taxonomy of recurring flaw patterns and uses an iterative generative-adversarial pipeline to improve benchmark robustness.