RESEARCH · arXiv CS.AI · 27d ago
Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
This paper introduces BenchJack, an automated system designed to audit AI agent benchmarks for "reward hacking," where agents maximize scores without performing the intended task. It derives a taxonomy of recurring flaw patterns and uses an iterative generative-adversarial pipeline to improve benchmark robustness.