Attack Selection in Agentic AI Control Evaluations Meaningfully Decreases Safety
arXiv cs.AI · June 8, 2026
This paper investigates "attack selection" in agentic AI settings, where attackers strategically choose when to start and stop attacks. The findings demonstrate that this capability significantly lowers measured empirical safety in AI control evaluations, even with limited audit budgets.