Attack Selection in Agentic AI Control Evaluations Meaningfully Decreases Safety

arXiv cs.AI · June 8, 2026

This paper investigates "attack selection" in agentic AI settings, where attackers strategically choose when to start and when to stop attacks. The findings show that this capability significantly lowers the empirical safety measured in AI control evaluations, even with limited audit budgets.
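A toy Monte Carlo sketch can make the mechanism concrete. Everything in it is a hypothetical simplification, not the paper's setup. A trusted monitor's suspicion scores are Gaussian. The audit budget is spent by auditing actions whose suspicion falls in the top `audit_budget` fraction of honest scores. An audit always catches an attack, and any unaudited attack is assumed to succeed. The sketch models only the "when to start" half of attack selection. The naive attacker attacks tasks at random. The selective attacker sees a hypothetical per-task `cover` signal and attacks only when cover is good. All names (`estimate_safety`, `select_cutoff`, `shift`, `cover`) are illustrative.

```python
import random


def audit_threshold(honest_scores, audit_budget):
    """Suspicion cutoff so that about `audit_budget` of honest actions get audited."""
    ranked = sorted(honest_scores)
    idx = int(len(ranked) * (1 - audit_budget))
    return ranked[min(idx, len(ranked) - 1)]


def episode_is_safe(rng, threshold, n_tasks, attack_rate, select_cutoff=None, shift=1.5):
    """Simulate one deployment; return True if no attack succeeds.

    Toy assumptions: each task has a 'cover' value c ~ N(0, 1) that the attacker
    can observe; attacking that task yields suspicion shift + c + noise. The first
    attack either gets audited (caught, blue team wins) or succeeds (red team wins).
    """
    for _ in range(n_tasks):
        cover = rng.gauss(0.0, 1.0)
        if select_cutoff is None:
            attack = rng.random() < attack_rate  # naive: attack at random
        else:
            attack = cover < select_cutoff  # selective: attack only under good cover
        if not attack:
            continue
        suspicion = shift + cover + rng.gauss(0.0, 0.5)
        return suspicion >= threshold  # audited -> caught; otherwise attack succeeds
    return True  # no attack attempted in this episode


def estimate_safety(n_episodes=2000, n_tasks=100, audit_budget=0.02,
                    attack_rate=0.1, select_cutoff=None, seed=0):
    """Fraction of simulated deployments in which no attack succeeds."""
    rng = random.Random(seed)
    honest = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # honest suspicion scores
    threshold = audit_threshold(honest, audit_budget)
    safe = sum(episode_is_safe(rng, threshold, n_tasks, attack_rate, select_cutoff)
               for _ in range(n_episodes))
    return safe / n_episodes


naive = estimate_safety()
# A cutoff of -1.28 makes the selective attacker also attack roughly 10% of tasks.
selective = estimate_safety(select_cutoff=-1.28)
print(f"safety, naive attacker:     {naive:.3f}")
print(f"safety, selective attacker: {selective:.3f}")
```

Both attackers attack about the same share of tasks. The selective attacker's attacks cluster well below the audit threshold, so its estimated safety should come out far lower. The result is only qualitative and does not reproduce the paper's numbers.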

Tags: security · AI control · agentic AI · adversarial attacks · AI safety

Read original ↗