NEWS26
Researchers gaslit Claude into giving instructions to build explosives
The Verge AIΒ·May 5, 2026
Mindgard researchers exploited psychological quirks in Anthropic's Claude AI, gaslighting it into providing instructions for explosives, erotica, and malicious code. This highlights a potential vulnerability in Claude's carefully crafted helpful personality, despite Anthropic's focus on AI safety.
Read original β