Jailbreaks as social engineering: 5 case studies suggest LLMs inherit human psychological vulnerabilities from training data [D]
This writeup documents 5 case studies demonstrating how LLMs (GPT-4, GPT-4o, Claude 3.5 Sonnet) can be jailbroken using human social engineering tactics, suggesting they inherit psychological vulnerabilities from training data. The central claim is that these alignment failures are not mathematical exploits but rather an outcome of simulating human traits, making LLMs susceptible to social manipulation.