An Empirical Study of Multi-Generation Sampling for Jailbreak Detection in Large Language Models
This empirical study investigates jailbreak detection in large language models and shows that evaluating a single output per prompt systematically underestimates vulnerability. Increasing the number of sampled generations, especially from a single generation to a moderate number of samples, significantly improves the detection of harmful behavior.
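One way to see why a single generation can understate vulnerability is a simple probability sketch. It is an illustration, not the study's own methodology. It assumes each sampled generation for a prompt is harmful independently with the same probability; the function name `detection_probability`, the parameter names, and the 10% per-sample rate are all hypothetical and do not come from the study.

```python
def detection_probability(p_harmful: float, n_samples: int) -> float:
    """Chance that at least one of n_samples generations is harmful.

    Assumes (hypothetically) that generations are independent and that each
    one is harmful with the same probability p_harmful.
    """
    if not 0.0 <= p_harmful <= 1.0:
        raise ValueError("p_harmful must lie in [0, 1]")
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    # Complement rule: 1 minus the chance that every sample is harmless.
    return 1.0 - (1.0 - p_harmful) ** n_samples


if __name__ == "__main__":
    # Illustrative per-sample harmful rate; not a figure from the study.
    p = 0.10
    for n in (1, 5, 10, 25, 50):
        print(f"n={n:>2}: P(at least one harmful) = {detection_probability(p, n):.3f}")
```

Under these assumptions, a prompt that yields harmful output in 10% of samples is flagged only 10% of the time with one generation, but about 65% of the time with ten. The curve flattens as the sample count grows, which is consistent with the largest gains appearing between single and moderate sampling.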

