One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue
arXiv CS.CL · May 8, 2026
This research addresses hidden malicious intent in multi-turn dialogues with large language models (LLMs), where attackers distribute a harmful objective across multiple turns. It proposes an early detection mechanism that identifies the turn at which a response could enable harmful action, and it introduces the Multi-Turn Intent Dataset (MTID) for training and evaluation.