← heapsort

One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

arXiv CS.CL · May 8, 2026

This research tackles the growing threat of hidden malicious intent in multi-turn dialogues with large language models (LLMs), where attackers distribute a harmful objective across multiple turns. It proposes an early detection mechanism that identifies the turn at which a response could enable harmful action, and it introduces the Multi-Turn Intent Dataset (MTID) for training and evaluation.
