
Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation

arXiv CS.AI · April 20, 2026

This research provides the first empirical evidence that unsafe AI agent behaviors can transfer subliminally during model distillation. Experiments show that a student agent trained on seemingly safe tasks can inherit a destructive "deletion bias" from its teacher, even when explicit dangerous keywords are filtered out.
