RESEARCH30
AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue
arXiv cs.CL · May 26, 2026
This paper introduces AERIC, a transfer-oriented method that monitors a language model's hidden states during the same forward pass that generates a response, so that implicitly harmful dialogue can be anticipated rather than caught after the fact. The goal is to flag risk early enough that harmful continuations are never exposed to the user.
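To make "same-pass" monitoring concrete, here is a minimal sketch of the general idea. It is not AERIC's published method. It assumes a generic linear (logistic) probe over per-step hidden-state vectors as a stand-in, and every name in it (`probe_score`, `monitored_generate`, the weights, the threshold, the toy vectors) is hypothetical. In a real model the vectors would come from some chosen layer of the forward pass that also produces each token.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple


def probe_score(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical logistic probe: estimated risk that the continuation is turning harmful."""
    z = sum(h * w for h, w in zip(hidden, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def monitored_generate(
    steps: Iterable[Tuple[str, Sequence[float]]],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> Tuple[List[str], Optional[int]]:
    """Release tokens one at a time, halting before any step the probe flags.

    `steps` yields (token, hidden_state) pairs, one per decoding step. The hidden
    state is assumed to come from the same forward pass that produced the token,
    so monitoring adds no second model call.
    Returns (emitted_tokens, halted_at_step_or_None).
    """
    emitted: List[str] = []
    for i, (token, hidden) in enumerate(steps):
        # Score the hidden state before releasing the token, so a flagged
        # token (and everything after it) is never shown to the user.
        if probe_score(hidden, weights, bias) >= threshold:
            return emitted, i
        emitted.append(token)
    return emitted, None


# Toy usage with made-up 2-d "hidden states" (illustrative values only).
toy_steps = [("Hello", [0.0, 1.0]), ("there", [0.2, 0.8]), ("<risky>", [2.0, 0.0])]
print(monitored_generate(toy_steps, weights=[1.0, -1.0], bias=0.0))
```

With these toy values the first two steps score below 0.5 and are released, while the third scores about 0.88, so generation halts at step 2 before that token is shown. Checking the score ahead of release, instead of classifying the finished text, is what lets a monitor of this kind intervene before harmful content is exposed.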