← heapsort

AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue

arXiv CS.CL · May 26, 2026

This paper introduces AERIC, a transfer-oriented hidden-state approach for anticipatory, same-pass monitoring of implicit harmful dialogue in language models. It aims to detect potential risks early enough to stop harmful continuations before they are exposed.
