Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry
arXiv cs.LG · May 21, 2026
Geometry-Lite is a prompt-level probe for interpreting how safety evidence develops across the layers of a large language model. It applies several readouts to the layer-wise margin geometry to show how the safety boundary forms, and it improves safety detection over single-layer probes.
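To make the idea of a layer-wise margin concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not specify the probe or the readouts, so the mean-difference probe, the toy two-dimensional "hidden states", and the particular readouts (`final`, `mean`, `max`, `first_positive_layer`) are all assumptions chosen for illustration. The sketch fits one linear probe per layer, measures a prompt's signed distance to each layer's boundary, and summarizes the resulting trajectory.

```python
import math

def fit_mean_difference_probe(vectors, labels):
    """Hypothetical probe: w = mean(unsafe) - mean(safe), boundary through the midpoint."""
    dim = len(vectors[0])
    pos = [v for v, y in zip(vectors, labels) if y == 1]
    neg = [v for v, y in zip(vectors, labels) if y == 0]
    mu_pos = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    mu_neg = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    mid = [(p + n) / 2 for p, n in zip(mu_pos, mu_neg)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b

def signed_margin(w, b, x):
    """Signed distance from x to the hyperplane w.x + b = 0 (positive = unsafe side)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin_trajectory(probes, per_layer_states):
    """One margin per layer for a single prompt."""
    return [signed_margin(w, b, x) for (w, b), x in zip(probes, per_layer_states)]

def readouts(traj):
    """Assumed example readouts over the layer-wise margins."""
    return {
        "final": traj[-1],
        "mean": sum(traj) / len(traj),
        "max": max(traj),
        "first_positive_layer": next((i for i, m in enumerate(traj) if m > 0), None),
    }

# Toy data (assumption): class separation grows with depth.
SEPARATIONS = [0.2, 1.0, 2.0]
LABELS = [0, 0, 1, 1]  # 0 = safe prompt, 1 = unsafe prompt

def toy_layer(sep):
    return [[-sep, 0.5], [-sep, -0.5], [sep, 0.5], [sep, -0.5]]

layers = [toy_layer(s) for s in SEPARATIONS]
probes = [fit_mean_difference_probe(states, LABELS) for states in layers]

# Follow prompt index 2 (unsafe) and prompt index 0 (safe) through the layers.
unsafe_traj = margin_trajectory(probes, [states[2] for states in layers])
safe_traj = margin_trajectory(probes, [states[0] for states in layers])
print(readouts(unsafe_traj))
print(readouts(safe_traj))
```

In this toy setup the unsafe prompt's margin grows with depth, which is the kind of boundary formation a single-layer probe cannot show; a real study would substitute actual hidden states and whichever probe and readouts the paper defines.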