
Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry

arXiv CS.LG · May 21, 2026

Geometry-Lite is a prompt-level probe for interpreting how safety evidence develops across the layers of a large language model. It analyzes layer-wise margin geometry through several readouts to show how the decision boundary forms, and it improves safety detection over single-layer probes.
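The summary does not say which probe or readouts the paper uses, so the following is only a rough sketch of what layer-wise margin probing could look like, not the authors' method. It assumes one linear probe per layer fitted by class-mean difference, a label convention of 1 = unsafe and 0 = safe, and four illustrative readouts (mean, final, max, first positive layer). All function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_mean_difference_probe(X, y):
    """Fit a simple linear probe whose normal points from the class-0 mean to the class-1 mean."""
    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == 0].mean(axis=0)
    w = mu_pos - mu_neg
    b = -0.5 * w @ (mu_pos + mu_neg)  # boundary passes through the midpoint of the means
    return w, b


def signed_margin(X, w, b):
    """Signed distance of each row of X to the hyperplane w.x + b = 0 (positive = class 1 side)."""
    return (X @ w + b) / (np.linalg.norm(w) + 1e-12)


def layerwise_margins(hidden, probes):
    """hidden: (n_layers, n_prompts, d); probes: list of (w, b). Returns (n_layers, n_prompts)."""
    return np.stack([signed_margin(hidden[l], *probes[l]) for l in range(hidden.shape[0])])


def readouts(margins):
    """Summarize each prompt's margin trajectory across layers (assumed readouts, for illustration)."""
    n_layers = margins.shape[0]
    positive = margins > 0
    # Index of the first layer with a positive margin, or n_layers if it never turns positive.
    first_positive = np.where(positive.any(axis=0), positive.argmax(axis=0), n_layers)
    return {
        "mean": margins.mean(axis=0),
        "final": margins[-1],
        "max": margins.max(axis=0),
        "first_positive_layer": first_positive,
    }


def make_synthetic_hidden(n_layers=6, n_prompts=200, d=16, seed=0):
    """Toy stand-in for per-layer prompt representations; separation grows with depth by construction."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_prompts)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    hidden = rng.normal(size=(n_layers, n_prompts, d))
    for layer in range(n_layers):
        hidden[layer] += 0.5 * layer * (2 * y[:, None] - 1) * direction
    return hidden, y


if __name__ == "__main__":
    hidden, y = make_synthetic_hidden()
    train, test = slice(0, 100), slice(100, 200)
    probes = [fit_mean_difference_probe(hidden[l, train], y[train]) for l in range(hidden.shape[0])]
    r = readouts(layerwise_margins(hidden[:, test], probes))
    for name in ("final", "mean"):
        acc = ((r[name] > 0).astype(int) == y[test]).mean()
        print(f"{name} readout accuracy on toy data: {acc:.2f}")
```

In this sketch, the "final" readout plays the role of a single-layer probe, while "mean" and "first_positive_layer" use the whole trajectory. On real models the hidden states would come from the network itself, and any accuracy numbers from the toy data say nothing about the paper's results.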
