
Understanding Annotator Safety Policy with Interpretability

arXiv CS.AI · May 9, 2026

The paper examines why annotators disagree about AI safety policies. Disagreement can arise from operational failures, policy ambiguity, or value pluralism. The paper highlights how hard it is to identify which of these root causes is at work, and notes that annotators' self-reported reasoning is unreliable.
