Understanding Annotator Safety Policy with Interpretability
arXiv CS.AI · May 9, 2026
The paper examines the challenge of understanding why annotators disagree about AI safety policies. Such disagreement can arise from operational failures, policy ambiguity, or value pluralism. The paper highlights how difficult it is to discern which of these root causes is at work, and notes that annotators' self-reported reasoning is unreliable.