How Language Models Process Negation

arXiv CS.CL·May 6, 2026

This study investigates how large language models (LLMs) process negation mechanistically. It finds that even open-weight models, which often answer negated inputs incorrectly, contain internal components capable of processing negation correctly. The authors attribute the poor accuracy to late-layer attention that promotes simple shortcuts. They also find that models implement negation in two ways: by attending to the negated phrase and by directly constructing a representation of the negated phrase.

LLMs Mechanistic Interpretability attention mechanisms Natural Language Processing Negation

Read original ↗