
How Much Do Circuits Tell Us? Measuring the Consistency and Specificity of Language Model Circuits

arXiv cs.CL · May 12, 2026

This paper uses edge attribution patching to measure the consistency and specificity of language model circuits across multiple tasks and models. It finds that circuits are heavily reused within a task and that this reuse is necessary for performance, but also that circuits overlap significantly across tasks, indicating that they are not task-specific.

language models · Mechanistic Interpretability · AI interpretability · model circuits · circuit analysis
