The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior
This research investigates the 'grokking' phenomenon in transformers and finds that the long delay before arithmetic models generalize stems from a decoder bottleneck. The encoder acquires the relevant structural knowledge early in training, but the decoder struggles to access it. Causal interventions, such as transplanting encoders, support this hypothesis.
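To make the encoder-transplant intervention concrete, here is a minimal sketch of the idea. It is not the authors' code. It assumes each model's parameters are stored in a flat dictionary whose encoder entries share an `encoder.` name prefix. The function name `transplant_encoder`, the parameter names, and the toy values are all hypothetical.

```python
from copy import deepcopy


def transplant_encoder(donor, recipient, prefix="encoder."):
    """Return a new parameter dict with encoder entries taken from `donor`
    and all other entries taken from `recipient`.

    Assumption: both dicts use the same parameter names, and encoder
    parameters are identified by a shared name prefix.
    """
    donor_keys = {k for k in donor if k.startswith(prefix)}
    recipient_keys = {k for k in recipient if k.startswith(prefix)}
    if donor_keys != recipient_keys:
        raise ValueError("encoder parameter names differ between the two models")

    # Copy so that neither input dict is modified.
    hybrid = deepcopy(recipient)
    for key in donor_keys:
        hybrid[key] = deepcopy(donor[key])
    return hybrid


# Toy stand-ins for two sets of parameters (hypothetical names and values).
trained_model = {"encoder.w": [1.0, 2.0], "decoder.w": [3.0]}
fresh_model = {"encoder.w": [0.0, 0.0], "decoder.w": [0.5]}

hybrid = transplant_encoder(trained_model, fresh_model)
```

Under the decoder-bottleneck hypothesis, one would then train or evaluate the hybrid and compare its time to generalization against a model trained entirely from scratch. The comparison itself is outside the scope of this sketch.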