grokking

RESEARCH · arXiv cs.LG · 4/16/2026

The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior

This paper investigates grokking in transformers and finds that the long delay before arithmetic models generalize stems from a decoder bottleneck: the encoder acquires the relevant structural knowledge early, but the decoder struggles to access it. Causal interventions, such as transplanting encoders, support this hypothesis.

RESEARCH · arXiv cs.LG · 28d ago

Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking

This empirical study tests Tian's (2025) feature repulsion theorem for grokking in two-layer networks, examining both its mechanisms and its spectral signatures. It observes a clear structure-mechanism dissociation: the predicted sign rule holds robustly for similar feature pairs, even though the spectral signature shows a strong activation dependence.
