
Model Analysis

3 items

RESEARCH · arXiv CS.LG · 4/28/2026

The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K–V Asymmetry

This systematic study of singular value spectra during transformer pretraining reports three phenomena: transient compression waves that propagate through the layers, persistent spectral gradients, and a Q/K–V functional asymmetry in which query/key projections drive depth-dependent dynamics while value/output projections compress uniformly.

29
RESEARCH · arXiv CS.CL · 22d ago

Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance

This paper presents a comprehensive analysis of neural activation patterns across six distinct large language model (LLM) architectures, examining their performance on twelve cognitive task categories. The findings reveal fundamental differences in how encoder and decoder architectures process diverse cognitive tasks, with mathematical reasoning consistently producing the highest attention entropy and decoder models exhibiting significantly higher sparsity.

29
RESEARCH · arXiv CS.CL · 6d ago

Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States

This paper finds that linear probes, often used to identify distinct reasoning representations in LLM hidden states, actually detect task format rather than reasoning mode. The high probe accuracy observed on benchmarks with Qwen3-14B vanished once format variables were controlled for, which suggests that reasoning is largely shared and not functionally linked to hidden-state geometry.

27