
Do Value Vectors in Deep Layers Need Context from the Residual Stream?

arXiv cs.CL · June 3, 2026

Researchers report that language model performance can improve significantly when deeper layers learn context-free value vectors, which preserve the original token information instead of deriving values from the residual stream. Because these values do not depend on context, they need not be recomputed or persistently cached, and the context-dependent component of the values appears to add little benefit.
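
As a rough illustration of the idea in the summary, the NumPy sketch below contrasts a conventional value projection from the residual stream with a context-free lookup keyed on token ID. It is not the paper's implementation: the per-layer lookup table `value_table`, the single-head setup, and all shapes are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(q, k, v):
    """Single-head causal attention over a sequence of length T."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Mask out future positions (strictly above the diagonal).
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
vocab, T, d = 50, 6, 8
token_ids = rng.integers(0, vocab, size=T)

# Stand-in for the residual stream entering a deep layer (assumed shape T x d).
resid = rng.normal(size=(T, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

# Hypothetical per-layer table of context-free values, one row per token.
value_table = rng.normal(size=(vocab, d))

# Queries and keys still come from the residual stream in both variants.
q, k = resid @ W_q, resid @ W_k

# Conventional values: a function of the contextual residual stream.
v_contextual = resid @ W_v
# Context-free values: a function of token identity only.
v_context_free = value_table[token_ids]

out_contextual = causal_attention(q, k, v_contextual)
out_context_free = causal_attention(q, k, v_context_free)
```

In this sketch `v_context_free` depends only on `token_ids`, so it can be re-read from the table whenever it is needed instead of being kept in a cache, which is the saving the summary describes under these assumptions.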

neural networks, LLMs, deep learning, Attention Mechanism, Transformers
