RESEARCH27
Do Value Vectors in Deep Layers Need Context from the Residual Stream?
arXiv CS.CLΒ·June 3, 2026
Researchers found that language model performance can significantly improve when deeper layers learn context-free value vectors, preserving original token information. This eliminates the need to recompute or persistently cache these values, as the context-dependent component provides little additional benefit.
Read original β