Sleep Phase Cuts Transformer Costs by Consolidating Memory
A new research paper introduces a "sleep phase" for language models, consolidating context into fixed-size memory layers. This method significantly reduces quadratic inference costs and enhances performance on long-horizon tasks.