
KIV: 1M-token context window on an RTX 4070 (12GB VRAM), no retraining, drop-in HuggingFace cache replacement - works with any model that uses DynamicCache [P]

Reddit r/MachineLearning · April 12, 2026

KIV (K-Indexed V Materialization) is a middleware layer that replaces the standard HuggingFace KV cache with a tiered retrieval system, offloading older cache entries to system RAM. According to the post, this enables 1M-token context windows on an RTX 4070 (12GB VRAM) with only 12MB of VRAM overhead and good performance, with no retraining required.
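
The summary does not describe KIV's internals, so the following is only a minimal sketch of the general idea of a tiered cache with retrieval, not KIV's actual implementation. The class name `TieredKVCache` and the parameters `hot_size` and `top_k` are hypothetical, and plain Python lists stand in for GPU tensors (the hot tier) and system-RAM tensors (the cold tier). It assumes cold entries are selected by dot-product similarity between the query and the stored keys, which is one plausible reading of "K-indexed".

```python
import math
from collections import deque


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class TieredKVCache:
    """Toy two-tier cache: a small hot window plus a cold store for evicted entries."""

    def __init__(self, hot_size, top_k):
        self.hot_size = hot_size  # entries kept in the fast tier (stands in for VRAM)
        self.top_k = top_k        # cold entries fetched per query (assumed policy)
        self.hot = deque()        # most recent (key, value) pairs
        self.cold = []            # older (key, value) pairs (stands in for system RAM)

    def append(self, key, value):
        """Add one token's key/value; spill the oldest hot entry once the window is full."""
        self.hot.append((key, value))
        if len(self.hot) > self.hot_size:
            self.cold.append(self.hot.popleft())

    def retrieve(self, query):
        """Return the top_k best-matching cold entries followed by all hot entries."""
        ranked = sorted(self.cold, key=lambda kv: dot(query, kv[0]), reverse=True)
        return ranked[: self.top_k] + list(self.hot)

    def attend(self, query):
        """Softmax attention restricted to the retrieved entries."""
        entries = self.retrieve(query)
        if not entries:
            raise ValueError("cache is empty")
        scale = math.sqrt(len(query))
        scores = [dot(query, k) / scale for k, _ in entries]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        dim = len(entries[0][1])
        return [
            sum(w * v[i] for w, (_, v) in zip(weights, entries)) / total
            for i in range(dim)
        ]


# Usage: four tokens, a hot window of two, and one cold entry retrieved per query.
cache = TieredKVCache(hot_size=2, top_k=1)
for key, value in [([1.0, 0.0], [10.0]), ([0.0, 1.0], [20.0]),
                   ([1.0, 1.0], [30.0]), ([0.5, 0.5], [40.0])]:
    cache.append(key, value)
output = cache.attend([0.0, 5.0])
```

In this toy version the cost of each attention step depends on `hot_size + top_k` rather than on the full sequence length, which is the property that would let the fast tier stay small while the cold tier grows. A real implementation would also need to keep some index of the keys cheaply searchable, and that part is not modeled here.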

Tags: KIV, LLM optimization, Context window, VRAM, KV cache
