KIV: 1M token context window on an RTX 4070 (12GB VRAM), no retraining, drop-in HuggingFace cache replacement - works with any model that uses DynamicCache [P]
KIV (K-Indexed V Materialization) is a middleware layer that replaces the standard HuggingFace KV cache with a tiered retrieval system that moves older cache entries to system RAM. This enables 1M-token context windows on an RTX 4070 (12GB VRAM) with only 12MB of VRAM overhead and good performance.
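To make the tiering idea concrete, here is a toy sketch in plain Python. It is not KIV's actual code: the class name `TieredKVSketch`, the `hot_window` and `top_k` parameters, and the dot-product scoring are all assumptions made for illustration. The idea it shows is that recent entries stay in a small "hot" tier (standing in for VRAM), older values move to a "cold" tier (standing in for system RAM), and keys act as the index that decides which cold values get pulled back for a given query.

```python
import heapq
from collections import deque


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class TieredKVSketch:
    """Toy two-tier cache (hypothetical, not the KIV implementation).

    Recent (key, value) pairs stay in the hot tier. Older values are
    moved to the cold tier, and their keys are kept as a lookup index.
    """

    def __init__(self, hot_window, top_k):
        self.hot_window = hot_window  # how many recent tokens stay hot
        self.top_k = top_k            # how many cold values to fetch per query
        self.hot = deque()            # (position, key, value), stands in for VRAM
        self.cold_keys = []           # (position, key), the index over old tokens
        self.cold_values = {}         # position -> value, stands in for system RAM
        self.next_pos = 0

    def append(self, key, value):
        """Add one token's key/value and evict the oldest hot entries."""
        self.hot.append((self.next_pos, key, value))
        self.next_pos += 1
        while len(self.hot) > self.hot_window:
            pos, old_key, old_value = self.hot.popleft()
            self.cold_keys.append((pos, old_key))
            self.cold_values[pos] = old_value

    def materialize(self, query):
        """Return (position, value) pairs for the hot window plus the
        top_k cold entries whose keys score highest against the query."""
        best = heapq.nlargest(
            self.top_k, self.cold_keys, key=lambda item: dot(query, item[1])
        )
        fetched = [(pos, self.cold_values[pos]) for pos, _ in best]
        recent = [(pos, value) for pos, _, value in self.hot]
        return sorted(fetched + recent, key=lambda item: item[0])
```

In a real setup the hot tier would hold GPU tensors and the cold tier pinned CPU memory, and the scoring would run as a batched matrix product per attention head. The sketch only shows the data flow: score against keys, then fetch just the selected values.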
