Open-source single-GPU reproductions of Cartridges and STILL for neural KV-cache compaction [P]
The author has implemented and open-sourced single-GPU reproductions of two recent ideas for neural KV-cache compaction and long-context inference: Cartridges and STILL. The goal is to make these research ideas easy to inspect and run. The release includes benchmark code that compares both against existing methods.