Serving DeepSeek-V4: why million-token context is an inference systems problem

Together AI Blog · May 8, 2026

DeepSeek-V4 turns million-token context into an inference systems problem. Together AI is exploring how to serve the model on NVIDIA HGX B200, focusing on compressed KV layouts and prefix caching for long-context workloads.
