ARTICLE · Together AI Blog · 5/8/2026
Serving DeepSeek-V4: why million-token context is an inference systems problem
DeepSeek-V4 turns million-token context into an inference systems problem. Together AI is exploring how to serve it on NVIDIA HGX B200, focusing on techniques such as compressed KV layouts and prefix caching for long-context workloads.
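To give a rough sense of why the KV cache dominates at this scale, here is a minimal back-of-the-envelope sketch. It uses the standard formula for an uncompressed key/value cache in ordinary attention. The model shape below is hypothetical and chosen only for illustration; it is not DeepSeek-V4's actual configuration, and the function name `kv_cache_bytes` is ours.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence
    with standard (uncompressed) attention."""
    # Factor of 2: one tensor for keys and one for values, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical shape (assumption, NOT DeepSeek-V4's real configuration):
# 60 layers, 8 KV heads, head dimension 128, 2-byte (16-bit) elements.
full = kv_cache_bytes(seq_len=1_000_000, n_layers=60, n_kv_heads=8, head_dim=128)
print(f"{full / 2**30:.1f} GiB of KV cache for one 1M-token sequence")
```

Under these assumed numbers, a single million-token sequence needs a few hundred GiB of cache on its own. That is the pressure compressed KV layouts aim to relieve, while prefix caching avoids recomputing the cached entries when requests share a common prefix.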