> (2) For all models, the input cache hit price has been reduced to 1/10 of the launch price. This price adjustment takes effect from 2026/4/26 12:15 UTC.
There is no end date. Currently, it's 2% of the input price for DeepSeek V4 Flash and 0.8% with this new V4 Pro pricing, which is extremely low compared to competitors to the point that it affects the unit economics a bit and I thought it would be temporary.
In the case of V4 Pro, the effective cost is ~$0.04/M input tokens given the caching (based on OpenRouter's metrics: https://openrouter.ai/deepseek/deepseek-v4-pro), which is significantly cheaper than even small models from competitors.
DeepSeek V4's KV cache is very efficient due to its heavily compressed and sparse attention architecture.
DeepSeek V3.2 which uses DSA only (sparse attention, but without compression from HCA and CSA) is a smaller model but uses 10x more memory at 1M context window compared to DS V4 Pro.
Also, I have to say, DeepSeek's API has a very good cache hit rate. With the same workload, I see ~80% KV cache hit rate with the DS API vs ~50% with the major western inference providers for open weight models.
And their disk-based caching is amazing. I got a long 700k context session spanning more than a week, with pauses in between that was longer than a day, and some rewinds mixed in as well.
Anthropic's caching requires you to pay a $0.75/Mtok for Sonnet and $1.25/MTok for Opus as a surcharge on top of the original input token cost. It's not even automatic.
If you are reading ~8 times (8 total back and forth tool calls) that means that cache reads in some sense cost ~$0.4 / M toks (Amortizing the write surcharge over all reads).
It's really quite ridiculously expensive considering what you are paying for is some residence on a VRAM that sometimes gets offloaded to NVMe.
> (2) For all models, the input cache hit price has been reduced to 1/10 of the launch price. This price adjustment takes effect from 2026/4/26 12:15 UTC.
There is no end date. Currently, it's 2% of the input price for DeepSeek V4 Flash and 0.8% with this new V4 Pro pricing, which is extremely low compared to competitors to the point that it affects the unit economics a bit and I thought it would be temporary.
In the case of V4 Pro, the effective cost is ~$0.04/M input tokens given the caching (based on OpenRouter's metrics: https://openrouter.ai/deepseek/deepseek-v4-pro), which is significantly cheaper than even small models from competitors.