
I'm more curious about the caching:

> (2) For all models, the input cache hit price has been reduced to 1/10 of the launch price. This price adjustment takes effect from 2026/4/26 12:15 UTC.

There is no end date. Currently, the cache hit price is 2% of the input price for DeepSeek V4 Flash and 0.8% with this new V4 Pro pricing. That is extremely low compared to competitors, to the point that it affects the unit economics a bit, and I thought it would be temporary.

In the case of V4 Pro, the effective cost is ~$0.04/M input tokens given the caching (based on OpenRouter's metrics: https://openrouter.ai/deepseek/deepseek-v4-pro), which is significantly cheaper than even small models from competitors.



DeepSeek V4's KV cache is very efficient due to its heavily compressed and sparse attention architecture.

DeepSeek V3.2, which uses DSA only (sparse attention, but without the compression from HCA and CSA), is a smaller model but uses 10x more memory at a 1M context window than DS V4 Pro.

Also, I have to say, DeepSeek's API has a very good cache hit rate. With the same workload, I see a ~80% KV cache hit rate with the DS API vs ~50% with the major western inference providers for open-weight models.


Flash on its own is not a very competitive model; its pricing is within the range of everything else on the market.

Probably the most direct competitors of the Flash model:

GPT 5.4 mini:

Cache Read $0.075 /M tokens

Gemini 3 Flash:

Cache Read $0.05 /M tokens

i.e., nothing very magical or groundbreaking.


Cache read for dp4-flash is $0.0028 /M tokens, which is more than 10 times cheaper (and also much cheaper for cache miss and output tokens).

Have not actually compared it to other models, but I would not consider it in the same price range.


This price is only available if you are OK with sending your data to Beijing Volcano Engine Technology Co.; for the rest of the OpenRouter vendors it is not the same.


Not sure why you are downvoted; this is essentially correct (assuming Volcano Engine refers to DeepSeek as the provider).


A big point of DeepSeek V4 is the significantly reduced KV cache size.


Sonnet: Cache Read $0.30 /M tokens

Gemini 3.5 Flash: Cache Read $0.15 /M tokens


For Sonnet, that's 10% of input cost (and requires paying for the cache)

For Gemini 3.5 Flash, it's also 10% of input cost.

Which is why 2%/0.8% change the economics in a meaningful way, given the input/cache-heavy way agents operate.
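
For anyone who wants to plug in their own numbers, here is a rough sketch of how the cache-read ratio feeds into the blended input price. Only the 10%, 2% and 0.8% ratios come from this thread; the $1.00/M base price and the 90% hit rate are made-up placeholders, and `blended_input_price` is just an illustrative helper name.

```python
def blended_input_price(input_price: float, cache_ratio: float, hit_rate: float) -> float:
    """Average price per 1M input tokens when `hit_rate` of them are cache reads.

    cache_ratio is the cache-read price as a fraction of the cache-miss price.
    """
    cached = hit_rate * input_price * cache_ratio
    uncached = (1.0 - hit_rate) * input_price
    return cached + uncached


BASE_PRICE = 1.00  # hypothetical $/M input tokens, NOT a quoted price
HIT_RATE = 0.90    # hypothetical cache hit rate for an agent-style workload

for label, ratio in [("10% (Sonnet, Gemini 3.5 Flash)", 0.10),
                     ("2% (V4 Flash)", 0.02),
                     ("0.8% (V4 Pro)", 0.008)]:
    price = blended_input_price(BASE_PRICE, ratio, HIT_RATE)
    print(f"{label}: ${price:.4f} per 1M input tokens")
```

With these placeholder inputs the cache misses still dominate the bill, so the gap between the ratios widens as the hit rate gets closer to 100%.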


And their disk-based caching is amazing. I had a long 700k context session spanning more than a week, with pauses in between that were longer than a day, and some rewinds mixed in as well.

Stats from pi:

↑400k ↓438k R432M 71.9%/1.0M

Half a billion tokens, $2.12


Anthropic's caching requires you to pay $0.75/MTok for Sonnet and $1.25/MTok for Opus as a surcharge on top of the original input token cost. It's not even automatic.

If you are reading ~8 times (8 total back-and-forth tool calls), that means that cache reads in some sense cost ~$0.4/MTok (amortizing the write surcharge over all reads).

It's really quite ridiculously expensive considering what you are paying for is some residence in VRAM that sometimes gets offloaded to NVMe.
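
The amortization above can be checked with a few lines. This is a minimal sketch using the Sonnet numbers quoted in this thread ($0.30 cache read, $0.75 write surcharge, 8 reads). It assumes a single cache write followed by all the reads, and ignores cache expiry and rewrites; `amortized_cache_read` is an illustrative helper name.

```python
def amortized_cache_read(read_price: float, write_surcharge: float, reads: int) -> float:
    """Per-read price per 1M tokens, spreading one write surcharge over `reads` reads."""
    if reads <= 0:
        raise ValueError("reads must be positive")
    return read_price + write_surcharge / reads


# Sonnet figures as quoted in the thread, in $ per 1M tokens.
sonnet = amortized_cache_read(read_price=0.30, write_surcharge=0.75, reads=8)
print(f"Sonnet, 8 reads: ${sonnet:.3f} per 1M cached tokens")
```

That comes to about $0.39/MTok, which matches the ~$0.4 figure; with fewer reads per write the surcharge weighs more heavily.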


GPT 5.4: Cache Read (≤272K) $0.25 /M tokens

And it's multimodal, and available at whatever rate limits you might imagine.



