I don't think we need to compare models to Opus anymore. Opus users don't care a...

versteegen · 2026-04-24T05:17:30 1777007850

Which model's best depends on how you use it. There's a huge difference in behaviour between Claude, GPT, and other models, which makes some of them poor substitutes for others in certain use cases. I think the GPT models are a bad substitute for Claude ones for tasks such as pair-programming (where you want to see the CoT and have immediate responses) and writing code that you actually want to read and edit yourself, as opposed to just letting GPT run in the background to produce working code that you won't inspect. Yes, GPT-5.4 is cheap and brilliant but very black-box and often very slow IME. GPT-5.4 still seems to behave the same as 5.1, which includes problems like: doesn't show useful thoughts, can think for half an hour, says "Preparing the patch now" then thinks for another 20 min, gives no impression of what it's doing, reads microscopic parts of source files and misses context, will do anything to pass the tests including patching libraries...

ind-igo · 2026-04-24T05:12:42 1777007562

Agree with your assessment. I think after models reached around Opus 4.5 level, it's been almost indistinguishable for most tasks. Intelligence has been commoditized; what's important now is the workflows, prompting, and context management. And that is unique to each model.

vidarh · 2026-04-24T06:56:04 1777013764

Same for me. There are tasks where I want the smartest model. But for a whole lot of tasks I now default to Sonnet, or go with cheaper models like GLM, Kimi, Qwen. DeepSeek hasn't been in the mix for a while because their previous model had started lagging, but I will definitely test this one again.

The tricky part is that the "number of tokens to good result" does absolutely vary, and you need a decent harness to make it work without too much manual intervention, so figuring out which model is most cost-effective for which tasks is becoming increasingly hard, but several are cost-effective enough.
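
One rough way to put a number on "tokens to good result", as a sketch only: treat the expected cost of a good result as the cost of one attempt divided by the fraction of attempts that succeed. That assumes you retry independently until something is acceptable. The helper names are made up for illustration, and the prices, token counts, and success rates are things you would have to measure with your own harness.

```python
def cost_per_attempt(input_tokens, output_tokens, usd_per_mtok_in, usd_per_mtok_out):
    """Dollar cost of a single attempt, given per-million-token prices."""
    return (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1_000_000


def cost_per_good_result(input_tokens, output_tokens,
                         usd_per_mtok_in, usd_per_mtok_out, success_rate):
    """Expected spend per acceptable result.

    Assumption: attempts are independent and repeated until one succeeds,
    so the expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    attempt = cost_per_attempt(input_tokens, output_tokens,
                               usd_per_mtok_in, usd_per_mtok_out)
    return attempt / success_rate
```

Comparing this figure per task type, rather than the sticker price per token, is what shows when a cheap model that needs several tries ends up costing more than a pricier one that gets it right the first time.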

wuschel · 2026-04-24T06:29:52 1777012192

This is not true in some cases, e.g. there are stark differences in the correctness of answers in certain types of casework.

spaceman_2020 · 2026-04-24T06:35:09 1777012509

I found Opus 4.7 to be actually worse than Opus 4.6 for my use case

Substantially worse at following instructions and overoptimized for maximizing token usage

sandos · 2026-04-24T06:04:32 1777010672

Is Opus nerfed somehow in Copilot? I've tried it numerous times, and it has never really wowed me. They seem to have awfully small context windows, but still. It's mostly their reasoning which has been off.

Codex is just so much better, or the general GPT models.

specproc · 2026-04-24T07:00:48 1777014048

Opus just got killed in Copilot. I always found it great, FWIW.

https://github.blog/news-insights/company-news/changes-to-gi...

kmarc · 2026-04-24T05:08:49 1777007329

This resonates with me a lot.

I do some stuff with Gemini Flash and Aider, but mostly because I want to avoid locking myself into a walled garden of models, UIs and company.

post-it · 2026-04-24T05:11:27 1777007487

What do you run these on? I've gotten comfortable with Claude but if folks are getting Opus performance for cheaper I'll switch.

slopinthebag · 2026-04-24T05:16:33 1777007793

Try Charm Crush first, it's a native binary. If it's unbearable, try opencode, just with the knowledge your system will probably be pwned soon since it's JS + NPM + vibe coding + some of the most insufferable devs in the industry behind that product.

If you're feeling frisky, Zed has a decent agent harness and a very good editor.

post-it · 2026-04-24T12:40:22 1777034422

I've downloaded Zed but haven't used it much, maybe this is my sign. Thanks!

oceanplexian · 2026-04-24T06:10:54 1777011054

You can just use Claude Code with a few env vars; most of these providers offer an Anthropic-compatible API.
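
A minimal sketch of what that can look like, written as a small Python launcher. It assumes the `claude` CLI is on your PATH and that your version reads `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL`, so check the Claude Code docs before relying on it. The helper names, URL, key, and model name are placeholders, not any real provider's values.

```python
import os
import subprocess


def claude_code_env(base_url, api_key, model=None, base_env=None):
    """Return a copy of the environment pointing Claude Code at another provider.

    Assumption: the provider exposes an Anthropic-compatible endpoint at base_url.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_AUTH_TOKEN"] = api_key
    if model:
        env["ANTHROPIC_MODEL"] = model
    return env


def launch_claude_code(env):
    """Start the `claude` CLI with the given environment and return its exit code."""
    return subprocess.run(["claude"], env=env).returncode


# Example (placeholder values, nothing is launched until you call it):
# env = claude_code_env("https://provider.example/anthropic", "YOUR_KEY", model="some-model")
# launch_claude_code(env)
```

Building the environment separately from launching keeps your shell profile untouched, so switching providers is just a different set of arguments.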

avereveard · 2026-04-24T06:03:09 1777010589

eh idk. until yesterday opus was the one that got spatial reasoning right (had to do some head pose stuff, neither glm 5.1 nor codex 5.3 could "get" it) and codex 5.3 was my champion at making UX work.

So while I agree mixed model is the way to go, opus is still my workhorse.

gunalx · 2026-04-24T09:16:12 1777022172

I find Gemini pretty good at spatial reasoning.

avereveard · 2026-04-24T09:44:54 1777023894

Yeah, but Gemini has a hard time discussing solutions; it just jumps to implementation, which is great if it gets it right and not so great if it goes down the wrong path.

Not saying it is better or worse, but the way I personally prefer is to design in chat, to make sure all unknown unknowns are addressed.

szundi · 2026-04-24T05:12:49 1777007569

I don’t know what people are doing, but Minimax produced 16 bug reports, of which 15 were false positives (literally mistakes).

In contrast, ChatGPT 5.3 and Opus each have at least a 90% rate on this same project (embedded).

All other tests were the same. What are you doing with these models?

sandGorgon · 2026-04-24T05:52:35 1777009955

actually this is not the reason - the harness is significantly better. There is no comparable harness to Claude Code with skills, etc.

Opencode was getting there, but it seems the founders lost interest. Pi could be it, but it's very focused on OpenClaw. Even Codex CLI doesn't have all of it.

Which harness works well with DeepSeek v4?

darkwater · 2026-04-24T06:09:41 1777010981

What's the issue with OC? I tried it a bit over 2 months ago, when I was still on Claude API, and I actually liked it more than CC (i.e. the right sidebar with the plan and a tendency to ask fewer "security" questions than CC). Why is it so bad nowadays?