An important decision on the mind of every consumer right now is not whether to play Fortnite or Apex, or to buy a Mac or a PC. The question of the hour (or of late, I mean) is: which AI model should you use?
There are several things surveyors and analysts figure you want: coding ability, reasoning capability, writing capability, and analytics capability.
Well, if this were analysis of analysis, wouldn't there be some problems?
I think this article does identify some issues with that problem, though not in a traditional logical sense.
Let's start with the opposite problem, as a metaphor. If I, as a researcher, did not want to be replaced, would I tell you these models are amazing at research? Probably not. Well, maybe I'm an analyst... BUT let me tell you honestly: they are probably not as good as a real one yet, as everyone readily agrees anyway.
What kinds of things are people interested in, all things considered? That was something I asked Claude.
It said:
Detailed Model Breakdown
1. OpenAI (ChatGPT)
The current top-tier model is GPT-5, which often acts as an all-rounder excelling in coding, reasoning, and multimodal tasks.
Top Model: GPT-5 (or GPT-5 Pro for deep reasoning tasks).
Price (Consumer): Part of the ChatGPT Pro subscription (often around $200/month), or the more common ChatGPT Plus plan (around $23/month), which may include lower usage caps or a 'lite' version of GPT-5.
Coding Performance: It is a top-tier coder, with high scores on benchmarks like SWE-bench Verified. It's noted for its strong agentic coding capabilities, meaning it can break down complex software engineering tasks and execute multi-step plans.
Speed: It uses a router system to balance speed and intelligence, often defaulting to a faster, lighter version for simple queries and a deeper, slower reasoning variant for complex problems.
2. Anthropic (Claude)
Anthropic models are known for their ethical alignment, strong conversational tone, and nuanced reasoning. The latest premium model is Claude 4.1 Opus.
Top Model: Claude 4.1 Opus (The "Opus" variant is the most intelligent). Claude 4.5 Sonnet is often cited as a more cost-effective model that performs nearly as well.
Price (Consumer): Claude Pro subscription (around $20/month).
Price (API): The most expensive per token for the top-tier Opus model, making high-volume API usage costly.
Coding Performance: Claude 4.1 Opus is a very strong coder and agent, often matching GPT-5 in complex, long-running coding tasks and multi-file refactors. It excels in tasks requiring logic and structured output.
Speed: Generally moderate latency compared to some "flash" models, focusing on quality and coherence over raw speed.
3. Google (Gemini)
Gemini's flagship models are defined by their massive context window and native integration with the Google ecosystem, making them exceptional for large-document analysis and research.
Top Model: Gemini 2.5 Pro.
Price (Consumer): Part of the Google AI Ultra plan (around $250/month), or the more common Google AI Pro plan (around $22/month) which offers higher usage of the advanced models.
Coding Performance: Strong performance in code generation and debugging, particularly in front-end development and tasks that involve visual or data analysis (e.g., creating an app from a diagram). It scores well but can lag slightly behind the top-tier GPT and Claude models in pure complexity benchmarks.
Speed: Gemini 2.5 Pro is generally one of the fastest premium models available, especially when considering its massive 1 million token context window.
4. xAI (Grok)
Grok, developed by xAI (Elon Musk's company), focuses on real-time knowledge from X (formerly Twitter) and a more unfiltered/witty personality.
Top Model: Grok 4 (or Grok 4 Heavy for larger context).
Price (Consumer): SuperGrok subscription (around $30/month) is required for access to Grok 4.
Coding Performance: Very strong coder, competitive with GPT-5 and Claude Opus, often leading on specific competitive coding benchmarks. It is designed to be a "tools-native" model, excellent for tasks requiring calculation and multi-domain thinking.
Speed: Grok models are generally reported to have higher latency than Gemini 2.5 Pro, focusing on deep reasoning and real-time data retrieval, which can increase response time.
But in reality, what it told me was a little off. I would probably have made more sense of it myself. But, moot point. I had it build me a panda as a bit of a coding test.
Well, this was interesting, but it was not quite a Python panda display. I thought it had a canvas, and it surely does. It told me it could show the panda, but it said it had to convert it to React...
Well, that is a nice WEBSITE. But I ran the code, and it did seem to work.
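For a concrete picture of the test, here is a minimal sketch of the kind of panda-drawing program I mean, in Python with matplotlib. This is my own reconstruction, not Claude's output; Claude's version targeted an HTML canvas and got converted to React.

```python
# A minimal panda drawn from simple shapes (my reconstruction of the
# test, not Claude's actual code, which was ported to a React canvas).
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Ellipse

fig, ax = plt.subplots(figsize=(4, 4))

# Face
ax.add_patch(Circle((0.5, 0.5), 0.35, facecolor="white", edgecolor="black"))
# Ears
ax.add_patch(Circle((0.25, 0.8), 0.10, facecolor="black"))
ax.add_patch(Circle((0.75, 0.8), 0.10, facecolor="black"))
# Eye patches, with white pupils on top
ax.add_patch(Ellipse((0.38, 0.55), 0.12, 0.16, angle=20, facecolor="black"))
ax.add_patch(Ellipse((0.62, 0.55), 0.12, 0.16, angle=-20, facecolor="black"))
ax.add_patch(Circle((0.38, 0.55), 0.02, facecolor="white"))
ax.add_patch(Circle((0.62, 0.55), 0.02, facecolor="white"))
# Nose
ax.add_patch(Ellipse((0.5, 0.42), 0.08, 0.05, facecolor="black"))

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_aspect("equal")
ax.axis("off")
plt.show()
```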
So it could code this. But here we reach a red flag: it had rated itself significantly better than the other models. I have had enough time to notice that these models promote themselves. Claude saying it was far better than every other model, even in coding, had no grounding in reality, but it said it anyway. Please do not take their word for it. And proceed with caution before assuming that, in regard to, say, the Eaglet project, a commercial AI will honestly assess your AI and help you build its competition. OpenAI and Google will, in my opinion, wholly deny you this.
So, reaching back to our metaphor, it does prove illogical that an AI will code your AI for you. In fact, Claude often reminds users that it cannot, or will not, write code that replicates itself. It is not a stretch to say the companies would not like to train your model for you either. According to Andrej Karpathy, training is half the work. You may wonder what the $100 nanoGPT (generative pre-trained transformer) is... It costs $100 just to train this build-it-at-home model. In fact, the GitHub page instructs you to use a speedrun script just to have a chance of training it. The split between writing the code and training it seemed to be about 10-90 percent. So companies understand the importance of training models as well. Without training, transformers are just statistical models. (...are you following at all? Hmm.)
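To make the point concrete, here is a minimal sketch of a transformer training loop in PyTorch. This is my own illustration, not Karpathy's actual nanoGPT code: defining the toy model takes a few lines, while the loop at the bottom is the part that runs for hours on rented GPUs and eats the $100.

```python
# Minimal sketch of why training is "half the work" (my illustration,
# not the real nanoGPT code). The model definition is short; the
# training loop is what burns the GPU-hours.
import torch
import torch.nn as nn

vocab_size, context, dim = 256, 64, 128

# A toy transformer: embedding -> one encoder layer -> token logits.
model = nn.Sequential(
    nn.Embedding(vocab_size, dim),
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    nn.Linear(dim, vocab_size),
)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(1000):  # real runs take millions of steps on real text
    # Random token batches stand in for a tokenized dataset here.
    x = torch.randint(0, vocab_size, (8, context))
    y = torch.randint(0, vocab_size, (8, context))  # next-token targets
    logits = model(x)
    loss = loss_fn(logits.reshape(-1, vocab_size), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Until that loop has run over a real dataset, the weights are random and the model predicts noise, which is the sense in which an untrained transformer is just a statistical shell.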

I went and asked Gemini to build one as well. It did not run initially, and I was given a table instead. Unfortunately, it had a hallucination and told me all about the different terms of price analysis, etc. I only report what happened. Anyways.
There was one note that it wrote for me in the table, about its rate ("lowest in industry"): even at the point where they advertise and expect these questions, they cannot do all of your building/training for you.
I asked ChatGPT the same. This is what it came up with.
At first it gave me a list. It could not run Python in the browser. Unfortunately, these AIs think they can. But maybe it's easier said than done. Anyways. Of note: ChatGPT rated itself much higher than the other models. It noticeably put itself first in the first two categories it chose, as well. (Do I ruin dreams? Or studies? I hope not? Dear funding, come to me? What is thy purpose, as your writer?)
I made one for fun as well. I may not have left the couch. Can you tell? Anyways.
I had a bit of a different take on things when I wasn't being serious. I think they hadn't left much to say; they weren't too hallucinatory, their self-promotion aside.
I like Gemini, and it might be my suggestion for general tasks. Its grounding in Google Search helps it escape the generative-model trap, whereby a model that has copied a thinking mind also pretends to know everything. But as one gets to know it more and more, a grounded model does not noticeably screw up.
In my private thinking, I have invented a theory. As LLM developers/companies come out with new features, each area is like new ground. First it ordered your lists and rewrote an email. Then it made a chart and wrote creatively. Then it did financial analysis and (tried to) write legal papers. Lately, it coded. (I recommend trying Gemini a little more because it has an undo button for its coding canvas.)
The name of the theory is AI factorial theory. The problem is that in each new area, the same problem as before arises. You knew a few things, but you didn't know them in depth, so you filled in the gaps. But then someone asks you more questions, and each of those questions has a whole new branch of questions. First you knew about health, then medicine. Then you knew about psychiatry, neurology, cardiology, oncology. You also knew everything about palaeontology, including excavation and artifact preservation. As you find out more, you need to know more. As you find out more, the knowledge gets more obscure. But your knowledge needs to be more precise. Once you no longer describe airplanes but build them, everyone is counting on you to do it right.
So the problem is never solved. AI will never answer every question, because with every new question there is always a number of new sub-topics to train on. So whatever you want to know, there is an LLM being trained to answer it, but it might not be able to answer for a few years.
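To put a toy number on the "factorial" part: if each level of questioning opens a few new sub-topics, the topics to cover multiply at every depth. The branching factors below are made up purely for illustration.

```python
# Toy illustration of the "AI factorial theory": each level of
# questioning opens new sub-topics, so coverage multiplies.
# These branching factors are invented for illustration only.
from math import prod

branching = [3, 4, 5, 6]  # new sub-topics opened at each depth

topics = 1
for depth, k in enumerate(branching, start=1):
    topics *= k
    print(f"depth {depth}: {topics} topics to train on")

# When the branching itself grows by one per level (3, 4, 5, 6...),
# the total behaves like a factorial: 3*4*5*6 = 360.
assert topics == prod(branching)
```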
One other point: Claude does not have image generation. Gemini is probably the best for this because its rate limit gives you about six photos. GPT will only give you two or three; however, GPT's photo quality is better.
In terms of video generation, Claude of course cannot do that. I would say that Google Veo, which is also offered through Adobe (Adobe Firefly), is probably a little better than the competition. However, I have not tested them for this. Veo has no free preview at all; Firefly will give you a one-time preview of two generations.
To sum up this analysis, I have made a Risk board to elucidate my opinions and formally grade them. What I love about Risk is that you are always a little surprised. It was a nail-biter, but Claude actually won. I thought GPT would win for sure. Anyways.
I found Claude to be the best coder, but only by a little over ChatGPT. Did my analysis hang in the balance over this criterion? Maybe it actually did! Anyways, that, as it were, is my opinion, my tests, and my grades for the AIs.
--Asa Montreaux