Claude's Latest Model Beats GPT5
Key Points
- The reviewer tested the new Claude model across code, PowerPoint decks, spreadsheets, and docs, benchmarking it against OpenAI’s ChatGPT‑5 and Anthropic’s own Opus 4.1, and found a noticeably larger performance jump.
- Unlike OpenAI’s consumer‑focused approach, Anthropic is positioning Claude as a “professional AI” that directly boosts workplace productivity, and the new model’s capabilities reinforce that strategy.
- The model outperforms Opus 4.1 in creating truly usable deliverables—such as detailed slide decks and Amazon‑style PRFAQs—meeting a long‑standing bar that many AI systems previously missed.
- It excels at surfacing exactly where human expertise should intervene, making the collaboration between AI and domain experts clearer and more effective than with prior models.
- While still imperfect, the reviewer sees this release as a significant step forward for AI tools aimed at helping professionals get concrete work done rather than just generating generic content.
Sections
- Untitled Section
- Model Self-Checking and Tool Transparency - The speaker explains that the new Claude model constantly narrates its reasoning, validates its work (e.g., slide design, code execution), and autonomously corrects errors, showcasing a higher level of self‑verification compared to earlier AI models.
- Rapid Iterative AI Writing Workflow - The speaker explains how the new Claude model lets them quickly produce and refine clear, high‑quality narrative content in minutes, turning AI into a smart collaborator that multiplies productivity without sacrificing human control.
- AI Model as Thoughtful Colleague - The speaker lauds the new Claude model for its clear, self‑checking reasoning and willingness to push back on errors, fostering a balanced, professional partnership instead of a frantic, uncontrollable tool.
- Expressing Excitement and Request - The speaker conveys enthusiasm about something and asks to be informed.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=p-ibfrMN0M8](https://www.youtube.com/watch?v=p-ibfrMN0M8)
**Duration:** 00:15:41
**Timestamps:**
- [00:00:00](https://www.youtube.com/watch?v=p-ibfrMN0M8&t=0s) Untitled Section
- [00:03:25](https://www.youtube.com/watch?v=p-ibfrMN0M8&t=205s) Model Self-Checking and Tool Transparency
- [00:06:48](https://www.youtube.com/watch?v=p-ibfrMN0M8&t=408s) Rapid Iterative AI Writing Workflow
- [00:12:04](https://www.youtube.com/watch?v=p-ibfrMN0M8&t=724s) AI Model as Thoughtful Colleague
- [00:15:39](https://www.youtube.com/watch?v=p-ibfrMN0M8&t=939s) Expressing Excitement and Request
So, over the past few days, I was lucky
enough to get early access to a new
Claude model that is releasing today. I
want to tell you why you should care,
what you should expect versus the other
models out there, and where it really is
going to make a difference. So, stick
with me for the next 10 or 15 minutes,
and we're going to get through what to
make of this model, and you'll be able
to figure out whether it's useful for
you. Number one, what were my first
impressions and top takeaways? I tested
this model inside Claude Code. I tested
this model creating PowerPoint decks. I
tested this model creating spreadsheets.
I tested it creating docs. I tested its
thinking. I really put it through its
paces, and I benchmarked it against
ChatGPT-5, which is of course OpenAI's
frontier model and also against Opus 4.1
which is the current frontier model from
Claude before today. And I wanted to
know what was going to stand out to me.
I spend hundreds and hundreds of hours
in AI models. I'm very familiar with
sort of the look and feel differences
and I wanted to get hands-on early to
see if I could tell a difference.
Spoiler alert, it was a big difference.
And I'm not saying that because I want
to hype the model. No model is perfect.
But I think that this model moves the
ball forward in some really important
ways for people who care about getting
work done. And frankly, that's actually
in line with Anthropic's larger
strategy. If you look at the two big
players, OpenAI and Anthropic, OpenAI
continues to lean very consumer.
Anthropic is adopting a specialized
stance of leaning into professional AI.
What does it mean to have professionals
work with AI by choice and pick
Anthropic on purpose to get their work
done? How does Anthropic help them move
their work deliverables forward? The
signatures for that strategy were all
over this model. I did a very popular
guide a few weeks ago talking about Opus
4.1 when it released and emphasizing it
was the first model that actually got as
far as creating really usable
spreadsheets, really usable PowerPoints,
which had been a really, really tough
bar to meet for AI previously. Well,
this new model beats that. This new
model beats Opus 4.1. And I put
them head-to-head and I did not give
them an easy assignment. They had tough
assignments. They had to make tough, you
know, 11- or 12-slide SaaS decks. They had
to make docs in an Amazon PRFAQ style. I
really put them through their paces. And
what stood out to me as a human
observer, as someone who wants these
tools to work with us in the workplace,
is that this new model is what enables
me to see clearly where I need to
intervene. We talk a lot about AI
automating, about AI picking up work from us.
But I've been thinking a lot about this
idea that the most valuable AI is the AI
that helps you to see clearly when you
as a good human in your domain with deep
experience need to touch the work. In
this model, more than ChatGPT-5, more
than Opus 4.1, it's clear enough in its
narrative that you can see really
clearly what it's trying to go for and
you can see really clearly where you
need to touch the work to make it
better. And so if you think about it
within the context of a larger say deck
preparation workflow, a spreadsheet
model preparation workflow, this model
is going to speed the time it takes to
get these important pieces of work done.
And it does that in a number of useful
ways. And I want to call out sort of the
gritty hands-on notes that I have so
that you can start to think about it.
One of the first takeaways as you work
with this model: it's getting to that
level of quality, that level of clarity
on narrative by checking its work a lot
more than previous models did. One of
the hallmarks of the current Claude
style is that you have this running
commentary from the model that shows you
what tools it's invoking and what it's
thinking about at the moment. It's sort
of an express chain of thought. This
model is expressing an obsession, I
think that's the right word, an
obsession with checking its work and
fixing it. Multiple times when it was
creating PowerPoint decks, I saw it
measure the pixel overlap between title
text and a particular visual element,
correct itself, and say, "That's not
right," and redo the slide. It didn't
come to me and make me do that. It
caught it itself. That's a big deal. It
also took the time to check the formulas
in spreadsheets. When it was showing me
a code project I was working on, it was
actually going through the Next.js
framework and it was validating that it
could start and run the dev server
before coming back and telling me it
could. I've got to say, ChatGPT-5 just likes
to say it can do stuff, right? There's
just a sort of commitment to talk
that ChatGPT-5 has. I'm not here
to tell you which model to pick. This
video should not be interpreted as me
saying to pick only this model for your work. We
live in a multi-model world. I want you
to get a sense of where this model's
really useful and I think it's right in
line with where Anthropic is going. This
model is going to be useful in
dramatically cutting down the grunge
time that we have spent on work where
you are just wading through a lot of
messy inputs where you are trying to
figure out how you can understand a
complicated spreadsheet where you are
trying to write a draft and you just
feel like your head is mush and you
don't know how to get the words on the
page but they need to be really clear.
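The slide self-check the speaker described a bit earlier, where the model measured the pixel overlap between title text and a visual element and redid the slide, boils down to bounding-box geometry. Here's a minimal sketch of that kind of check, assuming simple (left, top, width, height) boxes; the shape names and coordinates are illustrative, and the transcript doesn't say how Claude actually implements it:

```python
# Hypothetical sketch of the layout check described above: given two
# shape bounding boxes on a slide, report whether (and how much) they
# overlap. Coordinates are (left, top, width, height) in any unit.

def boxes_overlap(a, b):
    """Return the overlapping area of two (left, top, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = min(ax + aw, bx + bw) - max(ax, bx)  # horizontal intersection
    dy = min(ay + ah, by + bh) - max(ay, by)  # vertical intersection
    return dx * dy if dx > 0 and dy > 0 else 0

def check_slide(shapes):
    """Flag every pair of shapes whose bounding boxes intersect."""
    problems = []
    names = list(shapes)
    for i, n1 in enumerate(names):
        for n2 in names[i + 1:]:
            area = boxes_overlap(shapes[n1], shapes[n2])
            if area:
                problems.append((n1, n2, area))
    return problems

# A title box that spills into the chart area below it:
slide = {
    "title": (0, 0, 800, 120),
    "chart": (100, 100, 600, 400),  # starts above the title's bottom edge
}
print(check_slide(slide))  # → [('title', 'chart', 12000)]
```

In a real deck you would pull these boxes from the slide's shapes (for example with a library like python-pptx) rather than hard-coding them.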
They can't just be any old AI slop.
That's where this model's going to
excel. I'll give you an example. I fed
this model 66 pages of PDF voice of
customer insight. So, it was all like
quotes, right? Things that were out of
order, not organized in any way. I just
wanted to see like what it would do with
raw customer utterance. And you know
what it did? It was able to analyze it.
And then this model in particular was
able to extract meaningful narrative
from it. And I think that's really
important to reflect on because those
kinds of insights don't make themselves
happen. I used to run voice of customer
when I was at Amazon. It was really,
really hard to manually go through a
bunch of customer utterances and they
just start to meld together in your
brain. It's hard to get narrative. It's
hard to attach a quote to a particular
insight. This is the first model I've
seen that can in one shot go from
a big muddle of customer quotes to an
executive ready narrative arc in a
PowerPoint presentation. Now, is it the
most beautiful PowerPoint I've ever
seen? No. Is it even better than the Opus 4.1 output
that I thought was usable? It actually
is. This is the first AI PowerPoint
creation tool that has made something
that is so close to ready that I would
call it 90% ready to go out of the gate.
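For a sense of the voice-of-customer grouping being automated here, a toy version of the manual step looks like this: bucket raw quotes under themes by keyword. The themes, keywords, and quotes are invented for illustration; the model presumably does something far richer than string matching.

```python
# Toy illustration of voice-of-customer grouping: assign each raw quote
# to the themes whose keywords it mentions. All names here are made up.
from collections import defaultdict

THEMES = {
    "pricing": ("price", "cost", "expensive"),
    "reliability": ("crash", "bug", "down"),
    "onboarding": ("setup", "confusing", "tutorial"),
}

def bucket_quotes(quotes):
    """Assign each quote to every theme whose keywords it mentions."""
    buckets = defaultdict(list)
    for quote in quotes:
        lowered = quote.lower()
        for theme, keywords in THEMES.items():
            if any(k in lowered for k in keywords):
                buckets[theme].append(quote)
    return dict(buckets)

quotes = [
    "The price is just too high for a small team.",
    "It crashed twice during our demo.",
    "Setup was confusing until I found the tutorial.",
]
print(bucket_quotes(quotes))  # each quote lands under its matching theme
```

Doing this by hand across 66 pages of unordered quotes is the melding-together-in-your-brain problem the speaker describes; the interesting claim is that the model goes straight from this kind of input to a narrative arc.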
A little bit of polish here and there,
but that's really it. And what's handy
about that is it does it in just a few
minutes, which gives you a chance to do
multiple iterations. Remember when I
said earlier in this video that part of
why I'm excited about this model is it
puts us humans back in touch with the
work. That clarity of narrative is what
I have needed to wade through AI slop
and actually find something useful. And
I saw it come through not just in decks
but in the clarity of presentation and
spreadsheets in the clarity of working
with it in Claude Code. It felt like
working with a good thinking partner. We
were able to quickly establish a file
structure to work together. It was just
a dream. And in the clarity of doc
writing, it was clear narrative
and didn't feel like I had to wade
through AI slop. And so if I
think about that and I think about the
minutes it takes to make this I realize
as a human who cares about good work and
doing it well I have multiplied my time.
And it's not that I've multiplied my
time to put out more 90% good artifacts.
I have given myself a shot at doing two
or three of these and having progressive
inputs as I look at the narrative and I
shape it and I think about whether
that's what I want to say and it's
relatively trivial in 30 minutes or 40
minutes to come out with exactly what I
want because each iteration now takes
five or six minutes to make with this new
Claude model. It's really easy. And if
you're wondering how prompt sensitive
the model is, this one's really
interesting. I haven't seen this in any
other model, and I would be curious for
your take as you play with it. When I
played with it, I found that it was
surprisingly useful regardless of the
prompt structure I applied. And so I
applied a super formal prompt structure
and I also applied a very casual prompt
structure which was just two or three
lines plus a bunch of data. In both
cases I got a very usable output. It was
healthy. It was happy. It was the kind
of PowerPoint you want to show around
the office. It was great. It was not a
problem. And that was also the case with
spreadsheets. It was also the case with
docs. And if that holds up, if you're
seeing that as well, what that suggests
is that Anthropic is doing enough
reinforcement learning on office
primitives like docs, like decks, like
PowerPoint that it's figuring out what
we want from shorter and smaller and
more casual utterances, which is a
really big deal because one of the
things that has made people really
frustrated with ChatGPT-5 is that it is
sensitive to prompting. I don't think
it's an accident that the ChatGPT
team has had to release prompt packs
aimed at ChatGPT-5. You know who hasn't
had to do that? Anthropic. They haven't
had to do it because the model does a
better job of understanding the kind of
work that you want and just going for
it. And this gets at one of the larger
takeaways that I think is really
interesting. Anthropic is betting on our
future for the next few years at least
being somewhat similar to what we have
today. Despite all the big hype and all
the big takeaways, they're investing in
a world where we will still need
PowerPoints, where we will still need
spreadsheets, where we will still need
the ability to run Claude Code as a
human and get something that boots on a
dev server. And what they're betting is
that what we need is clearer and more
professional outputs that we can
understand more easily. And that in turn
will mean that we take less time on the
grunge of our work. Because to be
honest, no one wants to trade the grunge
of the old way of doing things, pre-2022,
where we were just doing everything by
hand and get the new way of doing things
and it's just AI slop and we're just
wading through that and that's a
terrible slog. Instead, I had to yell at
ChatGPT-5 just yesterday because I asked
it for an outline with three elements
and it came back with seven and I said,
"You didn't put the time in on the three
I asked you, and you're just so
hyper-excited that you came back
with a bunch of extra." And that's a
tiny little story and it's not isolated
only to ChatGPT-5. Slop is a threat to
our ability to realize the gains of AI
workflows and AI productivity. And so
one of the things I'm excited about is
that there's some clarity in the work
produced by this model that I think
enables us to get back to creating
really useful pieces of work, whether
they're code or spreadsheets or
PowerPoints or what have you, and then
focusing on whether they're right and
then iterating if they're not. And that
becomes a workflow that I can get
excited about because it's less sloppy
and it fits into how teams already make
decisions. I also think the idea of
checking your work is something that
we'll start to see from other models. I
know models are being trained using
tooling where there is some recursive
looping and checking of your work. This
model is by far the most thoughtful and
careful about it that I have seen so
far. This model really cares to
understand how your prompt maps to a
particular piece of work and it cares to
get it right. Now, you might wonder,
Nate, you've been talking a lot about
docs and sheets and code and decks. Does
this thing only do that? And the answer
is no. I actually have used it for
conversations as well. I've used it to
sort of like get a sense of its thinking
and its ability, how it does if I ask it
to produce a response just in the chat.
And I get that same sense of clarity.
It's a model that really wants to cut
through the noise. And it's a model that
is able to give you some backbone. And I
think that's somewhat related to its
ability to check its work. It has a
sense of rightness. It has a sense of
what works and what doesn't. And when it
doesn't feel like something is correct,
it says so. And so, one of the subtler
things that I've seen come out that you
will also see is that this model has
some opinions on what is correct and
what is incorrect, whether you're saying
it or whether the model is saying it.
And that makes the model less like a
hyperactive squirrel on Adderall and more
like a thoughtful colleague, a colleague
that has opinions, a colleague that can
be persuaded, but a colleague that will
also push back sometimes and say, "I
don't think that's quite correct." And
that is a very tough balance to strike.
And if Claude has been able to strike
that balance with this new model, it is
a very good sign for us because it helps
us to have a more professional
relationship with AI where we're yelling
at it less, where we spend less time trying
to keep it focused and directed, and
we're more interested in how we can do
good work together. And I'm excited
about that because I really for one
would like to stop telling my AI models,
"No, you did too much. No, you went too
far in that direction. No, please stop
it." I don't want to be the only one
that's absolutely right around here. And
so my hope is that this new model
becomes a new decisioning baseline for
work. So let me unpack that a little
bit. I think that we reached a
productivity baseline with Opus 4.1 for
people who care about work. It is
possible to be productive not just in
conversation but with docs and sheets
and code and decks in Opus 4.1, which was
Claude's previous model. Now, with this
model, we don't just go from being
productive to perfect. We go from being
productive to decisioning. And this gets
at the heart of what I've been saying
this entire time. This model sets you up
to focus your time on making decisions
that matter because the work it produces
is really clear. And that's true in the
chat as much as it's true in any output
format that you want to select. That's
what I'm excited about because it feels
like we're moving from a work buddy
that works alongside you and gets you
okay drafts to a more professional
colleague that is designed to help set
you up to save time and make really
smart decisions. That makes me really
excited for the future because I would
love to have an AI colleague that's more
like that. And I want more interactions
that keep me closer to the work and that
help me to feel like I'm doing quality
work because we humans take a sense of
pride in that. I know you might not have
expected this video to go there. You
might think, well, Nate's going to just
talk about the AI model and how amazing
it is and how it automates things and it
is amazing and clearly it automates a
lot if it can go in one shot from 66
pages of customer quotes to a
PowerPoint. But that's not really why it
matters. It matters because it pulls the
humans closer to the work. We can work
as colleagues together because of the
ability to push back and to think
clearly and to express itself well. And
ultimately the work itself is higher
quality and much faster in a way that we
can be proud of as people because we
touched it and delivered our unique
stamp of perspective on it. The domain
experience we have, the metabolized
sense of integrity, the metabolized
sense of instinct that we have as people
who have expertise in our particular
area. This Claude model makes it easier
for our expertise to shine through. So
have fun. Check it out. Let me know what
you think. I'm still early in my testing
obviously. I've had it for a few days.
I'm really excited about it. Let me
know.