
Claude's Latest Model Beats GPT-5

Key Points

  • The reviewer tested the new Claude model across code, PowerPoint decks, spreadsheets, and docs, benchmarking it against OpenAI’s GPT‑5 and Anthropic’s own Opus 4.1, and found a notably large performance jump over both.
  • Unlike OpenAI’s consumer‑focused approach, Anthropic is positioning Claude as a “professional AI” that directly boosts workplace productivity, and the new model’s capabilities reinforce that strategy.
  • The model outperforms Opus 4.1 in creating truly usable deliverables—such as detailed slide decks and Amazon‑style PRFAQs—meeting a long‑standing bar that many AI systems previously missed.
  • It excels at surfacing exactly where human expertise should intervene, making the collaboration between AI and domain experts clearer and more effective than with prior models.
  • While still imperfect, the reviewer sees this release as a significant step forward for AI tools aimed at helping professionals get concrete work done rather than just generating generic content.

Full Transcript

# Claude's Latest Model Beats GPT-5

**Source:** [https://www.youtube.com/watch?v=p-ibfrMN0M8](https://www.youtube.com/watch?v=p-ibfrMN0M8)
**Duration:** 00:15:41

## Sections

- [00:03:25](https://www.youtube.com/watch?v=p-ibfrMN0M8&t=205s) **Model Self-Checking and Tool Transparency** - The speaker explains that the new Claude model constantly narrates its reasoning, validates its work (e.g., slide design, code execution), and autonomously corrects errors, showing a higher level of self-verification than earlier AI models.
- [00:06:48](https://www.youtube.com/watch?v=p-ibfrMN0M8&t=408s) **Rapid Iterative AI Writing Workflow** - The speaker explains how the new Claude model lets them quickly produce and refine clear, high-quality narrative content in minutes, turning AI into a smart collaborator that multiplies productivity without sacrificing human control.
- [00:12:04](https://www.youtube.com/watch?v=p-ibfrMN0M8&t=724s) **AI Model as Thoughtful Colleague** - The speaker lauds the new Claude model for its clear, self-checking reasoning and willingness to push back on errors, fostering a balanced, professional partnership instead of a frantic, uncontrollable tool.
- [00:15:39](https://www.youtube.com/watch?v=p-ibfrMN0M8&t=939s) **Expressing Excitement and Request** - The speaker expresses excitement about the model and asks viewers to share their impressions.

## Full Transcript
**0:00** So, over the past few days, I was lucky enough to get early access to a new Claude model that is releasing today. I want to give you why you should care, what you should expect versus the other models out there, and where it really is going to make a difference. So stick with me for the next 10 or 15 minutes, and we're going to get through what to make of this model, and you'll be able to figure out whether it's useful for you.

Number one, what were my first impressions and top takeaways? I tested this model inside Claude Code. I tested this model creating PowerPoint decks. I tested this model creating spreadsheets. I tested it creating docs. I tested its thinking. I really put it through its paces, and I benchmarked it against GPT-5, which is of course OpenAI's frontier model, and also against Opus 4.1, which was the frontier model from Claude before today. And I wanted to know what was going to stand out to me. I spend hundreds and hundreds of hours in AI models. I'm very familiar with the look-and-feel differences, and I wanted to get hands-on early to see if I could tell a difference. Spoiler alert: it was a big difference. And I'm not saying that because I want to hype the model. No model is perfect. But I think that this model moves the ball forward in some really important ways for people who care about getting work done. And frankly, that's actually in line with Anthropic's larger strategy. If you look at the two big players, OpenAI and Anthropic, OpenAI continues to lean very consumer. Anthropic is adopting a specialized stance of leaning into professional AI. What does it mean to have professionals work with AI by choice and pick Anthropic on purpose to get their work done? How does Anthropic help them move their work deliverables forward?
**1:44** The signatures for that strategy were all over this model. I did a very popular guide a few weeks ago talking about Opus 4.1 when it released, emphasizing that it was the first model that actually got as far as creating really usable spreadsheets and really usable PowerPoints, which had been a really tough bar for AI to meet previously. Well, this new model beats that. This new model beats Opus 4.1. And I put them head-to-head, and I did not give them an easy assignment. They had tough assignments. They had to make tough 11- or 12-slide SaaS decks. They had to make docs in an Amazon PRFAQ style. I really put them through their paces.

And what stood out to me as a human observer, as someone who wants these tools to work with us in the workplace, is that this new model is what enables me to see clearly where I need to intervene. We talk a lot about AI automating, AI picking up work from us. But I've been thinking a lot about this idea that the most valuable AI is the AI that helps you to see clearly when you, as a good human in your domain with deep experience, need to touch the work. This model, more than GPT-5, more than Opus 4.1, is clear enough in its narrative that you can see really clearly what it's trying to go for, and you can see really clearly where you need to touch the work to make it better. And so if you think about it within the context of a larger, say, deck-preparation workflow or a spreadsheet-model-preparation workflow, this model is going to speed the time it takes to get these important pieces of work done. And it does that in a number of useful ways. And I want to call out the gritty hands-on notes that I have so that you can start to think about it.
**3:30** One of the first takeaways as you work with this model: it's getting to that level of quality, that level of clarity on narrative, by checking its work a lot more than previous models did. One of the hallmarks of the current Claude style is that you have this running commentary from the model that shows you what tools it's invoking and what it's thinking about at the moment. It's sort of an express chain of thought. This model is expressing an obsession, I think that's the right word, an obsession with checking its work and fixing it. Multiple times when it was creating PowerPoint decks, I saw it measure the pixel overlap between title text and a particular visual element, correct itself, say "That's not right," and redo the slide. It didn't come to me and make me do that. It caught it itself. That's a big deal. It also took the time to check the formulas in spreadsheets. When it was showing me a code project I was working on, it was actually going through the Next.js framework and validating that it could start and run the dev server before coming back and telling me it could. I've got to say, GPT-5 just likes to say it could do stuff, right? There's just sort of a commitment to talk that GPT-5 has. I'm not here to tell you which model to pick. This video should not be interpreted as me saying pick only this model to work with. We live in a multi-model world. I want you to get a sense of where this model's really useful, and I think it's right in line with where Anthropic is going.
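The slide self-check the speaker describes, measuring pixel overlap between a title and a visual element, boils down to an axis-aligned bounding-box intersection test. As a rough illustration of that kind of layout check (this is a hypothetical sketch, not Anthropic's actual implementation; the function name and box values are made up):

```python
def boxes_overlap(a, b):
    """Return the overlap area (in px) of two axis-aligned boxes.

    Each box is (left, top, width, height); 0 means no collision.
    """
    # Overlap extent along each axis (negative means the boxes are apart).
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

# Hypothetical title box colliding with an image box:
title = (50, 40, 600, 80)    # spans x 50-650, y 40-120
image = (600, 90, 300, 200)  # spans x 600-900, y 90-290
print(boxes_overlap(title, image))  # → 1500 (a 50x30 px collision)
```

Any positive area signals a collision, which is the kind of self-check signal the speaker saw the model act on before regenerating a slide.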
**4:58** This model is going to be useful in dramatically cutting down the grunge time that we have spent on work: where you are just wading through a lot of messy inputs, where you are trying to figure out how you can understand a complicated spreadsheet, where you are trying to write a draft and you just feel like your head is mush and you don't know how to get the words on the page, but they need to be really clear. They can't just be any old AI slop. That's where this model's going to excel.

I'll give you an example. I fed this model 66 pages of PDF voice-of-customer insight. So it was all quotes, right? Things that were out of order, not organized in any way. I just wanted to see what it would do with raw customer utterances. And you know what it did? It was able to analyze it. And then this model in particular was able to extract meaningful narrative from it. And I think that's really important to reflect on, because those kinds of insights don't make themselves happen. I used to run voice of customer when I was at Amazon. It was really, really hard to manually go through a bunch of customer utterances, and they just start to meld together in your brain. It's hard to get narrative. It's hard to attach a quote to a particular insight. This is the first model I've seen that can, in one shot, go from a big muddle of customer quotes to an executive-ready narrative arc in a PowerPoint presentation. Now, is it the most beautiful PowerPoint I've ever seen? No. Is it better even than the Opus 4.1 output that I thought was usable? It actually is. This is the first AI PowerPoint creation tool that has made something so close to ready that I would call it 90% ready to go out of the gate. A little bit of polish here and there, but that's really it.
**6:50** And what's handy about that is it does it in just a few minutes, which gives you a chance to do multiple iterations. Remember when I said earlier in this video that part of why I'm excited about this model is it puts us humans back in touch with the work? That clarity of narrative is what I have needed to wade through AI slop and actually find something useful. And I saw it come through not just in decks, but in the clarity of presentation in spreadsheets, and in the clarity of working with it in Claude Code. It felt like working with a good thinking partner. We were able to quickly establish a file structure to work together. It was just a dream. And in the clarity of doc writing: it was clear narrative, and I didn't feel like I had to wade through AI slop.

And so if I think about that, and I think about the minutes it takes to make this, I realize, as a human who cares about good work and doing it well, I have multiplied my time. And it's not that I've multiplied my time to put out more 90%-good artifacts. I have given myself a shot at doing two or three of these and having progressive inputs as I look at the narrative, shape it, and think about whether that's what I want to say. And it's relatively trivial, in 30 or 40 minutes, to come out with exactly what I want, because each iteration now takes five or six minutes to make with this new Claude model. It's really easy. And if you're wondering how prompt-sensitive the model is, this one's really interesting. I haven't seen this in any other model, and I would be curious for your take as you play with it. When I played with it, I found that it was surprisingly useful regardless of the prompt structure I applied.
**8:26** And so I applied a super formal prompt structure, and I also applied a very casual prompt structure, which was just two or three lines plus a bunch of data. In both cases I got a very usable output. It was healthy. It was happy. It was the kind of PowerPoint you want to show around the office. It was great. It was not a problem. And that was also the case with spreadsheets. It was also the case with docs. And if that holds up, if you're seeing that as well, what that suggests is that Anthropic is doing enough reinforcement learning on office primitives, like docs, like decks, like PowerPoint, that it's figuring out what we want from shorter, smaller, and more casual utterances. Which is a really big deal, because one of the things that has made people really frustrated with GPT-5 is that it is sensitive to prompting. I don't think it's an accident that the ChatGPT team has had to release prompt packs aimed at GPT-5. You know who hasn't had to do that? Anthropic. They haven't had to do it because the model does a better job of understanding the kind of work that you want and just going for it.

And this gets at one of the larger takeaways that I think is really interesting. Anthropic is betting on our future, for the next few years at least, being somewhat similar to what we have today. Despite all the big hype and all the big takeaways, they're investing in a world where we will still need PowerPoints, where we will still need spreadsheets, where we will still need the ability to run Claude Code as a human and get something that boots on a dev server. And what they're betting is that what we need is clearer and more professional outputs that we can understand more easily. And that in turn will mean that we take less time on the grunge of our work.
**10:13** Because, to be honest, no one wants to trade the grunge of the old way of doing things, where we were just doing everything by hand, and get the new way of doing things where it's just AI slop and we're just wading through that. That's a terrible slog. Instead, I had to yell at GPT-5 just yesterday, because I asked it for an outline with three elements and it came back with seven, and I said, "You didn't put the time in on the three I asked you for, and you're just so hyper-excited that you came back with a bunch of extra." And that's a tiny little story, and it's not isolated only to GPT-5. Slop is a threat to our ability to realize the gains of AI workflows and AI productivity. And so one of the things I'm excited about is that there's some clarity in the work produced by this model that I think enables us to get back to creating really useful pieces of work, whether they're code or spreadsheets or PowerPoints or what have you, and then focusing on whether they're right, and then iterating if they're not. And that becomes a workflow that I can get excited about, because it's less sloppy and it fits into how teams already make decisions.

I also think the idea of checking your work is something that we'll start to see from other models. I know models are being trained using tooling where there is some recursive looping and checking of your work. This model is by far the most thoughtful and careful about it that I have seen so far. This model really cares to understand how your prompt maps to a particular piece of work, and it cares to get it right. Now, you might wonder, "Nate, you've been talking a lot about docs and sheets and code and decks. Does this thing only do that?" And the answer is no.
**11:55** I actually have used it for conversations as well. I've used it to get a sense of its thinking and its ability, how it does if I ask it to produce a response just in the chat. And I get that same sense of clarity. It's a model that really wants to cut through the noise. And it's a model that is able to give you some backbone. And I think that's somewhat related to its ability to check its work. It has a sense of rightness. It has a sense of what works and what doesn't. And when it doesn't feel like something is correct, it says so. And so one of the subtler things that I've seen come out, that you will also see, is that this model has some opinions on what is correct and what is incorrect, whether you're saying it or whether the model is saying it. And that makes the model less like a hyperactive squirrel on Adderall and more like a thoughtful colleague: a colleague that has opinions, a colleague that can be persuaded, but a colleague that will also push back sometimes and say, "I don't think that's quite correct." And that is a very tough balance to strike.

And if Claude has been able to strike that balance with this new model, it is a very good sign for us, because it helps us to have a more professional relationship with AI, where we're yelling at it less, where we're spending less effort trying to get it to be focused and directed, and where we're more interested in how we can do good work together. And I'm excited about that, because I, for one, would really like to stop telling my AI models, "No, you did too much. No, you went too far in that direction. No, please stop it." I don't want to be the only one that's absolutely right around here. And so my hope is that this new model becomes a new decisioning baseline for work.
**13:32** So let me unpack that a little bit. I think that we reached a productivity baseline with Opus 4.1 for people who care about work. It is possible to be productive, not just in conversation but with docs and sheets and code and decks, in Opus 4.1, which was Claude's previous model. Now, with this model, we don't just go from being productive to perfect. We go from being productive to decisioning. And this gets at the heart of what I've been saying this entire time. This model sets you up to focus your time on making decisions that matter, because the work it produces is really clear. And that's true in the chat as much as it's true in any output format that you want to select. That's what I'm excited about, because it feels like we're moving from a worker's buddy that works alongside you and gets you okay drafts to a more professional colleague that is designed to set you up to save time and make really smart decisions. That makes me really excited for the future, because I would love to have an AI colleague that's more like that. And I want more interactions that keep me closer to the work and that help me to feel like I'm doing quality work, because we humans take a sense of pride in that.

I know you might not have expected this video to go there. You might think, "Well, Nate's going to just talk about the AI model and how amazing it is and how it automates things." And it is amazing, and clearly it automates a lot if it can go in one shot from 66 pages of customer quotes to a PowerPoint. But that's not really why it matters. It matters because it pulls the humans closer to the work. We can work as colleagues together because of its ability to push back and to think clearly and to express itself well.
**15:05** And ultimately, the work itself is higher quality and much faster, in a way that we can be proud of as people, because we touched it and delivered our unique stamp of perspective on it: the domain experience we have, the metabolized sense of integrity, the metabolized sense of instinct that we have as people who have expertise in our particular area. This Claude model makes it easier for our expertise to shine through. So have fun. Check it out. Let me know what you think. I'm still early in my testing, obviously. I've had it for a few days. I'm really excited about it. Let me know.