

ChatGPT‑5 Review: Health and Coding Insights

Key Points

  • The reviewer describes ChatGPT‑5 as a “model router” that orchestrates multiple specialized sub‑models, with a heavy focus on new medical‑focused training to improve health‑care advice accuracy.
  • In the live‑stream launch, a cancer survivor highlighted the model’s more reliable medical responses, though the reviewer notes they aren’t medically qualified to fully verify the claims.
  • A major showcase was the “vibe coding” feature, positioned as a “Lovable killer” that lets anyone quickly prototype apps, countering the narrative that low‑code tools are dead.
  • Developers were introduced to richer API controls, including reasoning and verbosity settings and a “reasoning effort” parameter, intended to give finer‑grained guidance over the model’s output.
  • The reviewer believes the biggest wins for GPT‑5 are lower usage costs and more surgical, agentic coding assistance, enabling developers to make precise, incremental edits rather than broad rewrites.


# ChatGPT‑5 Review: Health and Coding Insights

**Source:** [https://www.youtube.com/watch?v=DbX_0_0LGag](https://www.youtube.com/watch?v=DbX_0_0LGag)
**Duration:** 00:19:55

## Sections

- [00:00:00](https://www.youtube.com/watch?v=DbX_0_0LGag&t=0s) **ChatGPT‑5 First Impressions & Healthcare** - The speaker provides a quick rundown of ChatGPT‑5’s multi‑model routing architecture and its emphasized medical‑advice improvements, sharing personal testing insights that aren’t covered elsewhere.
- [00:03:20](https://www.youtube.com/watch?v=DbX_0_0LGag&t=200s) **ChatGPT‑5 Builds Travel Itinerary App** - The speaker notes that AI models often excel only in their native environments and recounts asking ChatGPT‑5 to create a configurable Japan travel‑itinerary app, which surprisingly produced a fully functional, click‑through application.
- [00:07:33](https://www.youtube.com/watch?v=DbX_0_0LGag&t=453s) **Creating a Nightmare CSV Test** - The speaker deliberately built a tangled set of malformed CSV files, including SQL injection, inconsistent formatting, and contradictory employee data, to push an LLM into detecting duplicates, security flaws, and business‑critical insights without explicit instructions.
- [00:10:41](https://www.youtube.com/watch?v=DbX_0_0LGag&t=641s) **Prompt Crafting Critical for Advanced Models** - The speaker argues that simple prompts are easy, but obtaining accurate, complex results from newer LLMs, especially in “think hard” mode where ChatGPT‑5 outperformed rivals, requires sophisticated prompting.
- [00:15:43](https://www.youtube.com/watch?v=DbX_0_0LGag&t=943s) **Evolving AI Perception and Trust** - The speaker downplays media hype to emphasize that the newest model’s reduced hallucinations, stronger performance in medical, writing, and coding tasks, and closer partnership feel will engender greater user trust, even if the improvements aren’t immediately obvious to everyone.
- [00:18:59](https://www.youtube.com/watch?v=DbX_0_0LGag&t=1139s) **Evaluating AI Model Utility** - The speaker urges listeners to look beyond hype about the newest LLM, judging it by its real‑world usefulness, strengths, weaknesses, and ongoing progress rather than proclaiming it as AGI.

## Full Transcript
**0:00** These are my first full, unfiltered impressions of how ChatGPT-5 actually lands for work. Like most of you, I watched the live stream, and I'm going to assume here that you can go watch it yourself if you want. I'll give you a brief first look at what's in the box for ChatGPT-5 in case you didn't read the news, but I won't take long on that, because we're getting into how I actually tested it and what your takeaways should be, which you're not going to find in all the other places.

So basically, ChatGPT-5 is a bunch of models in a trench coat. It is a model router, with a bunch of ChatGPT-5s underneath that it routes to, and it has had some special training. The special training comes out on the healthcare side. During the broadcast, they had a cancer survivor come up and talk about how she used ChatGPT-5 versus earlier models. It really walked a line between feeling a bit icky, like exploiting the disease, and honoring the experience of the person suffering it, versus simply talking about healthcare. From a technical point of view, they've invested really heavily in making sure that, since people are using ChatGPT for medical advice, they get medical advice that is more accurate than the average large language model's. That's a huge area of investment. They emphasized it, and it comes through in the benchmarks. Look, I don't have a medical degree; that was not one I was qualified to test. Anecdotally, it seems to be better, is the way I'll leave it, and I'm sure we're going to get that answer out of the comments on this video, or from others who are trying ChatGPT-5 with real-world medical conditions. The other area they are really emphasizing in this mixture-of-models approach is the coding and applet side.
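The "bunch of models in a trench coat" idea is easiest to see as a dispatcher. Below is a toy sketch of that routing pattern; the sub-model names and keyword rules are entirely invented for illustration, since OpenAI has not published GPT-5's actual routing logic.

```python
# Toy sketch of the "model router" idea: one front door inspects a
# prompt and dispatches to a specialized sub-model. The sub-model
# names and keyword rules are invented for illustration only; this is
# NOT how GPT-5's real (unpublished) router works.

def route(prompt: str) -> str:
    """Pick a hypothetical sub-model from crude keyword heuristics."""
    p = prompt.lower()
    if any(k in p for k in ("diagnos", "symptom", "medication")):
        return "gpt5-medical"    # hypothetical health-tuned variant
    if any(k in p for k in ("bug", "function", "compile", "refactor")):
        return "gpt5-coding"     # hypothetical coding variant
    if "think hard" in p:
        return "gpt5-thinking"   # hypothetical deep-reasoning variant
    return "gpt5-fast"           # cheap default for casual queries

print(route("Fix this bug in my function"))  # → gpt5-coding
print(route("think hard about this plan"))   # → gpt5-thinking
```

The point of the pattern is economic as much as technical: casual prompts fall through to a cheap default, while only prompts that look demanding pay for the expensive reasoning path.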
**1:34** I looked at some of the demos in the live stream they did, and it felt like a Lovable killer. Now, I love lovable.dev. Do not walk away from this thinking these vibe-coding tools are dead. I don't think that's true, but I think that's what they wanted you to think, because they one-shotted these apps and showed how you could vibe-code multiple apps, build them, and just do it for yourself. It was very much an "everybody can code now" message. Then they brought the developers in to talk about how you can actually use the API, how you have more reasoning controls than you had before, plus verbosity controls, a reasoning-effort parameter, and all of this in-depth stuff for developers, once they got the vibe coding out of the way.

I actually played with the coding. I played with vibe coding, and I looked at the API a little. I have to say, I think where they're actually winning is on bringing the cost down out of the gate, so people use it more, and on pushing the model to code more completely and usefully, and to code more agentically when you're working with it. By agentically, I mean having it code more surgically and make more surgical edits. These are incremental improvements, but they add up to something special in the Canvas app. What's interesting is that it's not clear whether adding up to something special in Canvas means it will be special in Cursor, or special in Lovable, where these tools are already available. You can get ChatGPT-5 in Lovable or Cursor right now. I tried it, and I felt a bit like with Claude Code in the terminal, where Claude Code absolutely sings in a way that Claude doesn't quite hit the same in Cursor or in Lovable.
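The reasoning and verbosity controls mentioned here correspond to fields in OpenAI's Responses API, where a request can carry a `reasoning.effort` and a `text.verbosity` setting. A minimal sketch of assembling such a request follows; the helper function and its validation sets are my own, so check the current API reference before relying on exact parameter names or allowed values.

```python
# Hedged sketch of the new developer controls. The reasoning.effort /
# text.verbosity shapes follow OpenAI's published Responses API, but
# verify against the current docs; this helper is illustrative, not an
# official client.

def build_request(prompt: str, effort: str = "medium",
                  verbosity: str = "medium") -> dict:
    """Assemble a GPT-5 request payload with the new tuning knobs."""
    if effort not in {"minimal", "low", "medium", "high"}:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    if verbosity not in {"low", "medium", "high"}:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": effort},   # how hard the model "thinks"
        "text": {"verbosity": verbosity},  # how long the answer runs
    }

payload = build_request("Audit these CSVs", effort="high", verbosity="low")
```

With the official `openai` Python client, a payload like this would typically be passed to `client.responses.create(**payload)`; higher effort generally trades latency and cost for deeper reasoning.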
**3:18** And I find that really interesting. We now have two examples where model makers are basically delivering incredible results inside their preferred environments, but not necessarily when you plug them in elsewhere. I don't know if that's intentional, or if there's something about the environment they were reinforcement-learned in, or what, but it remains true that I gave a fairly complex coding task to ChatGPT-5. I asked it specifically to do a bunch of web research on specific, real travel destinations in Japan for an upcoming trip I'd like to take, sort of a dream trip. I haven't got my tickets yet, but hey, we're having fun. I said I want an itinerary, and I want it configurable by different interests: whether I want to go to Zen temples, whether I want to eat ramen, whether I want to go to onsen, and so on. It was a fairly complicated prompt, right? You have to build an applet that lets me figure out my travel itinerary and choose different emphases: hey, I want a ramen-heavy day today (who doesn't?), or I want a temple-heavy day because I'm digesting all the ramen, whatever it is. And it needs to be an app that works, an app where I can go through and say, okay, this is the day, this is the narrative of the day, and so on.

What I found is that ChatGPT-5 in the Canvas app did deliver a fully working app with real destinations that I could click through and use. I actually have a link to that applet in the Substack, so you can play with it and see how it works. But I gave the exact same prompt to Lovable using ChatGPT-5 and got essentially the white screen of death. It technically produced some text, but there was no design and no interactivity.
**4:57** I would grade it a complete fail, and I find that fascinating. I got a complete fail on the same model, with the same prompt, for the same coding challenge, in two different environments. There is something going on with the way they're prioritizing Canvas, and I think it's really interesting.

I also found that this model, this collection of models, this ChatGPT-5 and all the friends we met along the way, as I think I heard someone say a few months ago, all of these ChatGPT-5s in a box, are better at answering in code, and at proving it with code and math, than they are at most other things. That continues a long-time trend. If you were following the o3 model generation, that was very much how they worked, and it continues today. If you ask the model to prove it, it does better. If you ask it to code it, it does better. As an example, I was playing around with Gantt charting, and I asked the model: can you show me a Gantt chart of the Apollo 13 mission? It clearly did the research. It laid out all the components of the build and roughly what the critical path was to the error that led to the Apollo 13 disaster. It knew what it was talking about, and this is publicly available information, but it could not for the life of it draw a Gantt chart that was easy to look at. It did one that was readable for launch day, but not one that was very readable for the whole build cycle of the rocket. But when I asked it to code it, it was able to code out a full Gantt chart I could follow. Still a bit of an eye chart, but it was able to do it. Now, I will call out that in both cases, the Japan travel app and the Apollo 13 mission, it could overindex and break the app relatively easily.
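The "ask it to code the chart instead" result is plausible once you see how little code a rough Gantt rendering takes. Here is a minimal ASCII sketch in that spirit; the task names and day offsets are placeholders, not the actual Apollo 13 timeline.

```python
# Minimal ASCII Gantt renderer, in the spirit of the "code it instead"
# experiment above. Tasks and day offsets are illustrative placeholders,
# not the real Apollo 13 build timeline.

def gantt(tasks, width=30):
    """Render (name, start_day, end_day) tuples as text bars."""
    span = max(end for _, _, end in tasks)  # total days scaled to `width`
    lines = []
    for name, start, end in tasks:
        pad = " " * round(start / span * width)
        bar = "#" * max(1, round((end - start) / span * width))
        lines.append(f"{name:<10}|{pad}{bar}")
    return "\n".join(lines)

print(gantt([
    ("Design", 0, 4),
    ("Build",  3, 10),
    ("Test",   9, 14),
    ("Launch", 14, 15),
]))
```

Even this toy version shows why a model that can emit code has an edge over one drawing a chart directly: the layout math is delegated to the runtime instead of being "imagined" token by token.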
**6:40** So I will encourage you to checkpoint and publish when you're done with them. These are little applets that are not very durable, and the thing does overbuild and cause bugs sometimes. That's part of why I saved and published these, so you can actually see how they work in practice. So much for the coding side of things.

Another thing they really emphasized was the quality of thought, how thoughtful these models are, and that they can solve gnarly real-world problems. In fact, that's the first thing Sam Altman said in the introductory video as he was setting up the live stream: "This is about making your work more effective," or something like that. Then what I noticed was that there was almost nothing about making your work more effective in the rest of the live stream except coding, of which there was a ton. It made me think: how much do the execs at OpenAI think the real work is coding versus everything else? Because other than saying "hey, it writes better," which was that one little demo, I didn't see a lot for everybody else. So I decided to test it. I created what I called a gnarly, gnarly, gnarly test file. It was three separate CSVs, and I'll share them on the Substack. The CSVs are entangled. The CSVs are not dependable. There is a SQL injection attack in one of them. They don't have common formatting. I didn't even save them correctly as CSVs. I basically tried to turn these three CSV files into the worst disaster of a test I could imagine. It's like crawling over mud with barbed wire for an LLM. I wanted to make it really hard, partly because they admitted on the live stream that benchmarks are getting saturated, and I still see models have trouble with real-world tests.
**8:23** And so I needed something that felt like the kind of messy data I see in the real world. The CSVs encapsulated a real-world scenario, with employees who are overloaded and underloaded, projects that are off track and on track, the need to be auditable, the need to prove budgeting, and the need to get to revenue. All the stuff businesses care about, in one gnarly scenario. Then I asked the model, very simply, to make sense of it. Basically: come back, explain what happened, get to a clear picture of the number of employees on the team (which is very confusing), find the duplicates, make sure you catch the SQL injection (which I didn't tell it about; it had to detect that on its own), and make sure you can come back to the board with a clear picture of what happened.

This is where it gets interesting, guys. This test is the thing that showed me that this model cares more about how you drive it than any model before it, because I ran the same test on Claude Code, on o3, on o3 Pro, and on ChatGPT-5. And not just one ChatGPT-5, either. I ran it on ChatGPT-5 vanilla. I ran it on ChatGPT-5 telling it in the prompt to think hard. I ran it on ChatGPT-5 clicking the "think hard" button. And I even ran it on ChatGPT-5 Pro. And you know what? You would not believe it: GPT-5 vanilla, with no "think hard," got the lowest score of the lot. It was lower than o3. It was lower than o3 Pro. It was lower than Claude Code, and lower than all the other ChatGPT-5 responses. In other words, ChatGPT-5 was both the best and the worst response in the set. I thought that was really interesting, fascinating even. ChatGPT-5 Pro was also not the best response in the set; it overindexed a little bit.
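To make the shape of this test concrete, here is a tiny sketch of two checks the model had to perform implicitly: duplicate detection and SQL-injection spotting. The data below is invented to mimic the described mess; the speaker's actual files are on the Substack.

```python
import csv, io, re

# Invented mini-version of the "nightmare CSV": inconsistent whitespace
# and casing, a duplicate employee, a salary with an embedded comma that
# breaks the column count, and a SQL-injection payload hiding in a field.
messy = """name, role ,salary
Alice,Engineer,90000
alice , Engineer,90,000
Bob,PM'); DROP TABLE employees;--,85000
"""

rows = list(csv.reader(io.StringIO(messy)))
body = rows[1:]  # skip the (inconsistently spaced) header

seen, duplicates, injections = set(), [], []
for row in body:
    name = row[0].strip().lower()   # normalize before comparing
    if name in seen:
        duplicates.append(name)
    seen.add(name)
    # Crude SQL-injection signature: quote-close or DROP/DELETE keywords.
    if any(re.search(r"'\)|;--|\bDROP\b|\bDELETE\b", cell, re.I)
           for cell in row):
        injections.append(name)

print(duplicates)   # ['alice']
print(injections)   # ['bob']
```

Note that `"90,000"` silently splits into two columns under a naive CSV parse, which is exactly the kind of formatting trap the test describes: the model has to notice the structural damage, not just read the values.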
**10:13** The best response in the set was ChatGPT-5 with the "think hard" button pushed, closely followed by ChatGPT-5 with "think hard" typed in the prompt box. In other words, part of your job with this model, and part of what I'm going to be doing in the coming days, is digging into when and how you prompt these models for the kind of task in front of you. The people who say prompting doesn't matter have not played with this model. It is getting harder and harder to prompt. It is getting trickier to prompt. Yes, if you're doing casual work and just want to gesture vaguely at a piece of work without worrying too much about it, it has never been easier to prompt. That is true. It has never been easier to say "I want an itinerary for Japan," gesture vaguely at it, and have it come up with something. That part is easy. But getting really complex work done, doing something like what I gave it, where correctness matters, accuracy matters, the documents don't agree, and it's a very complex context window, takes work.

Now, to be fair, lest you think I'm negging ChatGPT-5: ChatGPT-5 with "think hard" mode enabled, whether through typing it in the chat or through the button, did beat every other model. It beat Claude Code. It beat o3. It beat o3 Pro. It beat ChatGPT-5 Pro. And it beat ChatGPT-5 without thinking enabled, the vanilla version. So this model, driven correctly, does things I've never seen a model do. This was a really hard test, and I've never seen any other model get close on it. I would give the responses of the "think hard" versions of ChatGPT-5 an A-minus in both cases. They're both solid responses. Everything else was a B or worse.
**12:02** And so my conclusion, early in this ChatGPT-5 experience, having wrestled with this model really extensively: prompting isn't going anywhere. This model is strong at coding. This model needs you to give it really clear indicators of intent and depth, or it will go off the rails. You need to know what to ask for to get a good response. A lot of people who don't know that are still going to underuse the power of the model, because they don't realize how much is under the surface of "think hard" or clicking the thinking button. Don't be that person.

I'm also going to call out that they were right that it's a better writer. I've spent a lot of time talking about data analysis, and I've talked about coding. The model's writing is the best I've ever seen from ChatGPT. I loved Claude's writing; I thought it was a great writer. ChatGPT-5 is at least as good, and strikes me as slightly better with cadence and prose. It's clear. It still tends to over-anchor on the recency of the prompt, so if you give it a prompt, it tends to glom onto it, and you may have issues with framing when you're trying to write. So again, it rewards clarity of intent. But it is a really, really thoughtful writer, and it writes prose that is not horrific to read, which is kind of nice. I will also say it's a good reader. I actually fed it an essay in handwriting, and it was able to quickly decode the handwriting, decode a separate set of handwriting for the edits, and generate its own coherent thinking about the essay. And it was a fair critique, a good essay critique. So it's a solid reader, able to be fully multimodal in that regard.
**13:36** And I think that for the people who are going to be using it who are non-coders and non-data people, who are in, say, the marketing world, the customer-success world, the exec world where you're preparing presentations, it's going to feel like a great daily driver, because it's going to give you one-shot graphs, it's going to give you great drafts, and it's going to help you think things through. It feels like a thinking partner.

Now, this is where I include the obligatory caveats and cautions. I've talked a fair bit about some of the things that went wrong and right in all of these individual cases, with coding, with writing, and so on. I want to call out that there has been a huge backlash on the web in response to ChatGPT-5, because of an assumption that it was overhyped, that the model should not have been given the hype it was, and that we are still nowhere close to artificial general intelligence. Now, the model immediately jumped to number one on the model leaderboards, and at the same time Polymarket, the betting market, immediately crashed on the model and said it wasn't the best model in the world. It seems like everyone is having really, really big reactions today, and not very many people are doing really thoughtful testing. I don't think whether it's the best model in the world matters all that much, because that's always a moving target. If you put me to the wall today and said, "Nate, pick," yeah, I would say it is. I would say properly prompted ChatGPT-5 is the best model in the world. That being said, I think the important thing is actually to recognize where this fits on the evolving edge of intelligence, and where we still see areas where models struggle.
**15:09** So, they emphasized that they're working on hallucinations, they're working on safer completions, and they're working on less deception. I see some progress there. It does feel like it hallucinates less than o3. I still caught it hallucinating a couple of times in my tests today; it's not perfect. I also see that there will be continued assumptions that models produce the same splash for the same reader as in the moment when that reader's perception of AI initially shifted. There is a frog-boiling-in-the-pot problem around media reaction right now. I'm putting this at the end because media reaction isn't the most important thing, but I'm including it because I think the way we think about the evolving intelligence curve does matter. We're living through a historical moment. This model is a significant step forward in the way we interact with AI. It is closer to interacting with us as a thought partner, and I think the reduction in hallucinations helps there. I think the work done on high-value areas like medical helps there. I think the work done on improving writing helps there. People are going to feel like they can trust this model more. People are going to feel like it's right more often, and they will be correct about that.

If you compare that to the flashbang when ChatGPT entered the scene in the first place, or the stunning jump to ChatGPT-4, which may be hard to remember but was there, or the jump to o3 reasoning, people, I think, are assuming it will feel the same way with ChatGPT-5. And what I want to leave you with is this: it may not feel the same way to you, because the model may be getting that much more intelligent in ways you don't care about.
**16:51** I think this was a really big jump on the coding side, but you may not care about that. And it was a jump on the coding side in a world where, realistically, Claude Code has had the crown for a while. So people are going to ask, well, does it beat Claude, and there's going to be a lot of debate about that. I think this was a big jump on reliability. I think the medical piece was rightly emphasized. People are using it for personal use cases that really matter; medical is a big one, and legal is probably another, where it matters to get it right. So my suggestion to you is this: if getting medical information more correct, significantly, gigantically more correct, doesn't feel like a step change to you, maybe you should check your assumptions, because for people making life-or-death decisions, it's going to matter. Getting it significantly more correct, 2 to 3x more correct, reducing errors to something like one-point-whatever percent, which I think is what it is on their new HealthBench, matters a lot. Getting writing to feel more natural to people who are trying to use it to help with their writing is a big step forward. Getting a daily driver that's a reasoner is a huge step forward, even if people don't fully understand what "think hard" is without, I don't know, watching videos like this.

My point is this: we have an extraordinary opportunity to use a model that is advancing intelligence jaggedly. It's a mixture of models, which is what I said at the beginning of this video. There are big jumps in many of the models underneath: jumps in coding, jumps in writing that I've talked about, jumps on the medical piece, jumps on hallucinations. If you care about those things, they're going to feel really, really big.
**18:24** If all you care about is the whiz-bang of "it didn't do it before and now it does," this is not the model update for you, because at the end of the day, it does what the other models did before, only better. I think that is fully in line with expectations. I think we are living through an ongoing intelligence evolution, and none of us knows where it's all going to end up or top out. To me, this feels roughly in line with the ongoing intelligence explosion. We will see more updates in ChatGPT-6, we will see more updates from Gemini, and we'll see more updates from Claude (Claude's makers have talked about it already). We'll see more from Grok. Enjoy this step in the new intelligence explosion. Don't overindex on whether you personally think this is the best model in the world or not; figure out if it's useful to you, and use it well. If it's not useful to you, another model is going to come along next week that will be. That's the world we're living in. It's an incredible world. And look at the overall trajectory: we are arguing about how amazing this model is and how much of a surprise it is, when two years ago, if we had seen this model, we would all have been swearing that this was artificial general intelligence come to us out of the rocks. So, I don't know. I don't really care if that's what we call it. I care whether it does useful things. I care where the real weaknesses are. And I care whether it shows continued progress. I hope you've gotten a sense of where those strengths are, where the real-world weaknesses are, and honestly, a sense that we have continued progress. This is my new daily driver. Check out ChatGPT-5 and let me know what you