
Codex Upgrade Boosts Coding Precision

Key Points

  • On September 15, OpenAI released a Codex upgrade—a specialized “GPT‑5 for coding” model designed to improve the engineering platform’s performance.
  • The new model addresses two major pain points: making precise, low‑token “surgical” code edits and executing long, agentic coding tasks with far higher correctness.
  • Improvements stem from a stronger reasoning component tailored to code execution and prompt comprehension, allowing the model to allocate tokens efficiently—few for small edits, many for extensive tasks.
  • Unlike earlier “sticky” GPT‑5 behavior that required elaborate prompting to steer output, the Codex flavor understands straightforward engineering prompts out of the box, reducing the need for complex prompt engineering.
  • Because developers naturally write concrete, specific instructions, this enhanced prompt awareness translates into more reliable, usable code assistance across a range of AI‑driven development workflows.
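The “surgical edit” idea above is mostly a matter of scoping the instruction tightly: one file, one change, no refactoring. The sketch below is my own illustration of that prompt shape; the `build_surgical_prompt` helper, its wording, and the example file path are assumptions for illustration, not anything from the video or an official OpenAI recipe.

```python
def build_surgical_prompt(file_path: str, snippet: str, change: str) -> str:
    """Build a tightly scoped edit instruction for a coding model.

    The constraints mirror the video's point: ask for one precise,
    low-token edit and explicitly rule out wholesale refactoring.
    """
    return (
        f"Edit only {file_path}. Apply exactly one change: {change}\n"
        "Do not refactor, rename, or touch any other code.\n"
        "Return only the modified lines.\n\n"
        f"Current code:\n{snippet}\n"
    )

# Hypothetical usage: one file, one guard clause, nothing else.
prompt = build_surgical_prompt(
    "billing/invoice.py",
    "def total(items):\n    return sum(i.price for i in items)",
    "guard against items being None by returning 0",
)
print(prompt)
```

The resulting string can be sent to whichever coding model you use; the point is the constraints, not the transport.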

Full Transcript

**Source:** [https://www.youtube.com/watch?v=7oIkPW217AY](https://www.youtube.com/watch?v=7oIkPW217AY)
**Duration:** 00:09:51

## Sections

- [00:00:00](https://www.youtube.com/watch?v=7oIkPW217AY&t=0s) **Codex Upgrade Improves Coding AI** - The new Codex upgrade, a GPT‑5 variant optimized for programming, fixes difficult surgical edits and enhances correctness on long, agentic coding tasks, lessening the need for complex prompting workarounds.
- [00:04:16](https://www.youtube.com/watch?v=7oIkPW217AY&t=256s) **Claude's Shift Toward Content Creation** - The speaker critiques Claude's new data‑connector and document‑generation features as a strategic pivot, suggesting it signals an arms race with competitors and highlights the importance of engineering quality over rushed releases.
- [00:09:00](https://www.youtube.com/watch?v=7oIkPW217AY&t=540s) **Codex: Precise AI Editing Breakthrough** - The speaker stresses that the Codex release is a significant, data‑backed advancement for developers, promising targeted, surgical code edits rather than wholesale refactoring, and urges the community to look past hype and embrace the change.

## Full Transcript
On Monday, September 15th, OpenAI launched an upgrade to Codex. Codex, of course, is the engineering platform that OpenAI has been building out. In this case, the upgrade to Codex is really a new flavor of GPT‑5 optimized specifically for coding. It fixes two things that have been really frustrating to most of us who are building with Codex, building with Claude Code, building with any AI tool. Namely, it is really, really, really hard to get them to stop and just fix one thing. Surgical edits have been really tough. And it has been difficult to get them to do long agentic tasks with a high degree of correctness. That last phrase is important, because if you use them, you know they do long agentic tasks very easily, but not always with a high degree of correctness.

Now, I've talked in the past about how you address this with prompting, with data chunking, with how you handle your codebase and feed it in as context, and with how you keep track of the decisions you've made in markdown files. There are all kinds of tricks that people are using, and those tricks are probably still helpful, but it sure does help if there is a base model that is actually better at those core tasks. So if you ask yourself how or why it suddenly works, I think the thing you're going to find when you peel the onion is that they've improved the quality of the reasoner, specifically around code execution tasks and understanding coding-related prompts.
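One of the tricks mentioned here, keeping a markdown log of decisions and feeding it back in as context, can be sketched in a few lines. The `DECISIONS.md` filename, the log format, and the prompt layout are conventions I'm assuming for illustration, not a standard:

```python
import tempfile
from pathlib import Path

def load_context(repo_root: str, task: str) -> str:
    """Prepend the project's decision log (if any) to a task prompt,
    so the model sees prior choices before it touches the code."""
    decisions = Path(repo_root) / "DECISIONS.md"
    log = decisions.read_text() if decisions.exists() else "(no decisions recorded yet)"
    return f"Project decisions so far:\n{log}\nTask:\n{task}\n"

# Example: record one decision in a scratch repo, then build a
# prompt that carries it along with the new task.
root = tempfile.mkdtemp()
(Path(root) / "DECISIONS.md").write_text(
    "- 2025-09-15: use SQLite, not Postgres, for local dev\n"
)
prompt = load_context(root, "add a caching layer to the query module")
print(prompt)
```

The same assembled string works with any of the tools the video mentions; the value is that earlier decisions travel with every request instead of living only in the developer's head.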
That is the only way you can get a model flavor that is simultaneously much, much more stingy with tokens when making a surgical edit and much, much more extensive with tokens when doing a long agentic task. It must understand what you want better, which is a big deal when you think about it, because one of the things that's been really, really hard about GPT‑5 as a whole is that it feels sticky. It feels like it's in a rut. It feels like no matter what prompt you give it, you get this hyperactive speedboat of a model that says, "Here are all the action items and this is what we're going to do," and you have to really lean on the prompt heavily to get it to do anything else. I've talked a ton about how to lean on the prompt, and I'm going to have another video soon about doing it again. But in this case, this is a flag of something different that I don't want you to lose: the model is getting better at understanding your prompt without you having to prompt fancy, and that is a really big deal.

Now, granted, it's for code. Code is probably the easiest use case for prompt parsing because, frankly, engineers tend to be pretty specific. Engineers tend to be very concrete. Engineers tend to refer to real, specific code actions. So yes, getting it to be a little bit better at understanding that is probably easy mode if you're trying to get a model better at parsing prompts. But it's still a step, and a big step for this model, because GPT‑5 as a whole has not made it easier for people to prompt. I know multiple people who have thrown up their hands and given up on working with advanced models, given up on prompting, because GPT‑5 has been such a difficult model to prompt. And I get it. It's not me, right? I love this stuff, but I get why it makes sense. "It shouldn't be this hard" seems to be what the team was thinking about when they made this update. It should be easier. And yes, it's got a slightly higher score on SWE-bench and this and that, but the real takeaway here is that this team at OpenAI continues to ship really, really fast.

Whatever you think about the whole brouhaha around the Reddit thread on Claude Code, and how many of the Redditors over there saying they're moving to Codex are real, the momentum shift toward Codex is real. There has been a massive momentum swing toward Codex, and that has shifted the strategic battleground. For a long time now, it has been a truism that OpenAI has the best general market position given their consumer base, and Claude has the best specialist position given their beachhead on code. That is changing, and now you see it changing even in Claude's strategy, because Claude is emphasizing different things now: hey, we have these data connectors; hey, we launched this amazing PDF creation, this amazing PowerPoint creation, this great Excel file creation. I made a video on that. It's really, really good. It's a different strategy. And to me, the fact that they chose to release that and not Claude Code feels a little bit desperate. If they had something to release that would compete with where Codex is going and how fast Codex was shipping, they would.
Now, I say that on Monday, September 15th, aware that Dario has a big speech coming up this week, and there are rumors that Opus 4.5 is coming out. So we may be talking at the end of the week about the big move they made, and that is just how these games go, right? It's an arms race, back and forth. But regardless of what launches this week, you should be aware that the strategic landscape has shifted, and that launches like this reinforce a quality of engineering effort that makes the experience sticky. If you have the choice between more power at your fingertips that is correct, and more power at your fingertips that is incorrect or likely to lead to bad pull requests, you're choosing the quality every single time, because it makes you do less rework. Ten out of ten engineers will choose that, and they're right.

Actually, that goes for other parts of work, too. Part of what, ironically, made Claude's connectors release with Excel powerful is that it got more of Excel right than anything I'd seen previously from OpenAI. Similarly with PowerPoint: it was easier to make a good PowerPoint deck than it had ever been before. I even got good results out of the PDF. I haven't done the video on that, but I'm going to. The point is this: you need to prioritize the models that give you quality work, and you need to expect that those model changes will be real, but rarer than you think. This is sort of a fine-grained point, but if you think about it, Claude Code has been the best overall coding ecosystem for over a year now, and only now are we starting to see a shift toward Codex.

And because these shifts are sticky, because the changes being made reinforce quality, and because the team is shipping really fast versus Claude, I expect that this shift will be sticky. Now, am I at a point where I'm willing to declare that anything is a permanent advantage in AI? Absolutely not. You should always be thinking multi-model over the long term. But there's a difference between thinking about multi-model use cases when you're building production pipelines and thinking about positions in the ecosystem, and positions in the ecosystem are stickier. In this case, Codex is starting to shift and nudge Claude out of the coding position in the ecosystem. That's a very powerful spot to be in, because of all the other things that code allows you to unlock and get leverage on. The fact that more code is going into OpenAI as reinforcement learning is a non-trivial benefit that they are acquiring directly from another player in the ecosystem right now, directly from Claude. So I would expect that Codex will stick around. I'm going to be doing a much longer video on Codex; this was just my intro, the breaking-news update.

If you step back and look at where we are on this exponential curve that we're all living through, I think one of the things that comes to mind for me is that we are bored by the hype, and we have forgotten how tremendous some of this news is, because we have gotten so used to all of these updates. Humans can get used to anything. We have gotten used to a tremendous stream of news over the last two and a half years.
If Codex had dropped out of a blue sky in 2022, it would have been on the front pages of all kinds of newspapers, even though it was a coding thing, because it's such an intelligent model. But it's just another Monday in September now. We've gotten used to it. I want to challenge you, especially as these models get better, as they get more agentic (they are literally the graph; the graph is there, right?), as they're able to do that much more for you if you prompt them well. That's exactly what Codex can do. If you're an engineer, the rewards are going to be disproportionate if you do not get bored by the hype, if you stay focused, if you know what you want out of AI, and if you're able to take advantage of it and build the way you want to build. Not everybody builds with code: some people build with words, some people build with math, and so on. But you have to decide what you care about, and you have to latch on to that stream, follow it, take it seriously, and upgrade your tool sets a lot. The learning curve is going to be real, because we're all going through this exponential curve together. Don't get fooled by everyone else saying it's just another Monday. It's not just another Monday. The news is going to keep coming, and there will be more big releases even this week, but this was a big deal.

I hope you have fun building something with Codex. If you're someone who has built code before, I really hope that this whole promise of Codex being better actually bears out for all of us. They did quantitative analysis, right? It's not that they're just promising; they're actually looking at pull requests and so on. It would be really nice to have an AI that does not have this obsession with refactoring the entire codebase at the drop of a hat. It would be nice to have surgical edits. So here's to surgical edits. Here's to exponential change. Here's to seeing through the hype and recognizing it's not hype to get bored by. It really is a big deal. It's not just another Monday. Have fun with Codex.