
Codex Upgrade Boosts Coding Precision

Key Points

  • On September 15, OpenAI released a Codex upgrade—a specialized “GPT‑5 for coding” model designed to improve the engineering platform’s performance.
  • The new model addresses two major pain points: making precise, low‑token “surgical” code edits and executing long, agentic coding tasks with far higher correctness.
  • Improvements stem from a stronger reasoning component tailored to code execution and prompt comprehension, allowing the model to allocate tokens efficiently—few for small edits, many for extensive tasks.
  • Unlike earlier “sticky” GPT‑5 behavior that required elaborate prompting to steer output, the Codex flavor understands straightforward engineering prompts out of the box, reducing the need for complex prompt engineering.
  • Because developers naturally write concrete, specific instructions, this enhanced prompt awareness translates into more reliable, usable code assistance across a range of AI‑driven development workflows.
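The “surgical edit” idea above is mostly a matter of scoping the instruction tightly: one file, one change, no refactoring. The sketch below is my own illustration of that prompt shape; the `build_surgical_prompt` helper, its wording, and the example file path are assumptions for illustration, not anything from the video or an official OpenAI recipe.

```python
def build_surgical_prompt(file_path: str, snippet: str, change: str) -> str:
    """Build a tightly scoped edit instruction for a coding model.

    The constraints mirror the video's point: ask for one precise,
    low-token edit and explicitly rule out wholesale refactoring.
    """
    return (
        f"Edit only {file_path}. Apply exactly one change: {change}\n"
        "Do not refactor, rename, or touch any other code.\n"
        "Return only the modified lines.\n\n"
        f"Current code:\n{snippet}\n"
    )

# Hypothetical usage: one file, one guard clause, nothing else.
prompt = build_surgical_prompt(
    "billing/invoice.py",
    "def total(items):\n    return sum(i.price for i in items)",
    "guard against items being None by returning 0",
)
print(prompt)
```

The resulting string can be sent to whichever coding model you use; the point is the constraints, not the transport.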

Full Transcript

**Source:** [https://www.youtube.com/watch?v=7oIkPW217AY](https://www.youtube.com/watch?v=7oIkPW217AY)
**Duration:** 00:09:51

## Sections

- [00:00:00](https://www.youtube.com/watch?v=7oIkPW217AY&t=0s) **Codex Upgrade Improves Coding AI** - The new Codex upgrade, a GPT‑5 variant optimized for programming, fixes difficult surgical edits and enhances correctness on long, agentic coding tasks, lessening the need for complex prompting workarounds.
- [00:04:16](https://www.youtube.com/watch?v=7oIkPW217AY&t=256s) **Claude's Shift Toward Content Creation** - The speaker critiques Claude's new data‑connector and document‑generation features as a strategic pivot, suggesting it signals an arms race with competitors and highlights the importance of engineering quality over rushed releases.
- [00:09:00](https://www.youtube.com/watch?v=7oIkPW217AY&t=540s) **Codex: Precise AI Editing Breakthrough** - The speaker stresses that the Codex release is a significant, data‑backed advancement for developers, promising targeted, surgical code edits rather than wholesale refactoring, and urges the community to look past hype and embrace the change.

## Full Transcript
On Monday, September 15th, OpenAI launched an upgrade to Codex. Codex, of course, is the engineering platform that OpenAI has been building out. In this case, the upgrade to Codex is really a new flavor of GPT‑5 optimized specifically for coding. It fixes two things that have been really frustrating to most of us who are building with Codex, building with Claude Code, building with any AI tool. Namely, it is really, really, really hard to get them to stop and just fix one thing. Surgical edits have been really tough. And it has been difficult to get them to do long agentic tasks with a high degree of correctness. That last phrase is important, because if you use them, you know they do long agentic tasks very easily, but not always with a high degree of correctness.

Now, I've talked in the past about how you address this with prompting, with data chunking, with how you handle your codebase and feed it in as context, and with how you keep track of the decisions you've made in markdown files. There are all kinds of tricks that people are using, and those tricks are probably still helpful, but it sure does help if there is a base model that is actually better at those core tasks. So if you ask yourself how or why it suddenly works, I think the thing you're going to find when you peel the onion is that they've improved the quality of the reasoner, specifically around code execution tasks and understanding coding-related prompts.
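One of the tricks mentioned here, keeping a markdown log of decisions and feeding it back in as context, can be sketched in a few lines. The `DECISIONS.md` filename, the log format, and the prompt layout are conventions I'm assuming for illustration, not a standard:

```python
import tempfile
from pathlib import Path

def load_context(repo_root: str, task: str) -> str:
    """Prepend the project's decision log (if any) to a task prompt,
    so the model sees prior choices before it touches the code."""
    decisions = Path(repo_root) / "DECISIONS.md"
    log = decisions.read_text() if decisions.exists() else "(no decisions recorded yet)"
    return f"Project decisions so far:\n{log}\nTask:\n{task}\n"

# Example: record one decision in a scratch repo, then build a
# prompt that carries it along with the new task.
root = tempfile.mkdtemp()
(Path(root) / "DECISIONS.md").write_text(
    "- 2025-09-15: use SQLite, not Postgres, for local dev\n"
)
prompt = load_context(root, "add a caching layer to the query module")
print(prompt)
```

The same assembled string works with any of the tools the video mentions; the value is that earlier decisions travel with every request instead of living only in the developer's head.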
That is the only way you can get a model flavor that is simultaneously much, much more stingy with tokens when making a surgical edit and much, much more extensive with tokens when doing a long agentic task. It must understand what you want better, which is a big deal when you think about it, because one of the things that's been really, really hard about GPT‑5 as a whole is that it feels sticky. It feels like it's in a rut. It feels like no matter what prompt you give it, you get this hyperactive speedboat of a model that says, "Here are all the action items and this is what we're going to do," and you have to really lean on the prompt heavily to get it to do anything else. I've talked a ton about how to lean on the prompt, and I'm going to have another video soon about doing it again. But in this case, this is a flag of something different that I don't want you to lose: the model is getting better at understanding your prompt without you having to prompt fancy, and that is a really big deal.

Now, granted, it's for code. Code is probably the easiest use case for prompt parsing because, frankly, engineers tend to be pretty specific. Engineers tend to be very concrete. Engineers tend to refer to real, specific code actions. So yes, getting it to be a little bit better at understanding that is probably easy mode if you're trying to get a model better at parsing prompts. But it's still a step, and a big step for this model, because GPT‑5 as a whole has not made it easier for people to prompt. I know multiple people who have thrown up their hands and given up on working with advanced models, given up on prompting, because GPT‑5 has been such a difficult model to prompt. And I get it. It's not me, right? I love this stuff, but I get why it makes sense. "It shouldn't be this hard" seems to be what the team was thinking about when they made this update. It should be easier. And yes, it's got a slightly higher score on SWE-bench and this and that, but the real takeaway here is that this team at OpenAI continues to ship really, really fast.

Whatever you think about the whole brouhaha around the Reddit thread on Claude Code, and how many of the Redditors over there saying they're moving to Codex are real, the momentum shift toward Codex is real. There has been a massive momentum swing toward Codex, and that has shifted the strategic battleground. For a long time now, it has been a truism that OpenAI has the best general market position given their consumer base, and Claude has the best specialist position given their beachhead on code. That is changing, and now you see it changing even in Claude's strategy, because Claude is emphasizing different things now: hey, we have these data connectors; hey, we launched this amazing PDF creation, this amazing PowerPoint creation, this great Excel file creation. I made a video on that. It's really, really good. It's a different strategy. And to me, the fact that they chose to release that and not Claude Code feels a little bit desperate. If they had something to release that would compete with where Codex is going and how fast Codex was shipping, they would.
Now, I say that on Monday, September 15th, aware that Dario has a big speech coming up this week, and there are rumors that Opus 4.5 is coming out. So we may be talking at the end of the week about the big move they made, and that is just how these games go, right? It's an arms race, back and forth. But regardless of what launches this week, you should be aware that the strategic landscape has shifted, and that launches like this reinforce a quality of engineering effort that makes the experience sticky. If you have the choice between more power at your fingertips that is correct, and more power at your fingertips that is incorrect or likely to lead to bad pull requests, you're choosing the quality every single time, because it makes you do less rework. Ten out of ten engineers will choose that, and they're right.

Actually, that goes for other parts of work, too. Part of what, ironically, made Claude's connectors release with Excel powerful is that it got more of Excel right than anything I'd seen previously from OpenAI. Similarly with PowerPoint: it was easier to make a good PowerPoint deck than it had ever been before. I even got good results out of the PDF. I haven't done the video on that, but I'm going to. The point is this: you need to prioritize the models that give you quality work, and you need to expect that those model changes will be real, but rarer than you think. This is sort of a fine-grained point, but if you think about it, Claude Code has been the best overall coding ecosystem for over a year now, and only now are we starting to see a shift toward Codex.

And because these shifts are sticky, because the changes being made reinforce quality, and because the team is shipping really fast versus Claude, I expect that this shift will be sticky. Now, am I at a point where I'm willing to declare that anything is a permanent advantage in AI? Absolutely not. You should always be thinking multi-model over the long term. But there's a difference between thinking about multi-model use cases when you're building production pipelines and thinking about positions in the ecosystem, and positions in the ecosystem are stickier. In this case, Codex is starting to shift and nudge Claude out of the coding position in the ecosystem. That's a very powerful spot to be in, because of all the other things that code allows you to unlock and get leverage on. The fact that more code is going into OpenAI as reinforcement learning is a non-trivial benefit that they are acquiring directly from another player in the ecosystem right now, directly from Claude. So I would expect that Codex will stick around. I'm going to be doing a much longer video on Codex; this was just my intro, the breaking-news update.

If you step back and look at where we are on this exponential curve that we're all living through, I think one of the things that comes to mind for me is that we are bored by the hype, and we have forgotten how tremendous some of this news is, because we have gotten so used to all of these updates. Humans can get used to anything. We have gotten used to a tremendous stream of news over the last two and a half years.
If Codex had dropped out of a blue sky in 2022, it would have been on the front pages of all kinds of newspapers, even though it was a coding thing, because it's such an intelligent model. But it's just another Monday in September now. We've gotten used to it. I want to challenge you, especially as these models get better, as they get more agentic (they are literally the graph; the graph is there, right?), as they're able to do that much more for you if you prompt them well. That's exactly what Codex can do. If you're an engineer, the rewards are going to be disproportionate if you do not get bored by the hype, if you stay focused, if you know what you want out of AI, and if you're able to take advantage of it and build the way you want to build. Not everybody builds with code: some people build with words, some people build with math, and so on. But you have to decide what you care about, and you have to latch on to that stream, follow it, take it seriously, and upgrade your tool sets a lot. The learning curve is going to be real, because we're all going through this exponential curve together. Don't get fooled by everyone else saying it's just another Monday. It's not just another Monday. The news is going to keep coming, and there will be more big releases even this week, but this was a big deal.

I hope you have fun building something with Codex. If you're someone who has built code before, I really hope that this whole promise of Codex being better actually bears out for all of us. They did quantitative analysis, right? It's not that they're just promising; they're actually looking at pull requests and so on. It would be really nice to have an AI that does not have this obsession with refactoring the entire codebase at the drop of a hat. It would be nice to have surgical edits. So here's to surgical edits. Here's to exponential change. Here's to seeing through the hype and recognizing it's not hype to get bored by. It really is a big deal. It's not just another Monday. Have fun with Codex.