# Codex: Revolutionizing OpenAI Workflows

**Source:** [https://www.youtube.com/watch?v=tuLWIK1AVEM](https://www.youtube.com/watch?v=tuLWIK1AVEM)
**Duration:** 01:09:18

## Summary

- The interview with Codex engineering lead Tibo and design engineer Ed explores how Codex functions as a "teammate," reshaping everyday workflows at OpenAI for both technical and non-technical staff.
- Ed, a designer with a robotics background, joined OpenAI a year ago after a stint at Google, while Tibo came from Google via DeepMind and arrived about 1.5 years ago, initially building research tooling before pivoting to product-focused infrastructure for AI models.
- Their paths converged earlier this year, merging separate efforts into the unified Codex project that aims to make OpenAI an AI-native organization.
- The conversation highlights that Codex's impact goes beyond code generation: it changes collaboration, ideation, and execution across the entire company, democratizing AI assistance for all teams.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=0s) **AI-Native Workflows with Codex** - In this interview, engineering lead Tibo and designer Ed explain how Codex functions as a teammate at OpenAI, reshaping daily workflows and collaboration between technical and non-technical staff.
- [00:03:08](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=188s) **OpenAI Engineers' Daily Codex Use** - The speaker explains that OpenAI staff engage with Codex primarily through mandatory code reviews and casual usage by non-technical personnel, while power users run increasingly complex, high-compute, multi-agent workflows, illustrating the tool's broad and evolving adoption.
- [00:07:33](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=453s) **Balancing Automated Code Review** - The speakers discuss how an automated code-review tool offers easy access, enforces healthy review constraints, optimizes signal-to-noise, and continuously improves to provide a valuable, always-on safety net for developers.
- [00:11:34](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=694s) **Evolving Coding Agents as General Assistants** - The speaker reflects on how rigorously trained models produce clear, cogent results, increasingly using code merely as a tool to generate trustworthy, step-by-step explanations, and how such coding agents are evolving into broader, more versatile assistants.
- [00:14:58](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=898s) **Beyond Titles: Skills Over Roles** - The speakers discuss how modern work is moving away from fixed job titles toward flexible skill sets focused on problem-solving, rapid iteration, and inexpensive ideation, creating both excitement and uncertainty.
- [00:18:20](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=1100s) **Co-evolving with AI Code Assistants** - Participants observe that developers rapidly adopt and adapt to transformative tools like Codex and reasoning models, emphasizing human flexibility and the outsized impact of small, agile teams in harnessing these step-change technologies.
- [00:21:43](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=1303s) **Junior Engineers Thriving with AI** - Despite a prevailing belief that AI tools favor experienced staff, OpenAI's junior hires are excelling, bringing fresh perspectives, adaptability, and renewed enthusiasm to the team.
- [00:25:38](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=1538s) **Curiosity Drives AI Adoption** - The speaker argues that curiosity and a proactive mindset are essential now that AI models are rapidly advancing, as those who embrace the technology as a problem-solving tool will thrive in the evolving software engineering landscape.
- [00:28:44](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=1724s) **Balancing Breadth and Depth in AI** - The speakers discuss how powerful AI tools enable tackling numerous lower-urgency problems while stressing the need to prioritize high-impact directions based on model improvements and user demand.
- [00:33:16](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=1996s) **Shifting Burden from Code Generation to Review** - The speakers discuss how AI coding agents move the workload from writing code to reviewing generated code, emphasizing the need for smooth interfaces and better code-review tooling.
- [00:36:20](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=2180s) **Maintaining Engineer Fluency Post-Automation** - The conversation explores the challenge of keeping developers' code-reading, planning, and deployment skills sharp, and teams productive, as AI-generated code becomes routine in a fast-moving software environment.
- [00:40:21](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=2421s) **Model Tool Integration Magic** - The speaker marvels at a model's ability to generate visual assets and act as a coding agent, emphasizing that giving AI access to a rich toolbox, such as a Unix shell, unleashes powerful, combinatorial capabilities.
- [00:43:48](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=2628s) **Code-as-Tool and Agent Memory** - Discusses treating generated code as a disposable tool rather than reviewed output and explores challenges of memory management in long-running autonomous agents.
- [00:47:07](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=2827s) **Simplicity, Careers, and Model Evolution** - The speaker observes that surprisingly simple AI primitives continue to excel as capabilities rapidly advance, urging a minimalist approach to avoid escalating complexity, and highlights a shift in career thinking toward problem-focused, fluid roles that co-evolve with advancing models.
- [00:50:13](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=3013s) **Evolving Hiring Practices at OpenAI** - OpenAI staff explain the shift from strict programming tests to broader skill assessments, the challenge of spotting diverse talent, and the necessity to redesign interview processes as candidates increasingly leverage AI tools like ChatGPT.
- [00:53:48](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=3228s) **Conveying New LLM Improvements** - A panel discusses how to effectively communicate the real step-change capabilities of updated models like ChatGPT 5.2 to users who see the interface as unchanged and view chatbot use as saturated.
- [00:59:44](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=3584s) **Everyday Use of Coding Agents** - The speaker explains how they employ ChatGPT for brainstorming, structuring, and research while relying on Codex for concise, focused answers in their daily workflow.
- [01:03:17](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=3797s) **Evolving Fine-Grained Model Map** - The speaker explains how they maintain a constantly updated, detailed mental inventory of dozens of AI models, matching each to very specific tasks such as reading handwritten tally marks, to avoid reliance on a single "perfect" model and stay adaptable as their needs evolve.
- [01:07:25](https://www.youtube.com/watch?v=tuLWIK1AVEM&t=4045s) **Benchmark Saturation and Future Metrics** - The speakers explain how evaluation benchmarks quickly become saturated as models improve, prompting a continual cycle of new tests and speculation about far-flung future metrics such as "running a multi-billion-dollar company."

## Full Transcript
So, a couple of days ago, I had the
privilege of sitting down with two
members of Codex's engineering team. I
got to talk with uh Tibo, who's pretty
well known as an engineering lead at
Codex, and also with Ed, a design
engineer. Our focus really isn't the
code. So, if you're like not a
developer, this is still going to be
super interesting for you. Instead, what
we focused on is how does Codex change
how OpenAI works? And in particular,
when you're talking to someone from a
non-technical background like Ed and a
technical background like Tibo, how do
our workflows shift? How does what we
build change when you have Codex as
effectively a teammate? What does that
look like in practice? I think we often
talk about AI native organizations, but
I wanted to take this chance to sit down
with a truly AI-native organization, OpenAI, and actually learn how they use Codex day-to-day and how it's changing
everybody's workflows, not just the
technical team. So, jump in. This is
going to be a fun one. Well, maybe first
like I'd love to hear a little bit from
you guys about who you are, how you came
to OpenAI. Uh I know that everyone has
their own story here and I'd love to
hear a little bit about yours.
Yeah, nice story, Ed.
>> Yeah, I'm a designer on Codex. Um, been
in OpenAI just over a year. I've been
on Codex for about six months. Before
that, worked in the research
team. And yeah, I've always worked at
the intersection of design, design engineering, and research. Worked on
robotics before at Google and a few
other things before that.
>> Yeah, I uh worked at Google as well. Uh funnily, like, I didn't know we shared
that piece of history. Always figuring
things out. So at Google very briefly,
moved into uh DeepMind, worked there
for many many years.
>> Mh.
>> And then uh decided to like do the big
jump and like go to the US and come here
and work for OpenAI. That was like about
a year and a half ago. Uh that was pre-reasoning, like a year and a half ago.
>> That was pre-reasoning?
>> Yes. And so in typical OpenAI fashion
joined just before uh that happened like
was part of like the o1 sprint. I was
more like trying to be useful in any way
possible and then uh after that like
kind of stuck around, built some tooling for research, became super obsessed late
last year around uh hey like maybe
models are going to continue to improve
and their capabilities are you know
going to continue to uh impress us and
maybe actually like we should think more
about the products the infrastructure
around it to really benefit from those
models and then uh started like
prototyping. You were working on similar
things.
>> Yeah. Uh and we were not working
together initially and then we joined
efforts
>> earlier this year.
>> Yeah. Yeah.
>> And that was sort of like how
Codex got started.
>> Yeah.
>> So you guys were there from the
beginning with Codex.
>> Yeah.
>> Yeah. I mean it's had a long history,
right? Um, right. Coding agents at OpenAI, you know, the name Codex is a throwback to a model which was, like, you know, pre-ChatGPT, I believe, right. So coding agents have been around, but yeah, Codex as it is now is
>> yeah, Codex is the product uh that
was like released what in April this
year.
>> Yeah.
>> Amazing. So one of the things I'll just ask you what I get asked about Codex, because we get to chat and we get to find out. Uh this is like the number one question I get asked: how do engineers at OpenAI use Codex day-to-day?
>> Again, two different patterns. Um it's
like one is everyone just doesn't have a
choice on like the code is reviewed by
Codex, uh, no matter like whether you want it, you know, reviewed or not. It's just been like so useful at
catching issues and then there's a lot
of um casual usage uh by you know even
like nontechnical staff and then what
we're also seeing is like at the
complete end of the spectrum is like
really power users of Codex that deploy a lot of compute, a lot more than we saw even like a couple of months ago, and
this continues to increase and increase
and increase and um with increasingly uh
complex workflows, some of them multi-agent, you know, running like for many many
hours
>> and so it's a highly personal thing uh
still feel like it's very much evolving
>> that makes a lot of sense yeah go ahead
>> yeah so as I say you know I'm a designer
on the team, so I work very closely with
engineers But I'm very much in the
codebase a lot myself. And you know, I
think the cool thing about Codeex and
these recent models over the past few
months is they really have been a step
change. And what you've seen, I think
even since we launched our most recent
product suite is
basically everyone at OpenAI using it. You
know, I like there's one engineer that I
know who uses it for everything, for note taking. Like it's basically
his like primary interface to his
computer. Um, as a designer, you know,
I'm seeing more and more in our like
work in progress channel on Slack,
people posting these demos. And, you
know, I DM'd someone, I was like, I didn't know you could code. And he's like, I couldn't till a few months ago.
>> Um, so you've got kind of, you know,
design engineers like myself hopping in
more, submitting more PRs and kind of
getting closer to the, you know, closer
to the details. And then even even new
people, nontechnical people, go to
market stuff even, you know, people are
really like hopping in and it's just
like this kind of force multiplier.
Yeah, that's exactly where I kind
of wanted to chat because I think that
for a lot of organizations that remains
the dream, but maybe it's something
about the command line uh in the
terminal sort of scariness that comes
with that. Uh but for whatever reason
people find themselves
sort of hard limiting a lot of these
technical tools to engineering teams. Uh
sometimes that is literally at the level
of the IT policy. I have been in
organizations where the IT policy only
allows engineers to use tools like this
and if they catch you doing this as a
non-technical person that's a violation
of policy and I think some of these
older ways of working and thinking are
having to evolve.
>> Yeah. I think what we're seeing is like the lines are blurring. Like Ed, I mean, you're sort of like everywhere, like ideating about the future, but then also like very much like using Codex every day and then feeling, you know, does it feel right, like pulling up, you know, PRs and little fixes. That's, you know, evolved very quickly, right?
>> Totally, yeah. And I think, you know, to your point, there's kind of one half of it, which is like how do you bring organizations along, and, you know, in particular some of these large organizations might have some more like institutional challenges. But, um, you know, once you get access, I feel like it's kind of getting as easy as possible. Um, so you know you mentioned
that it might feel like a bit of a step
up for people to get in the terminal I
think the cool thing with some of our
products recently is you know we've
shipped an IDE extension so we're not
just in the terminal we have a CLI
product which we've had for a little bit
>> but you know we meet people
where they code so they might be in VS
code, they might be in Cursor, these other kinds of IDEs, um, and we also have a web
product. So you know you can uh you know
once you kind of connect all the
enterprise uh you know uh kind of puzzle
pieces you can just go in a web product
type in a prompt and create a fix. So
for example, you know, say you
want to change some UX copy, you're a
copywriter, you know, maybe you don't
even need to look at the code, you just
want to change some strings, you can
just do that yourself, right? You can go
in and you can type this prompt if your
kind of enterprise is set up. So um
yeah, I think like the number of
surfaces that people are working across, just like, yeah, just makes it easier and easier to get involved.
>> Yeah, I think that there's that
ease of access piece you guys have done
a nice job solving for over the last few
months. I think the other piece that I
heard as you were talking is that
there's a little bit of a healthy
constraint in something like having
Codex review every PR. Like, doesn't
matter it's getting reviewed you have to
engage with it and I I think that's also
going to be new for a lot of
organizations I talk with
>> and the thing we've been very
careful about is also optimizing for
like signal to noise ratio and making
sure that the hit rate is very good
>> so that people don't actually complain
and, like, want to turn it off.
>> And overall like as an organization
we're getting like way more value out of
it than you know potentially sometimes
the misses
>> and then we keep like improving you know
the system and the model over time so
that it's capable of finding more and
more gnarly and like subtle issues over
time. Uh and people are generally
impressed. It's like I hear it all the
time like it's like oh this thing is
superhuman. It's like it's doing
reviews I would never have done because
I don't have the time to dig like four
layers deep into the stack
>> and just having that always on you don't
have to think about it. You have that
safety net that's just there.
>> Yeah. Super interesting. Like I think
particularly something like code review,
as a designer I was thinking through the
user experience right it's like oh no is
everyone just going to get loads of
emails and um you know it turns out that
like it's like one of the most loved
features that I think we've shipped and
that one of the things that changed
for me was seeing some of our top
contributors across OpenAI not just our
team you know commenting in our Slack
saying like this is, as you say, superhuman, um, and you know I look forward to
like those notifications now like it
really just like it just adds so much
value
>> like I think there are two things that
are emerging, right? So like this ambient
intelligence and code review is like one
example of that where it just happens.
You don't have to trigger it. You don't
have to think about it. Uh and you just
benefit from that intelligence being
deployed. Then the other thing is people
starting to use it as like a little
assistant in their computer. Like it's
not really about code. It's like it does
like, you know, sysadmin tasks, you know,
pulls like context and you know maybe
the latest news for you and
>> um it's just like or craft like some new
designs and new ideas. Um and then like
for that the current way that we're
doing it in the CLI and the extension
it's like user-initiated,
uh and so like the current interfaces
are maybe like holding things back a
little bit.
>> Yeah. It's kind of both ways really. Like in some
ways you know there's this throwback to
the terminal that that people are
getting nostalgic about and you know
from a design perspective there are
these kind of two camps. One is, it's this kind of parlor trick,
it's this like transitory you know kind
of like form factor and actually you
know there's this like perhaps there are
some like new interaction paradigms that
we're like pushing towards but perhaps
aren't there yet. On the other hand, I
think like the constraint of the prompt box, um, you know, the terminal, like it
it's kind of perfect as well in a way,
right it meets you where you are and
it's very cool to see the workflows that
people have built around that. So you
can literally just spin up your
terminal and you can yeah as you say
right you can write notes you can do all
of these different things from just like
such a simple form factor. Yeah, I
think one of the things that has
surprised me like I if I go back to that
idea of a non-technical use case for
Codex, uh I find that Codex is an
extraordinarily
logical model and when I'm using it for
a non-technical use case,
I find that there's a sharpness and a
conciseness about how it evaluates a
particular set of inputs that feels like
I I can see where it came from. It feels
like of course you would get this from a
a model designed for code, but it turns
out that there's a there's an
extensibility to that to that emergent
property that helps with a lot of other
things. And so like I did a business
case analysis and it wasn't technical,
right? Like you're analyzing business
inputs like revenue and you're analyzing
uh sales figures, etc.
>> But it applies that same rigor and it
turns out that you get a response that's
really coherent. It's really clear. It's
cogent. It makes sense. It's easy to
read. And it's as a result very very
useful. And I just I love the idea that
these models end up having extensible
properties that perhaps spin off of what
they were originally designed for and
allow us to do lots of other things.
>> There's that element of, hey, this model
is trained to be like precise
uh and and correct about things and
diligent and you know double check maybe
triple check its work sometimes
and not do all the math in its head
right or in its context and maybe write
a little Python script to help itself
out. I use it for data analysis like all
the time. Uh, and it's not about the
code anymore, right? It's about the
result and like trusting the steps and
like as you as you put it,
>> it's like a very cogent like legible
explanation. You can like see step by
step like what, you know, why it's doing
things? Uh, and then there's a question
of like at what point, you know, for
these kinds of tasks, do you still need to
look at the code? Is the code just a
tool that you don't really care about?
And then you're using that as a stepping
stone. So then then you have like a
coding agent that's maybe evolving into
like you know a more general kind of
assistant. I guess that's an interesting
thing to think about.
>> Yeah. Yeah. In terms of the use you know
you mentioned about kind of like you
know design showing up in different
places. Um and I you know I think the
same way about some of the use cases for
designers. So like on the one hand
there's like fixing the paper cuts of
you know cuz we're in these tools all
day every day like literally eight hours
a day. You know any small
paper cut that you get you just see and
you can fix it. And obviously, you know,
you're submitting a PR, it needs to, you
know, you need to look at the code that
you're generating and, you know, we can
go through the review process. But if
I'm in like a very different mindset and
I'm in like design, you know, ideating
mindset like, yeah, maybe I can just
make my terminal really small, don't
worry too much about the code, just have
a local host open and basically, you
know, just like narrow this gap from
like kind of, you know, thought to
product. Um, and really just focus on
the interactions. You know, you can move
it, you can think about responsiveness
and and that kind of stuff becomes more
important and it becomes more like a
canvas. So, you know, similar to if
you're writing, very different use case,
but you know, also like a very different
way of designing.
>> Yeah. Because for so long design has
been effectively disintermediated from
engineering and like coming from a
product perspective, so much of the role
traditionally was translate design into
something that has requirements that
engineers can build against. And so
there was always this tension between
PMs and engineers and designers when I
was coming up where it's like everyone
has different incentives. And
really it's all just a function of
disintermediation. Like if you take away
the gap and you give everyone access to
the code, it's a different world.
>> Yeah. Like often it's like hey Ed it's
like you're just like an engineer on the
team right like writing PRs and just
fixing things. You don't need to go and
talk to anyone. You just do it.
>> Yeah. Yeah. And I think some of these
boundaries as well as you say they're
kind of slightly artificial. Um you know
they've grown up um you know first we
had the terminal and then with you know
Mac we then, like, started to think about the GUI and these kind of new
disciplines emerged and they're kind of
just converging and diverging
over time. Yeah.
>> You don't like being an engineer?
>> Oh no no I mean I I yeah I don't know
what I call myself now. I think that's
the cool thing.
>> There's an identity crisis where it's
just like what am I?
Yeah, I think that we're getting into a
world where job titles matter less and
skill sets matter more. Um, and it's
really it's exciting to see what happens
when people can wear those hats lightly
and just focus on what problems they can
solve.
>> Yeah, it really brings a lot of clarity. It's like it's
all about the problems,
>> figuring out what problems to solve,
figuring out what questions to ask
yourself. so much more is possible and
it's much cheaper to ideate and
build and then you find yourself like
being like, "Wow, you know, I really
need to be crisp about what I want to go
and do."
>> Yeah.
>> Um
it's exciting, but it's also
nerve-wracking at the same time, right?
>> Yeah.
>> Good ideas matter more.
>> They do. And correctly aimed ideas
matter more, I think.
>> Yeah. The speed and velocity, right?
like which direction you going in,
>> how fast you learn. Like, you know, the most successful teams that we see emerge at OpenAI are really small
teams that set themselves up as well to
like learn and iterate super fast. Uh
and then there's like a general sense of
like oh we're building towards this uh
but then like also changing things like
is cheaper. Yeah, it's I mean there's
the phrase right which has gone around
in engineering for a long time which is
kind of code wins and you know you can
write as many PRDs as you like but until
you have the product in your hands and I
think that's the very the cool thing
that I've seen from, like, product teams,
broadly defined whether it's you know
designers engineer it's like you know if
there's a hackathon like at the end of a
hackathon like you'll have like a fully
working product like you won't just have
like a kind of, you know, throwaway React demo like you had before, which I think is like super exciting. And then, yeah, the hard decision is then what do you build, like, you know, which ones do you narrow down to?
>> Yeah. Often times I mean you come up
with like a new idea feature like an
entire product and
I have to do like a double take
because it's like it's not like a static
thing. It's just I'm like this thing is
fully functional like it's almost
shippable. Like, how did you cook this?
>> Yeah. Yeah, the cool thing I think about
um this is I think this is another fun
angle which probably hasn't been
explored that much a little bit in kind
of like design engineering which is like
you know the kind of old world is you're
a designer, or, you know, a product manager for that matter, you work in a document, you work in a file, and
it's this like throwaway piece and then
you throw it to the engineer and the
engineer kind of productionizes it, right?
but now like you know some of the demos
that that Tibo was mentioning um I'll
just create a fork of the repo and it's
like it's not just a demo. It's like a
fully functioning thing and like
obviously to move fast I've you know
taken some shortcuts and like there
there's going to be a you know it's
going to be a little rough around the
edges but um but yeah the fidelity that
you can get to is is amazing.
>> Yeah, there's one of the pieces you just
used as a throwaway line that I want to
dig into a little bit. You talked about
this idea that you like come up with an
idea and then a team forms around the
idea to get that idea to fruition. And I
feel like that's a little bit the story
of Codex, but I think it's also a
story of a new way of working. And so
I'd be curious for you guys to share a
little bit more about what that feels
like on the inside.
>> It feels
like we are co-evolving Codex and the
way of working.
And so Codex is evolving fast, and we
have to adapt to, you know, the new
possibilities that it creates. And it's
it's it's quite a challenge like but
fortunately one thing that's become very
clear is like humans are still like the
very best at adapting quickly to things.
>> Uh and isn't it quite insane to think
like earlier this year none of this
really existed. Um
>> and like now like I it's very rare to
find someone who still codes like
without like a little agent by their
side. So I think you know we're going to
continue to see that. It's pretty clear
to me small nimble teams tend to like
produce incredible results and I think
that's going to continue to be true.
>> Yeah. No, I'd agree. I
think that the really interesting
observation that I've seen is just I've
been so surprised at how fast people get
used to these new step changes.
you know to Tibo's point I also joined like
just before our reasoning models and I
remember at the time we were talking
about it's like ah that you know going
to ship this reasoning model it's it's
obviously you know the kind of sits on
top of all of this research and it's
been this you know really huge research
project um you know for the company but
again it was this low-key research
preview and it just it was such a step
change um you know in in so many areas
and if you think about where we are now
um and you know I just look at how fast the different teams I've worked in have moved over just the past 6 months. And it does just
feel that kind of every few weeks or
every model release, you know, we kind
of like push this frontier even more and
then a week later, you know, you'll be
in some agent loop and you'll get a bug
and you'll be like ah this model, you
know, like getting frustrated and then
you're like, you forget that it's
like, you know, this is insane. So it's
just it's one of those, you know, it's
just like, you know, you could apply the
same to image generation or video,
right? you know, we shipped Sora and
it's like mind-blowing and then you see
this tiny, you know, this tiny fragment
and you're kind of like, oh, you know,
but you forget, you know, you zoom out
and you just become used to these things
so fast. But super point, I think the
cool thing is it's just super empowering
for small teams. Um and you know even
some of the junior engineers that we work with who you know
perhaps are only a few years out of
university you know the kind of the
breadth of work that they can do and the kind of big swings that
they can take you know I think just even
within the past few years has really
kind of accelerated their work as well.
Yeah, we've got Ahmed on the team. He
joined as a new grad.
>> Didn't know Rust, learned Rust super
quickly. I've never really seen anyone
pick up like a new language like as fast
as that and like, you know, get
productive. And then it's the way that
he manages to
uh like accept the technology and the
potential and like discover the true
potential of agents is I think faster
than most people on the team. Uh and so
there's sort of like that superpower of
like how quickly are you willing to try
and adopt new ways of working as well.
Um I've seen also like veterans like you
know 10 years in the industry like you
know kind of stick to their um you know
more traditional ways of like developing
and uh it's tough. I'm not sure which
one you know is more effective but it's
pretty clear to me like you know jump
like three months six months from now
it's like you know it's going to be very
clear which one is more effective.
>> Yeah. I I think honestly it'll be a
surprise to a lot of folks to hear that
there are junior engineers at OpenAI
simply because there's been a widespread
perception over the last 12 months that
these tools can dramatically accelerate
people who know enough of the business
context and have the experience to
utilize them and then juniors coming in
like I hear from juniors directly saying
I you know can't for the life of me get
a role anywhere uh because I don't have
experience and now you also need to have
the ability to have that experience and
leverage it with AI, and the bar is even higher.
>> But you you guys have juniors and
they're apparently doing very well. So,
what what's that been like?
>> It's been awesome. Uh I think it brings
such a joy uh to work as well and
like fresh perspective and then keeps
like us grounded. Um
>> and I have been delightfully surprised
uh about like you know how well that's
been working for us. Uh and it's changed
my perception as well of like you know
what is important, like, you know, this
adaptability.
Um, and then like a lot of it was as
well like Ahmed, I'm just going to take
Ahmed as an example, but uh, sorry
Ahmed.
Um, he was sort of like, he grew up almost with this. Uh so it's like, it's not quite true but you
know it will be true at some point where
it's like life before coding agents you
know background ambient intelligence
like you know having a little assistant
like in your terminal like that was not
a thing and so it's just super natural to
them like whereas like for me and others
sometimes I'm just like oh you know it's
like I'm going to go back to Vim and
like you know just like that and like I
don't necessarily like resort to like
using it in the right way. And I'm like
slowing myself down in a way. And then
you look at the way that you know
they are using AI today and you get
inspired. And so it's been actually
really interesting of like how they've
been able to level up the rest of the
team uh who is like on paper more senior
right
>> and a combination that I've seen work
very well is where we do spend a lot of
time on the general architecture of a
codebase. You know it's like the
principles of software engineering still
remain uh and then once you have the
right scaffolding then you can just go
and like and run and be extremely fast
and proficient because the agent just, like, respects the general
scaffolding and the boundaries that
you've set.
Reading between the lines a little bit,
>> it sounds like the quality of character
that is most important that you guys are
seeing on the ground as you work with
these models and to your point evolved
teamwork with these models. Is it
around openness to experience and
learning new things and the ability to
adapt quickly and that's something that
really whether you're junior, whether
you're senior, whether you're technical,
whether you're not technical, that's
what you got to have in the AI age or is
there something else?
>> It's interesting. I mean, you know, I
interview a lot of designers. It's
they're definitely qualities that I look for, um, you know, when hiring. But
I mean we're just going through a, you know, bit of a step change technology-wise, and I
think it's just you know being open to
those new ideas being open to using
those new tools is you know is
definitely helpful. Um you know if I
think back, I'm a child of the
internet you know I kind of grew up pre-
internet and post internet. I kind of
feel like we're at the same, you know,
the same point now for like software
engineers, creatives, designers, kind
of, you know, pre-AI, post-AI. And I'm
seeing more and more people who are
like, you know, maybe skeptical or like,
you know, learning about it a little, as, you know, as Tibo says, kind of set in their ways perhaps, they have their
workflows, you know, dipping their toes
in and just seeing the crazy benefits, and from there like moving
forward.
I think curiosity and willingness to engage is the most important
thing right now. Uh and it's clear that
we're only at the very beginning of you
know what's going to continue to evolve
like model capabilities are going to
continue to increase like we are not
seeing really a sign of a slowdown like
the 5.2 that just came out. Uh, you
know, it's a very strong model, but it's
also, you know, one of, you know, many
more to come. Uh, and like we have like
a very clear research road map there
that like a lot of the team and the rest of OpenAI is excited about. But it's just to say,
the reality is like this will continue
to revolutionize how we do software
engineering. So if you're not willing to accept that, it's going to be tough. And like
people who are like curious and are
focused on solving problems and are out
there in the world and are like hey just
like how can I help people's lives you
know how can I do this thing faster um
you know they're the ones that are
having a great time right now
>> That's really been true. Uh, the stories
that I know that are positive hopeful
exciting
tend to be correlated really closely
to people who have like a nose for
interesting problems and who have a
curiosity to solve them and just look at
AI as this really cool super tool that
they can use to solve those problems.
Like I know uh someone who I think he
started out as a music major in college
and now he's a technical founder
uh because he felt like being one, right? And now you can do that.
And he just went and solved problems for
customers. And I think that that's one
of the things that I find most
interesting about the trajectory that
we're on is that those stories become
more and more plausible, right?
>> Yeah. I think it's maybe one of the underrated parts of it, in that I think there are a
lot of you know really valid concerns
that people have but I think the thing
that people um you know don't focus on
as much is that it is an equalizer, like
you know if I think of you know when I
when I got into design as a kind of teenager, you know I was
doing a lot of animation I was kind of
hand drawing things I was like doing a
lot of creative stuff making movies with
my friends in my you know in the garage
you had to build a green screen you had
to you know buy the camera which was
expensive, you had to get... Now, with like a $20 ChatGPT subscription, you can as a creative
make like basically anything uh right
you get access to Codex, you get access
to all these other things so in many
ways it's an equalizer but it does
require leaping in as you say kind of
having that curiosity and yeah and
really like throwing yourself in and
learning all about it but you know if
you're curious then yeah there's so much
>> we still have complaints that usage
limits and rate limits are too low but
when you think about it like $20 a month
for like a prolific software engineer
that can help you get stuff done. It's
>> crazy.
>> Yeah.
>> And like this equalizer thing. There's like so many problems that were left unsolved before this and that now will get solved, and that's what gets me excited.
>> Yeah. I guess that brings me to another
question I had. like you referenced
earlier this idea that it's about
choosing what you're going to focus on
in this world because the tools are so
powerful and I think that's definitely
been something I've observed and then
you just mentioned another big piece
I've seen which is that there's a whole
host of problems that uh for lack of a
better term are SE 3 and SE 4 type
problems that are now accessible and
legible and solvable because we have a
tool that can do them. And so on the one
hand you have like more volume you can
attack that's perhaps lower urgency and
on the other hand you have a lot more
value on picking your overall direction
correctly. And I'd be curious to hear
like in practice for you guys what is
that balance like? How do you guys
tackle sort of those two
points on the scale? I think the two things that matter to us are general
conviction that we have based on
information you know around like hey our
models are going to continue to improve
along you know this set of capabilities
like let's build ahead of that so that
we continue to scale and like you know
bring more benefit to our users
and then the second part is like what
are people asking for, uh, and then deploying intelligence there helps us as well. Like I was um on Twitter
the other day like just I started a
thread like of like hey you know it's
like what should we build like you know
what's holding you back like what's not
delightful on Codex right now and then
got you know somewhere around like 250
>> yeah I saw that thread yeah it was a
good thread
>> it's like 600 like unique ideas but
Codex helped me sift through it all and
you know bring it back then and and uh
based on my own priorities and my own
notes uh like actually section it and
then I was able to discuss it with the
team but so yeah conviction and and
feedback are like two good ways that we
go about it. Do you have others?
>> Yeah. No, I think that's a good framework. Um, you know, I
think just to mention a few other areas
where um, you know, where we've been
building. So, we have this kind of, you know, CLI product, we have this web product, and we have the IDE
extension. We also have some cool
integrations. So, you know, you can add
Codex in Slack and you can add Codex in Linear. Um, and with a lot of
these like smaller issues that
you spoke about, one really cool trend
that I've been seeing is um you know
there are a lot of small tickets that
you know, at the end of the year or the end of the quarter, the team might
struggle to get round to or they're like
always there and they're kind of coming
up at the end of the meetings and now
right so like you know after you've
triaged a bunch of these things you know
maybe there are a bunch of small ones
that you could just put into one of
these integrations and
you can just be like you know Codex fix
this or you can literally assign it now
in Linear and other products. So I
think like for some of the small stuff we are starting to get to these like really kind of end-to-end workflows of you know
tracking a small problem meaning like
literally writing it down in you know
some short descriptive way and then
having a PR that you can review and
choose to merge or not and I
think like being able to free up a lot
of time uh you know from focusing on a
lot of that low-level work like just
frees up pure resources and capacity to
focus on some of those some of those big
issues. So that's been a cool trend
as well which you know obviously you
always have to prioritize things you
always have to, you know, filter signal from noise and make some kind of hard decisions, but I do think that we've been able to get a lot of that low-level work, um, kind of, you know, almost end-to-end automated so that the team can like
really focus on those big issues.
>> It also moves bottlenecks around, right? So
like as we're almost solving
code generation and you can implement
any feature you know like faster and
faster like suddenly you're left with
deploying and maintaining the services
and you know whenever like like your
hardware breaks or like networking has
an issue or like whatever like million
things that can happen uh like now
suddenly, you know, you get paged a
little bit more and like you're building
sort of like ahead of the automation
that we're able to uh deploy and the
intelligence is not yet capable of like
doing all these things like you know
we're not yet able to have Codex deploy
the service and like be on call um and
this is like an area where currently
we're feeling sort of like that load
from having almost solved code
generation.
>> Yeah, I was going to go there so I'm
glad you did because to me it's like
>> you've 100x'd code generation, or whatever multiple you want to use, but now you've just shifted all of
that down the pipe.
Yes.
>> Yeah. I mean it opens up some cool
interesting um you know interface
possibilities. Like if you think about ChatGPT, right, you're
conversing back and forth with a model
and you know you're asking for some
piece of information and it kind of you
know presents something back to you with
a coding agent it's taking some action
in the world and it's coming back you
know most often in a codebase and the
the kind of artifact and result of that
is some code that you have to review if
you if you want to do something useful
with it. So yeah that at the moment I
think we're in this kind of like
transition period where the meme is that
kind of like you know a lot of like
software engineering is reviewing agent
code. So I think like as a you know as
an interface and as a problem to solve.
I think that's a really interesting one
to to think through. It's one that we're
thinking through and it's one that I
think you know many people in this in
the industry are, which is like how do you not just kind of shift the
burden as you say from like um uh you
know kind of like writing code to
basically reviewing code and how can you
make that as smooth as possible and you
know I think we're doing some cool stuff
in that space with the code review
um you know agents and other things but
I think that's like one emerging uh you
know, emerging problem that we'll have to solve soon. One of the
things that's special about code
generation is like you can make it safe.
Uh so you're going to have all the code
generated in a sandbox. Yeah.
>> Uh and you have no side effects and so
therefore you know I think also like
just because
>> all the context is there it's textual
like for code, like you have, you know, Git, you have code review tooling, like a lot of
the automation already exists a lot of
the tooling already exists. So it was
solved first uh I think primarily due to
a combination of reasons but that's a
big one
>> and you can make it safe like a lot of
the work that we do is like we view
coding agents under the lens of like you
know safety and alignment and like
alignment is not a solved problem which
means that whenever you know you go into
the world of like deployment and being
on call and actually having like real
world like consequences of an agent
taking actions into the world. That is
like a whole other game. Uh where you
know you cannot make it safe yet. You know
you cannot guarantee that the agent will
not go and like delete your service or
just like you know snoop at like you
know user logs and there's a whole
security aspect um and figuring out you
know how to restrict the set of actions
through like a safe space uh or you have
to solve you know the alignment problem
uh you know whichever will come first.
we're sort of like inching towards that
and finding more and more creative ways
so that our agents can act upon the
world safely and that you know you're
able to steer and supervise that. I
think that's like the next frontier of
you know what we're going to unlock like
you know, in 2026. It's like: code generation considered mostly solved, code review we've been investing a lot in, and
then you know like where are the
bottlenecks now?
>> Yeah. Yeah. That's kind of where my head
goes as I look at the next year as well.
I think one of the things that people
tend to get curious about and I think
that will come up more and more as a
conversation in 2026
is how do engineers stay
fluent and able to read code structures
in ways that are meaningful in a world
where code generation is to your point
mostly a solved problem. How do we keep
the fingertippy skills that are relevant
so that you can understand what you're
deploying?
>> Yeah. So there's this part that we
didn't discuss on like code
understanding and planning as well like
how quickly can you figure out like how
your system actually functions today
>> uh and then you know maybe use that
knowledge in order to plan like your
changes
>> and then after you have your changes
like how do you have them like you know
actually deployed and have an effect
upon the world like you know be it a
product or or something else. Um yeah,
that whole like um it's not just that
but like I am more productive, you're
more productive, everyone in the team is
more productive and it's also keeping up
with all of that. It's like you know
what the hell is everyone doing? Uh it's
like you know there are new features
minted every day. It's like the world is
changing so quickly around us, just like
in teams even small teams keeping up
with it all is a challenge
>> And you saying that, I just want to be clear, everyone's going to be like very discouraged to hear you say that, because we're all trying to keep up.
We're building towards that, right? So,
you want to have fast ways to, like, understand what's going on in the codebase. Uh like
synthesize things. Is text the right way
to do that? Do you want like a little
report every day? Um you know, how fast
should your agent be in order
to understand to help you understand the
state of the code? And to your point about, you know, staying on top of, I guess, programming as well. So not kind of
like delegating everything and still
deeply understanding things like um you
know I I've seen some cool examples some
people internally like occasionally kind
of turn off their internet and they like
I forget the term that they use but
they're like basically kind of like you
know old school coding and they're like
there's no tab complete there's no agent
you know, there's no Codex next to them.
Um and you know human curiosity doesn't
go away. Like people still need to learn
like you know the engineers on the team
still read engineering books. I still
read you know engineering books. So I I
don't think that like curiosity is going
away and I don't think like it'll become
this kind of like you know thing that
you hand it off to and you and you lose
all the knowledge yourself. Um and as
you say you know like models can can
help you stay up to date as well. Like
if, you know, I'm trying to get to know a codebase, um, I can talk to the
model about it. I can you know ask it
like you know how does the back end
integrate here like oh like where does
this component come from can you just
explain the dependencies like the model
itself is also like a you know an
amazing teacher so um I think that's
that's a cool angle as well
>> when was the last time
you were really surprised by an emergent
property in a model?
>> This morning. Um,
so yeah, I just saw someone build
scaffolding around a model to enable it
to work um on a problem that I thought
was out of reach of the current
capabilities of models uh and solve it,
you know, successfully. And I was really
surprised. I thought we would need to
train the model specifically to be able
to do this. Um but turns out like it
generalized fairly well and worked um
for like almost 13 hours on this one. Uh
yeah, just just by being more creative
on the tools and the setup around it. I
hadn't seen this done before. So that
was like really surprising to me.
Yeah, I was gonna say, I mean, like most days,
you know, there's some
things that I, you know, probably can't
say, but I think there's one one that
actually stuck out to me which we, you
know, we released it, but if you go in
the web product, um, and you know, you ask the model a question and
it can kind of send you some front end
back. It like takes a photo of it and it
like sends it back with it.
>> And like when I first saw that, I
thought that was magical, right? you
know, you know, it's kind of using
a bunch of tools, but there is something
like very interesting about thinking
about coding agents, you know, being
able to code, but being able to see,
being able to generate these assets and
like at a conceptual level, I just found
that really interesting as like a as a
creative that, you know, this this model
can like do so much more than I thought.
>> Yeah. One of my biggest takeaways as I reflect on 2025 is that, as much as I was excited about tool use for models, I don't think I realized the combinatorial power that gets unlocked when you start to give a model a good set of tools.
>> And there is something there about what a good set of tools even is. The approach we've taken with Codex is to just give it access to your computer through good old Unix tools. Give it a shell, and let's see how far it can get with a shell. And to do this safely, have it run in a sandbox.
>> Okay.
>> And then what emerges from there is surprising to us, because we don't necessarily care about how the model is going to achieve its task. So we don't have a specific bias there, other than that it should probably use the shell a bunch of times. Beyond that, it's a very general tool, and that's something we've done consciously, because we believe it's one of the more scalable ways of doing things: it scales with the model's capabilities, and it's super general.
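To make the shell-as-the-only-tool idea concrete, here is a minimal sketch of such an agent loop. Everything in it is illustrative rather than Codex's actual implementation: runModel is a stub standing in for a real model call, and confining commands to a scratch directory stands in for genuine OS-level sandboxing.

```typescript
// Minimal sketch of a shell-only agent loop (illustrative, not Codex's implementation).
import { execSync } from "node:child_process";

type ModelTurn =
  | { kind: "shell"; command: string }   // the model asks to run a shell command
  | { kind: "done"; answer: string };    // the model declares the task finished

// Placeholder model: a real agent would call an LLM here. This stub just finishes.
function runModel(transcript: string[]): ModelTurn {
  return { kind: "done", answer: `saw ${transcript.length} transcript lines` };
}

function runAgent(task: string, sandboxDir: string): string {
  const transcript: string[] = [`TASK: ${task}`];
  for (let step = 0; step < 50; step++) {
    const turn = runModel(transcript);
    if (turn.kind === "done") return turn.answer;
    let output: string;
    try {
      // The one, very general tool: a shell. Confining it to a scratch directory
      // is a stand-in here; real sandboxing uses OS-level isolation.
      output = execSync(turn.command, { cwd: sandboxDir, timeout: 60_000 }).toString();
    } catch (err) {
      output = `error: ${String(err)}`;
    }
    transcript.push(`$ ${turn.command}`, output);
  }
  return "step limit reached";
}

console.log(runAgent("inspect the repo", "/tmp"));
```

The point of the sketch is the shape of the loop: one very general tool, a transcript, and a step limit. Everything else emerges from the model.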
>> Yeah, I guess one of the...
>> Go ahead.
>> ...one of the surprising things on the creative side as well: I mentioned there's someone who uses it to write documents and things like that, and it turns out you don't need to give it a document-writing tool. It can just use regex and bash commands to edit documents, to write, to do anything. That was perhaps not surprising, but a pretty amazing capability.
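Concretely, the kind of document edits being described need nothing beyond standard Unix commands issued through the shell tool. A hypothetical sequence, with the file name and patterns invented for the example:

```typescript
// Illustrative document editing via plain shell tools, the way a shell-only agent
// might do it. No document-writing tool involved; file and patterns are made up.
import { execSync } from "node:child_process";

const run = (cmd: string) => execSync(cmd, { encoding: "utf8" });

run(`printf '# Draft\\n\\nStatus: TODO\\n' > notes.md`); // create a document
run(`sed -i.bak 's/TODO/in review/' notes.md`);          // regex edit in place
console.log(run(`grep -n 'Status' notes.md`));           // verify the change
```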
>> The other day I was playing with something. We have a Codex SDK, and I just told Codex about it. It was then able to write code using the SDK, write a bunch of TypeScript, and basically invoke itself in order to achieve more. We don't have native multi-agent in Codex, but this is a form of it that just emerged: it read the documentation, figured it could probably get this tool to do something for it, wrote that code, invoked it, and it just worked. Codex is very good at figuring out ways to solve its problems.
>> So Codex essentially read the SDK docs, instantiated another Codex instance, and used that as a tool to get a job done?
>> That's right. A bunch of them, actually.
>> Yeah.
>> Yeah. Effectively, it bootstrapped multi-agent.
>> Yes. Without us ever thinking about it. Yeah.
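The pattern is simple enough to sketch. Below is roughly what such self-invocation could look like in TypeScript; the SDK surface shown (Codex, startThread, run, finalResponse) is written from a general reading of the published @openai/codex-sdk and should be treated as approximate, and the subtasks are invented for the example.

```typescript
// Sketch of the "Codex invoking Codex" pattern described above. Treat the exact
// SDK names as assumptions rather than a definitive API reference.
import { Codex } from "@openai/codex-sdk";

async function fanOut(subtasks: string[]): Promise<string[]> {
  const codex = new Codex();
  // Each subtask gets its own thread: an ad-hoc multi-agent setup built entirely
  // on top of the SDK, with no native multi-agent support needed.
  return Promise.all(
    subtasks.map(async (task) => {
      const thread = codex.startThread();
      const result = await thread.run(task);
      return result.finalResponse ?? "";
    }),
  );
}

fanOut([
  "summarize the open TODOs in this repo",   // invented example subtask
  "list files with failing type checks",     // invented example subtask
]).then((answers) => console.log(answers.join("\n---\n")));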
>> And so throwaway code is an interesting thing to think about: code as a tool. It's obviously extremely powerful. But maybe there's this whole category of things where the agent is writing code that no human should ever review, code you don't necessarily care about. It's just a very general tool.
>> Yeah. It's code as a means, not code as an output.
>> Yes.
So, riffing off the tool piece: I've also seen models with fewer but more powerful, more general tools doing better overall, so that doesn't surprise me. Now let me go to the other side and look at memory, and how long-running agentic tasks handle memory problems, both stateful memory you keep outside the system and in-context memory management approaches. How do you think about that for, say, the 20-hour task you're running? How does memory work?
So memory is still an open research topic. It's clear that something will emerge that is better than whatever short-term approaches we're taking right now. As one form of memory, the model can write to a file and keep track of a lot of its state through plain markdown files, for example.
>> Another thing we're doing, for very long-running sessions where the model goes beyond its context window: the model is forced to summarize what it has achieved so far and then reboot itself, through a process we call compaction. At the end it's like, okay, let's erase all of the content of the context window, summarize it, reboot, and restart. You can do this many, many times, and essentially the agent can work forever. If the task required it to work forever, it would work forever.
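A compressed sketch of that compaction loop, just to show the shape of the mechanism: the complete function is a placeholder for a real model call, and character counts stand in for token accounting. Codex's actual compaction is internal to the product.

```typescript
// Sketch of compaction for a long-running agent (illustrative only).
type Message = { role: "system" | "user" | "assistant"; content: string };

// Placeholder model call; a real implementation would hit an LLM API.
async function complete(messages: Message[]): Promise<string> {
  return `ack (${messages.length} messages in context)`;
}

const CONTEXT_BUDGET = 32_000; // pretend character budget standing in for tokens

async function runWithCompaction(task: string, steps: number): Promise<void> {
  let history: Message[] = [{ role: "user", content: `TASK: ${task}` }];
  for (let i = 0; i < steps; i++) {
    const size = history.reduce((n, m) => n + m.content.length, 0);
    if (size > CONTEXT_BUDGET) {
      // Compaction: summarize progress, erase the window, reboot from the summary.
      const summary = await complete([
        ...history,
        { role: "user", content: "Summarize everything achieved so far." },
      ]);
      history = [
        { role: "user", content: `TASK: ${task}` },
        { role: "assistant", content: `Progress so far: ${summary}` },
      ];
    }
    history.push({ role: "assistant", content: await complete(history) });
  }
}

runWithCompaction("migrate the billing module", 100).catch(console.error);
```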
In addition to that, because it has access to grep and can search through things, it can also dump context it doesn't always need in its window out to files, and that's a form of memory. The same with skills: you might have skills in a file somewhere, and that's a form of memory shared between the user and the agent.
>> And you're co-evolving some common knowledge there, which allows you to have an agent that hopefully performs better over time. There is a staleness problem; it's a poor, hacky version of memory, and it does feel like this will get disrupted at some point. But that's mostly how we've seen it tackled, and it's a very simple way of achieving it.
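That file-as-memory pattern is simple enough to sketch in a few lines; the file name and helpers below are invented for illustration and are not part of any Codex API.

```typescript
// Sketch of file-based memory: notes go into a plain markdown file the agent can
// grep later, the same mechanism a shared "skills" file uses. Paths are made up.
import { appendFileSync, existsSync, readFileSync } from "node:fs";

const MEMORY_FILE = "memory.md";

// Dump context out of the window and into durable storage.
function remember(note: string): void {
  appendFileSync(MEMORY_FILE, `- ${new Date().toISOString()} ${note}\n`);
}

// Recall: a grep-style scan, so only matching lines re-enter the context window.
function recall(pattern: RegExp): string[] {
  if (!existsSync(MEMORY_FILE)) return [];
  return readFileSync(MEMORY_FILE, "utf8").split("\n").filter((l) => pattern.test(l));
}

remember("backend integrates with the queue via worker.ts");
console.log(recall(/backend/));
```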
Yeah, one of the themes emerging as we chat a little more is that you're seeing surprisingly simple primitives be surprisingly successful at solving larger, generalized problems.
Yeah, I think that's something a lot of people in the field learned a long time ago, and it's being internalized, though it's not yet common knowledge: keeping things simple, with models whose capabilities evolve month after month, is probably the right thing to do. Otherwise you end up with a pile of complexity that you have to keep adapting to the ever-evolving capabilities. That's why we've decided to keep things very simple as well.
>> That makes a ton of sense. The other big question I want to get to goes back to the whole idea of technical and non-technical folks using Codex. I get asked a lot: how do I think about my career? How do I think about career progression in a world where job titles are increasingly optional hats you can take off and put on, and it's about the problems you solve? What is the career conversation inside OpenAI, and how does co-evolving with the model shape it?
>> Yeah. How do you feel about this?
>> Yeah, it's a good question. One emerging trend I'm seeing among designers, and to an extent some engineers, which I personally think is a positive direction, is what I spoke to earlier with the idea of equalization. There's less focus on credentials and on going through certain routes to ascend certain peaks, and more focus on what you've done and what you can show; code wins. Particularly in the design community, a lot of people are building really exciting things and putting them out there, and from a career perspective they're building profiles through what they've done and shown. No one cares where they went to school and all of those other things of the past, which is sometimes good and sometimes bad from a credentials point of view. So there is definitely a learning-through-doing, proving-through-what-you've-done dynamic, which I think is an exciting trend. A lot of the creator economy, the rise of podcasts and personal media, is similar, right? "You can just do things" is the internal phrase, and you can show your skills through that route. So that's personally one trend I've seen. On broader trends, I don't know that I have as much of a perspective.
>> Yeah, "you can just do things" is very much a mantra. The second mantra we have is "I am also curious about this."
>> Yeah.
>> Those are the two things people exhibit at OpenAI. I'd say it's less a challenge for career progression at OpenAI; we look at impact, and that's how you progress. It has been a challenge for interviewing and finding the right people, because the traits you're looking for, and the ways you can succeed, have broadened. Before, it was: do you program well? Let's give you a series of really hard programming tasks and pick the best talent. Now it's not that easy anymore. You can actually be very successful without being, in the traditional sense, a top performer at hard programming tasks. So finding that talent, and being more creative here, has been a challenge for us. We're evolving our thinking there, but it's been very interesting.
Do you have a silver bullet for the persistent issue we see with interviewing, where people have ChatGPT up on the side and just read responses back in the interview?
>> Yeah, we bring people on site for a lot of the interviews. And it's also just a reality that, in the job, you're going to use AI all the time. So it's more that the interview itself needs to evolve, maybe without limiting the tools people can use.
>> Yeah. One way of thinking about it is: are you using AI to get around some constraint? Another is thinking of it in an empowering way. You need to hit certain baselines, you need to have certain skills in place, but also: are you open to using tools, and do you understand how they can give you leverage?
Yeah, it reminds me a little of one of Jeff Bezos's favorite interview questions, where he asks people how to solve a problem that has two obvious solutions: I want a higher-quality car that also goes faster, right? How do you do that? You have to pick which one to optimize for, and the trick is that you're supposed to invent your way to both. You're supposed to think outside the box, think around the constraint, push on both. Part of it is simply measuring the willingness to break the mental box.
>> Yeah. And is the point there that if you were using AI on the side, you wouldn't come up with something as creative?
>> I don't know that it's necessarily that, though I've sometimes seen it. The best way I've seen to push people off script, assuming a remote interview (I think on-sites are critical, but we'll take that as read), is to abandon the standard behavioral interview script pretty fast and push people into a really honest conversation that demands some higher-level trade-off thinking. They don't really have time to feed that into a model and get a response in real time, and you see pretty quickly what the thinking tool set in their head is. That gives you a sense of what they're going to bring as a partner with AI when they start to work.
>> We have this problem now, right? I'm not sure if you're reading the questions off the screen on the right.
>> I'm actually not. I have no questions up. I'm staring at myself and you guys, and I'm just kind of...
>> So are we. Sorry. Delightfully simple.
Yeah, I think it's more fun that way. We get to take the conversation where we want it to go. Now, I did prep with ChatGPT; I absolutely prepped with questions. But then I thought, ah, they're okay, and I'm just going to riff. So, yeah.
>> Yeah.
>> Interesting. Another question, and I know we only have a few minutes left here. One of the things we haven't dug into yet that I hear a lot about, and maybe this is half a design question, half an engineering question, so for the two of you together it's going to be perfect: I hear a lot of talk, especially when I put out videos about new models, and I'm going to do that for GPT-5.2 here. People will say, "What's the difference? I see the same chatbot. I see the same terminal. I see a different label. How do I know this is actually better?" And we've even had comments from, I think, Sam and others that chat is essentially a saturated use case, and I kind of agree; I think it's mostly saturated out. How do you convey the step change in capabilities you get over a six-month or one-year period to someone who is seeing the same UI?
>> It's a good question. It depends on the use case. As a designer or design engineer, I work with different models, and for some tasks I like one and for others I like another, just as with many other products. If I'm in ChatGPT asking a quick question, I'll just leave it on auto or some low-reasoning model, and if I really want it to think, maybe I'll use Pro or something like that. So it's one of those things where you test it, and it depends on the situation. That being said, we also have a bunch of research evals, so there are certain barometers you can use. And we've talked through this interview about the different capabilities that different model steps unlock; we have consistently seen that. Models today, at least for coding, are substantively different from where they were when I joined this team. So I think it's a case of just trying them. Try many different ones, see what you like; some are good for different use cases. But maybe one good mental model that people miss: they think with a snapshot of where we are currently and of how you interact with frontier models today. In five years' time it's going to be very different. If you think about what these new capabilities can unlock, different products will have very different experiences. Is chat always the best interface? You might not be interacting with a model at all while it's still doing work for you in the background, and in that case model quality, and which model you're using, shows up very differently. So, yeah.
>> To come back to that, code review is again an example: it just happens in the background, the model improves, and we know it either got faster or is able to spot more things. Yay, you just got an upgrade for free. You don't have to think about it, and you benefit from it every day. And Codex itself is a different product from chat. Agents still benefit a lot from model improvements; we definitely see it whenever we improve reliability or frontier intelligence, in how long the agent can go. But it feels like we will need another product at some point as well, where you're not going to run Codex for three days in your terminal. Well, maybe people will, right? But at what point do we have agents that just run forever, agents you interact with in a way that's going to feel very different? Maybe you text one from time to time. Maybe you call it. We have yet to invent the right product around this. The models are not there yet, but they will be.
>> And then it's going to be like: oh wow, we're at GPT-7, right? It'll be obvious. In the meantime it sometimes feels incremental, but then you look back at six months ago and realize, heck, none of this was possible.
>> Exactly. I did a video
this morning about the idea that straight-line extrapolation is surprisingly hard to experience in real time. You're sitting there, and, like you said, people get used to the products so fast, and they get disappointed and frustrated by them. I vividly remember how excited I was working with GPT-5 Thinking and how it felt like a step change, and then within two days I had found a bunch of things I didn't like about it and wanted to fix. That's just how it goes, because that's how humans scale our taste, I guess. One of the things I've been thinking about is how we take our human defaults, where we seem to assume a static world, and transition to a default where we assume a dynamic world, where we're living on the slopey part of an S-curve and have to treat rapid gains in capability as the base case.
All right. For our last couple of minutes, do you have a question for me or the audience? I know you're always hungry for the voice of the customer. Is there something that's been bugging you that you'd want to ask?
>> I'm always curious how people use coding agents outside of coding. You mentioned that you used ChatGPT to get ready for the interview, but how do you use it day-to-day?
>> I love that. So, I'm a relentless omnivore when it comes to AI models, and I tend to jump around really quickly, but I have settled task groups: once I decide where I want a task to live, I tend to put it there. Right now I'm using ChatGPT (well, it was 5.1; now it's going to be 5.2) for a lot of the structuring, brainstorming, researching, and thinking I do as I start on pieces I write: what the stories are and how it all comes together. I'll use Codex for what I call hard-thinking mode. The Pro models, I think GPT-5.2 Pro or 5.1 Pro, are marketed for that; people talk about them, and I've tried them and I like them, but they're sometimes overexpressive for what I need. That's why I mentioned conciseness as a value I appreciate in Codex: it doesn't give me a thousand tokens back; it comes back with a really concise answer. I love the legibility of that, and I get a little addicted to it. So when I need a very clear, concise analysis, whether a financial analysis, a project analysis, an M&A analysis, a doc analysis, or a response to something really complicated that I need to think through and draft out, Codex is great for all of it, because it boils things down really cleanly. You get exactly what it boils down to, and it's really dependable that way.
I get a lot of mileage right now out of the document-creation tools from Claude. I know they're the other guys, but they are definitely shipping well there, and Opus 4.5 really, really does well on that: when I want a PowerPoint or an Excel file, it does well. I've been both amazed and frustrated using NotebookLM and Nano Banana 2 (or is it 3? no, it's 2; it's Gemini 3, anyway) to ship PowerPoints, because I don't get editability, and I hate that, but I get all of these lovely graphics from Nano Banana, and that's fantastic. So I think we're in a place where we're tool omnivores, because we're desperate to get the best thing in the moment for the particular task. And we all tend to have that list of paper cuts. I don't like the PowerPoint piece: Claude Opus 4.5 has good tool use, but the ability to create decorated PowerPoints is not really there. I think you guys have a ways to go in PowerPoint creation right now.
But the completeness I'm seeing: I'm doing some early work with GPT-5.2, and it spits out a very complete document, a complete answer that thoroughly answers the question. One of the things I preach a lot is that you must get really fingertippy with your models to be able to use them to solve problems in a way that's differentiated from just typing something into a chatbot. So that's a long way of saying I have about half a dozen models and I use them all every single day, including a bunch of OpenAI's.
>> Yeah. And then you have this mental model of what each model is good for.
>> That's right.
>> And there isn't yet a perfect model that answers all your needs.
>> And your needs keep changing. Yeah.
>> Because they keep changing. And that's the thing you've emphasized that I really strongly agree with: if I tried to boil this into a table, it would look incorrect, because I have an evolving map of where these models are good and not good, and it's very fine-grained. I have learned which models read handwritten tally marks well and which ones don't.
>> And you refresh that thinking, that knowledge, when you allow yourself to say, hey, let me try this other one?
>> All the time. And that's one of the things people lean into with my channel: I'm extremely open to new models and new experiences changing my priors, because you have to be in order to be at all helpful to people in this evolving landscape. You have to assume that any given new model release from any major model maker could upend a key part of your workflow simply because it's better, and you should not sit there and say, well, ChatGPT didn't do this before, so I'm not going to pay attention to this tool-use capability. No: you should assume the model is fully capable of surprising you, and test it carefully. So one of the things I've been thinking about is what we mean when we talk about useful work, and what useful work looks like. We've talked a ton about Codex, and with Codex useful work is obviously PR reviews and coding, work you can define in terms of the bits and bytes the model outputs. It gets a little more complicated with other knowledge work, so I'd be curious how you think about that, maybe beyond Codex, maybe looking at GPT-5.2 too.
>> I would say it's a similarly difficult question even just for coding. There are certain benchmarks, like SWE-bench, that are super saturated by now. Do they really measure how much use you're getting out of the model day-to-day? And as we discussed, we're going beyond just code generation: helping you understand things, review, deploy, do sysadmin tasks, build design prototypes, more and more things. So it's all about the economic value you're able to create, and OpenAI worked really hard on GDPval; 5.2 set a new SOTA on it. I think it's interesting to go from really hyper-specific, saturated evals to a better understanding of how this is impacting the real world. Obviously no eval is perfect, but whenever a model puts a new SOTA on something that measures economic value, it's worth taking a look.
>> Yeah, I appreciate you calling that out, because it gets at the idea that people get suspicious of benchmarks once they're widely reported and scores approach 100%. It's like, okay, but what's another 2%? Whereas I think there's an implicit measure of generalizability in some of these measures of economic impact. So GDPval, I think, is a good one. Isn't there one around vending machines? That's another in that vein.
>> Yeah, it's a fun one.
>> Yeah, it's a good one. Is it Vending-Bench, or...? Yeah.
>> With GDPval, it's clearly not saturated yet, and that's usually the cycle you see with evals: an eval gets published, it gains traction, it gets saturated. It did measure something useful at some point, and then maybe after a couple of months or years it's not really measuring anything meaningful anymore, because every model performs more or less the same on it. Then you get a new one like GDPval that's measuring something more interesting again, and given it's not saturated, it's always interesting to pay attention to. Vending-Bench is also a fun one.
>> Yeah, it underlines the story we've been telling through this whole hour: progress just keeps happening relentlessly with these models, and there isn't a wall. Contrary to popular reports, there isn't a wall. We've continued to see progress, and that's what allows us to keep publishing new benchmarks, because we keep knocking down the old ones.
Yeah, an interesting question there is: what are the benchmarks of the future, the ones that would be pointless to have now because every model would score zero? Like being able to be the CEO of a multi-billion-dollar company; that would be a useful benchmark. Do we let models run multi-billion-dollar corporations yet? Not quite. But I'm pretty sure at some point we're going to have these kinds of crazy benchmarks. They seem crazy now, but they won't seem crazy in a couple of years.
>> That's a really interesting brain teaser: what are the evals of 2026 and 2027 that we'll turn to as measures of value?
>> Thanks for having us, Nate.
>> Yeah, thank you. That was a good one to end on. This has been lots of fun, guys.
>> Thanks so much. Until next time.