Historic $300B Oracle‑OpenAI Cloud Deal
Key Points
- Oracle announced a massive $300 billion, five‑year cloud contract with OpenAI starting in 2027, positioning Oracle as a primary multicloud partner alongside Microsoft’s Azure.
- The deal fuels the prevailing “picks‑and‑shovels” narrative for AI profits—owning data‑center and GPU infrastructure—while prompting a sharp, though potentially unsustainable, 40% surge in Oracle’s stock.
- AI‑focused valuation models (using both Claude and ChatGPT agents) suggest Oracle remains severely overvalued even after accounting for the deal’s net‑present‑value, highlighting a disconnect between market hype and fundamentals.
- For OpenAI, the agreement signals a strategic “soft divorce” from Microsoft, giving it leverage and visibility as a market leader while setting the stage for future model generations (e.g., GPT‑7/8) rather than immediate impact on current releases.
Sections
- Oracle’s $300B OpenAI Cloud Deal - The segment highlights Oracle’s announced $300 billion, five‑year multicloud agreement with OpenAI slated for 2027, discussing its strategic shift away from Microsoft‑centric hosting and questioning Oracle’s soaring valuation despite the stock surge.
- OpenAI’s Road to 2030 Profitability - The speaker outlines OpenAI’s projected $90 billion cash burn, its reliance on massive demand to achieve profitability by 2030, and highlights the uncertainty around which unit‑economics model (per model, per data center, or conventional) will ultimately prove viable.
- Avoid Competing on Core Primitives - Builders should focus on specialized tools and orchestration rather than trying to replicate basic work primitives dominated by well‑funded AI platforms.
- Claude’s Tool‑Based Workflow Efficiency - The speaker explains how teams are leveraging Claude’s off‑hour reliability and on‑demand inference model to handle work tasks via tool calls—producing high‑quality documents efficiently, though not flawlessly, and emphasizing practical utility over binary judgments.
- AI Commerce & Self‑Regulation Outlook - The speaker expects industry self‑regulation for minor protection while noting Google’s multilingual AI search expansion with built‑in shopping tools, signaling a move toward chat‑driven commerce and checkout integration.
- Future Agent Collaboration & Hallucination Research - The speaker highlights emerging AI trends like Genesis’s autonomous agent‑to‑agent workflow composition slated for 2026, alongside OpenAI’s new study tying hallucinations to word‑prediction‑focused pre‑training.
- Organizational Approach to LLM Hallucinations - The speaker argues that hallucinations must be treated as an organizational problem, cautions against simplistic fixes, and stresses that the blunt reward signals in LLM training fundamentally limit nuanced, accurate responses.
Full Transcript
# Historic $300B Oracle‑OpenAI Cloud Deal

**Source:** [https://www.youtube.com/watch?v=_KneeDIbSa0](https://www.youtube.com/watch?v=_KneeDIbSa0)
**Duration:** 00:24:48

## Sections

- [00:00:00](https://www.youtube.com/watch?v=_KneeDIbSa0&t=0s) Oracle’s $300B OpenAI Cloud Deal
- [00:03:52](https://www.youtube.com/watch?v=_KneeDIbSa0&t=232s) OpenAI’s Road to 2030 Profitability
- [00:08:35](https://www.youtube.com/watch?v=_KneeDIbSa0&t=515s) Avoid Competing on Core Primitives
- [00:12:06](https://www.youtube.com/watch?v=_KneeDIbSa0&t=726s) Claude’s Tool‑Based Workflow Efficiency
- [00:15:51](https://www.youtube.com/watch?v=_KneeDIbSa0&t=951s) AI Commerce & Self‑Regulation Outlook
- [00:19:41](https://www.youtube.com/watch?v=_KneeDIbSa0&t=1181s) Future Agent Collaboration & Hallucination Research
- [00:22:47](https://www.youtube.com/watch?v=_KneeDIbSa0&t=1367s) Organizational Approach to LLM Hallucinations

## Full Transcript
What was most important that happened in AI this week? I want to go through it; we're going to keep it pretty casual.
We're going to go through the news
stories I think mattered the most. I'm
going to give you some commentary on why
I think they mattered and where they're
going strategically. We're going to do
one news story at a time. Number one,
Oracle and OpenAI's historic $300
billion cloud deal. Fundamentally, what
happened is that Oracle announced, during their double earnings miss, that they had signed a $300 billion, five-year cloud computing deal with OpenAI that will begin not this year, not next year, but in 2027,
marking one of the largest contracts in
tech history. This would position Oracle
as OpenAI's primary cloud provider
alongside Azure, which further shifts
OpenAI's partnership dynamic away from a
Microsoft first stance into a multicloud
stance. Oracle has got to be happy about
this. Larry was whistling all the way to the bank, because I think Oracle's stock popped 40% at one point. So he's the richest man in the world, for what it's worth. But if you actually look at
the unit economics of the Oracle
business, the valuation after the pop is
tough to sustain. I used this as a chance to check out a story we'll get to later, with Claude making models and writing them to Excel. And I tested that with agent and operator as well. Both models, both ChatGPT's agent model and Claude's model, unanimously concluded that Oracle is severely overvalued even given the $300 billion cloud deal, in net-present-
value terms. Now, we all know that one,
this is not investment advice, and two,
the market doesn't react rationally to
things. So, I don't know where the
market is going with this. The takeaways
that I have from the deal are one, the
market is so hungry for a continuation
of the picks-and-shovels line. This goes back to the Aschenbrenner memo in 2024: the idea that the way to make money in AI is to have data-center stakes, GPU stakes, the picks and shovels of the new gold rush. That's the play Oracle is making to the market. That's the narrative the market has bought. Mary Meeker, I did a big video summary on Mary Meeker a couple months ago, and her deck was heavy on picks and shovels. This is Wall Street's narrative for how to make money on AI. Larry is smart enough to know it, and Larry is
playing into that. Meanwhile, for
OpenAI, like I was saying, they're in a
soft divorce with Microsoft and for them
having a multicloud option is really
helpful. Being able to announce a big
deal like this helps them move the ball
forward with the narrative for this
month, this week. Sam loves to be in the news. He loves to have OpenAI positioned as a market leader. Certainly inking the largest cloud deal in history counts as being a market leader. All of this is actually going to play out in reality by the time we get to GPT-7 or GPT-8. It is not something that we are
going to feel with any of the current
generation of models because the
beginning of the compute deal isn't for
another year and a half. And so I think
the thing I want to caution is that when
you see these deals, look enough at the
timelines to understand what matters and
why. The last takeaway I have before we
go to the next story is that the fact
that both sides felt good inking a deal
this big for a start date this far out
argues to me that the prophets of doom who claim that we are at the peak of the AI hype cycle are probably wrong. If you're willing to ink deals that far out, you are committed to a compute budget that requires you to be prepared for that. Because part of the reason they have to push the start date out isn't that they wanted to; it's that you have to get everything ready so you can actually operate the compute at that
scale. Like this is a big contract. It
ties into the Stargate plans which
Oracle is also involved in with OpenAI.
And so OpenAI is planning on massive
demand. And this actually comes back to
the cash flow burn rates that they
updated this week as well, which is sort
of a part B of this story. I think they
updated and they have close to $90
billion in new burn rates that they're
expecting. Interestingly enough, they
are projecting or at least on paper, at
least for some investors, a path to
profitability in 2030. And so, at least
on paper, the idea that OpenAI is
selling here is that they see massive
demand. They see that demand massively
scaling for the next 5 years. And their
expectation is that they will hit
profitability off of the unit economics
associated with that scale. We shall
see. This is one that we will probably
come back to in future weeks, future
news stories. My suspicion, my concern, is that the unit economics of AI still have to be worked out, and there are two or three permutations, and it's not clear which one works. I'll leave that as a question, but for example: it's not clear if it's correct to measure profitability on a per-model basis, so a subsequent model would have a different profitability number. It's not clear if it's actually most accurate to measure it on a per-data-center basis, where you look at each data center's unit economics. And maybe it's GAAP, right? That's the third one, where it's just conventional: you take the revenue, you take the costs, you look at what you bring in and what it costs you, you look at your revenue per customer and the burn per customer, and you see. But regardless of which it is, it's not stopping people from investing, even though it is burning up investor money at this point. Like, when you update your burn rate and say, oh, by the way, we're going to add $90 billion in burn, that's a pretty significant update to your burn rate, right? It's
non-trivial. So, that's where we're at.
Demand is spiking. Biggest cloud deal in
history. Unit economics still uncertain.
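To make the valuation point concrete, here is a minimal sketch of the kind of net-present-value check those agent runs would perform. Every number in it (the even revenue split, the 30% margin, the 10% discount rate) is an illustrative assumption of mine, not an Oracle or OpenAI figure:

```python
# Illustrative NPV sketch for a $300B, five-year cloud contract
# starting ~2 years out (2027). Margin, discount rate, and the
# even revenue split are assumptions, not Oracle disclosures.

def contract_npv(total_revenue, years, start_in, margin, rate):
    annual_profit = (total_revenue / years) * margin
    return sum(
        annual_profit / (1 + rate) ** (start_in + t)
        for t in range(years)
    )

npv = contract_npv(
    total_revenue=300e9,  # $300B over the contract
    years=5,
    start_in=2,           # revenue begins roughly two years out
    margin=0.30,          # assumed operating margin on cloud revenue
    rate=0.10,            # assumed discount rate
)
print(f"NPV of assumed profits: ${npv / 1e9:.0f}B")
```

Even with generous assumptions, the discounted profit stream of the deal comes out in the low tens of billions, which is the shape of the disconnect the models flagged against a 40% jump in market cap.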
Story number two. This one did not get
reported on as much as I think it should
have. Claude's enterprise memory
revolution. So, Anthropic launched team memory for Claude, September 9 to 11, roughly. And this is for enterprises, for Teams accounts. But don't just think of it as ChatGPT's memory for enterprises. It's actually a different
philosophy around AI collaboration and I
want to kind of lay it out for you. So
what makes Claude's approach unique is
that Claude has project isolated memory.
So every Claude project on an enterprise
account would have separate memory
contexts and context windows which would
enable you to have confidential client
work and not mix it with general ops
work or with the work of other clients.
It also has much more transparent tool
calling, which you and I have probably already seen if you've worked with Claude. It's very open about what it calls, and so Claude's memory works through very visible function calls like conversation search or recent chats,
so you can see and understand what's
going on which improves the auditability
and transparency for the enterprise.
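For builders, that transparent tool-calling pattern can be sketched roughly like this. The tool name mirrors the conversation-search call mentioned above, but the schema, audit log, and project store here are hypothetical, for illustration only, not Anthropic's actual API:

```python
# Hypothetical sketch of transparent, auditable, project-isolated
# memory tool calls. Tool names echo the ones described above;
# the schema and logging are invented for illustration.
import json
from datetime import datetime, timezone

AUDIT_LOG = []

def conversation_search(query: str, project_id: str) -> list[str]:
    """Search past conversations within a single project's memory."""
    # Project-isolated: only this project's store is consulted,
    # so one client's work never leaks into another's context.
    store = {"acme-audit": ["Q3 scope agreed: inventory module only"]}
    return [m for m in store.get(project_id, []) if query.lower() in m.lower()]

def call_tool(name: str, **kwargs):
    # Every call is recorded, which is what makes the behavior auditable.
    AUDIT_LOG.append({
        "tool": name,
        "args": kwargs,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return {"conversation_search": conversation_search}[name](**kwargs)

hits = call_tool("conversation_search", query="scope", project_id="acme-audit")
print(hits)                                # memory from this project only
print(json.dumps(AUDIT_LOG[0]["tool"]))    # the visible call trail
```

The point of the sketch is the audit trail: because retrieval happens through named, logged function calls rather than opaque context stuffing, an enterprise can see exactly when and where memory was consulted.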
Finally, there's something called work
focused context here that I want to talk
about that's really interesting. It
automatically builds persistent profiles
of team workflows, client requirements,
and project specs. And that means that
it is going to start to get to know your
work better over time. So the practical
implications for builders are: one, if you were building a Claude wrapper, or any kind of AI wrapper for the enterprise, and your breakthrough was easy memory, I would be sweating tonight. It is
reminding me again that one of the
things that we see coming through in the
overall pace of AI builds is this focus
on primitives for work. And what I mean
by that is that if you look at the pace
and trend of recent AI adoptions, what
you see is that we are leaning in on
anything that counts as time in the
stack for the workday. So you see this
with the Claude projects and the sort of memory for Claude that's keeping you in the Claude ecosystem as you work as a team. You see it for Excel, you see it for Word, you see it for PDFs, you see it for PowerPoint. These are all connectors that Claude added. You see it for Claude becoming a personal assistant on mobile this week, where you can actually connect Calendar and Gmail in the mobile app for Claude, and Claude will effectively act like a personal assistant. If you're in Claude, it can search your calendar and come back with recommendations for times, like it can for fairly sophisticated things you would previously use a human for. And so
they're trying to keep you in the work
stack by building primitives. And ChatGPT is doing the same thing. That's why they're leaning heavily on Codex now as a competitor to Claude Code. Which, by the way, guys, the motion that Anthropic is doing here to go beyond Claude Code tells me they are trying to diversify as Codex starts to eat Claude's market share. That's my guess. But anyway, leaning in on primitives for code, Claude Code, Codex, that's all part of the same motion. And frankly, I may not be happy with the quality of implementation, but ChatGPT has been leaning in on the connectors as well, right? Leaning in on Excel, leaning in on PowerPoint with agent mode, etc. Everyone wants you in
the work stack. And so, if you are a
builder, what this means is that you
should not be trying to compete on
primitives. You should be trying to
compete on tools that are more
specialized. Don't try and build Excel
for the office; bet on somebody grabbing that primitive, unless you are very, very well funded, you've crossed your Series B, and you have traction. Well, then it's a different story, right? It's not that there are any impossible bets in business; it's that there are bets that are harder. And right now, competing for work primitives is competing with some very, very well-funded model makers.
Practical implications that go beyond
sort of where you position. It is going
to be easier to build agent
orchestration workflows in the
enterprise because of features like
this. It is going to be easier to
maintain context across coding sessions because of features like this. It is
going to be easier for sales teams to
maintain context across deals, product
teams to maintain specs. This is
something where Linear, for example, is going to feel a little bit of heat, and Jira too, right? Because they are used to being the place where you record work being done. We're not at the point yet where any model maker has rolled out a ticketing system. But we're also at a point where I wouldn't be too surprised if they got close to that, because it's such a primitive for engineering work, and because the things that make a ticketing system work are also things that these model makers are going after: context, being able to formulate text and break it out across specs, being able to handle technical requirement development, etc.
The last thing I want to call out is
that Anthropic is maintaining a
consistent perspective on transparency
and privacy that is going to serve them
well with the enterprise. They've been
really insistent on that from day one.
It is a brand. I'm not even talking
about terms of service. It is a brand
they are maintaining in the marketplace
and the way they chose to roll this out
reinforces that brand. So I am curious
to see how this plays out. There seem to
be competing AI visions here. ChatGPT seems to be leaning heavily into the current user base with consumers. They're
also leaning on the code side. They're
also leaning on enterprise deals with
their brand as like the big heavyweight
in the room. Claude is pushing tool
calling really hard and talking a lot
about being a collaborative colleague.
And that tool calling line makes sense.
By the way, there's a GPU implication
here that no one is talking about. So
people don't know this, but technically speaking, Opus and Sonnet, Claude's models, are not heavy inference models. And part of why, I think, is that Anthropic doesn't have the GPUs to serve heavy inference models right now.
They've been more GPU constrained than
OpenAI over the course of their history.
Fine, they're using a large model
instead. If you look at the
parameterization of Opus, it is a big
big big model. And what they're focusing
on is intelligence driven by a big model
for rational tool calling. And it turns
out that is a relatively good bet. Like
they're making the tool calling
transparent. They're letting Opus be the
planner and they're just driving the
ability to solve hard problems through
tools versus through inference, which is
a somewhat more GPU efficient way to do
it if you're constrained. And we all
know, like, Claude even so still suffers from GPU brownouts, GPU constraints. People that work in Tokyo and Stockholm say that Claude works better in the off-American hours, etc. They have
had troubles with that and so I think
that that's part of why they're leaning
into tools and I think that we will have
to see when they feel comfortable enough
with the compute budget to roll out an
inference model. But what's interesting is, if they do that, they may choose to make the inference model on-demand: Claude Opus calls the inference model when needed, almost like a tool kind of approach. Because most of what they're doing here, it's a lot of the workday that they're picking up and handling through tool calls rather than through inference, and that's pretty efficient, right? That makes a lot of sense. We've already talked about, in this Claude story, the file-creation capabilities. I did a whole post on that this week. It is a big, big deal: getting quality Excel, quality PowerPoint, quality PDF, quality Word docs, non-trivial pieces of the workday, hand it over. I don't want you to take
this and think this means that it does
it perfectly. So often when I have these
AI conversations, I feel like we get
trapped in binaries. It's like it's off
or it's on. It was terrible and now it's
great. The right question is, is the
work that Claude is doing useful enough
that I can move much faster as a result?
And Claude is the first model that has
produced work artifacts that meet that
bar and easily meet that bar. I would
not say it is perfect. And I'm not going
to pretend to say it's perfect. And
what's interesting is people say, "Oh,
so that means there's hallucinations." I
actually haven't found that to be the
issue. The issue was more the fit and
the finish and the polish that typically
come with extremely high-level Fortune 100
presentations. Claude's not quite there
on the polish and design sort of side of
things. I've actually wondered why Figma
hasn't leaned in on like an AI powered
design thing because I feel like it
would be really easy for Figma to say,
"Do you want design chops? Here's an MCP
server. We're going to bill you." Or
whatever it is. And, like, you can get Figma with MCPs for a certain amount a month and just call design polish into your stuff. But that's not the
world we live in, right? Like that's
that's a different world. And Figma
hasn't moved in that direction. And in
the meantime, we do have a real design
gap with AI. The other thing I will call
out is that we don't really know how to
make this transition at work. And that's
very much TBD. You can create documents
really easily, but teams tomorrow, teams Monday, have to figure out and triage whether they want to adopt this. What docs do I put in and edit? What Excels do I put in, edit, and start to move through Claude, versus what do I build new in Claude, and why? Um,
and I've already had those kinds of
conversations with teams. It's happening already. All right, let's get to the next story. Let's move past Claude. The FTC launched an AI safety
crackdown. So, the Federal Trade
Commission is launching an AI chatbot
inquiry targeting seven major AI
companies, all the big names, and
they're trying to figure out how to
regulate the industry, particularly
around safety. And so, just to like dig
into that, the seven companies, OpenAI,
Meta, Google, Snap, which is
interesting, Character.ai, and xAI. And
so the companies are going to be
required to provide detailed safety
metrics and monitoring protocols. They
want to focus on protecting children
from potentially harmful AI
interactions. And there could be a new FTC rollout of compliance requirements
and safety standards across the
industry. This follows on from recent
lawsuits involving chatbots and teen mental health issues. And basically the FTC is signaling, like, the red line is
making sure that kids are safe and they
will go after companies that they
perceive as potentially not doing enough
in that area or at least that they want
to regulate in that area or that have an
exposure to that kind of experience for
children. We will see where this goes.
For now, my expectation, my base-case expectation, is the industry is going to want to cooperate. The industry is going to want to self-regulate, and there will probably be some sort of self-regulatory, FTC-overseen regime that says these are the standards we have for protecting minors, etc. Which I think would be good: it's a step forward toward actually normalizing this as a real business, a real vertical, a real industry that needs to have proper safety procedures that everybody agrees on. And there are no real ground rules that everybody agrees on right now. Next story: Google AI Mode. So, Google expanded AI Mode, which is its sort of fancy AI search, beyond English. It now supports other major markets for Google, including Hindi, Indonesian, Japanese, Korean, and Portuguese. And so this is a ChatGPT-like search experience. It has enhanced shopping
capabilities. You'd better believe
they're looking at Q4 this year for
that. It has in chat checkout. It has
visual try-on features, probably powered by the new Nano Banana. And so I look
at this as a real step in the direction
of chat-powered commerce. With Fidji coming on at OpenAI, I have been really eyeing the idea that, like, we're going to have more work done by ChatGPT for Q4 this year on ad-powered checkout experiences, or checkout experiences that are more ubiquitous in ChatGPT. Right now you can browse products but you don't, sort of, complete the checkout; but there are some signals in the code that they're thinking about that. We are
going to start to see commerce move off
of platforms like Amazon into the chat
experience, and my base-case expectation is that the first big season when that's going to be tried out is Q4 of this year. And so I would be a little bit surprised if we didn't see multiple major model makers going for that. I'm guessing, given their branding, that Anthropic is not going to do it for now. But Google and ChatGPT, I would sort of expect both of them to do that. We will see.
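As a sketch of what chat-driven checkout could look like under the hood, here's one hypothetical shape for an assistant-invoked checkout tool. The tool name, payload fields, and confirmation flow are all invented for illustration; neither Google nor OpenAI has published this interface:

```python
# Hypothetical sketch of a chat-driven checkout tool call.
# The tool name, fields, and confirmation step are invented;
# no real commerce API is referenced.
from dataclasses import dataclass

@dataclass
class CartItem:
    sku: str
    name: str
    price_cents: int
    qty: int = 1

def begin_checkout(items: list[CartItem], currency: str = "USD") -> dict:
    """What an assistant-invoked checkout tool might return: a
    summary the model can read back for explicit user confirmation."""
    total = sum(i.price_cents * i.qty for i in items)
    return {
        "status": "awaiting_confirmation",  # user must approve in chat
        "total_cents": total,
        "currency": currency,
        "line_items": [i.name for i in items],
    }

order = begin_checkout([
    CartItem(sku="sku-123", name="Trail runners", price_cents=8999),
    CartItem(sku="sku-456", name="Wool socks", price_cents=1499, qty=2),
])
print(order["total_cents"])  # 11997
```

The design point is the "awaiting_confirmation" status: in-chat commerce only works if the purchase stays a two-step, user-approved action rather than something the model completes on its own.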
Time will tell. The good news is it's
already September, so we're going to
find out in the next month or two how
that goes. So the AI agent market is the
next one. And really like part of the
story here is that people are realizing
how big this market is. So the AI agent
market is now projected to surge roughly
10x in 4 and 1/2 years. So let's call it
5-ish billion this year. It's expected to get, who knows at this growth rate, but between $40 and $50 billion by 2030. They'll probably revise it again in the next few months. And what's notable to me is that success rates for AI
agent deployments are going up in 2025
relative to 2 years ago, which if you
work as a builder should not surprise
you. I see many more successful AI agent
projects now than I did last year or the
year before. But if you just read the
headlines, if you read the MIT 95% AI
fail study, you'd think, oh no, I mean
that's useless. Like that's terrible.
I'm sure they all fail. It's not true.
It's not how builders actually see it. I
did a whole piece on this on Friday
talking about this idea that the
builders know what's really going on in
AI. And part of how we see that play out
is is this reality on the ground where
AI agent deployments are actually
working better than they were. Along
with that, you have a whole host of new
agent launches. One of the ones that's
interesting is we never talk about
Amazon in the AI space, but they just
keep chipping away. Um and they have
deep pockets and we'll see where they
end up. Amazon introduced Quick Suite this week. It merges AWS products with pre-built workflows for natural-language automation. Basically, they're trying to tack some agent mode onto existing AWS products, and we'll just sort of see how that goes. Another one that's
interesting (there are a few of these; I never get them all, they're the new sort of announcements): DeepL launched DeepL Agent. It's an autonomous AI system for knowledge-worker tasks across finance,
sales, and marketing. I always take
these with a big block of salt till I
can start to see them in reality. We
shall see. But they announced it, right?
And you should expect to see more of
these aggressive announcements as we go
forward. In the same vein, a company called Genesis announced A2A, agent-to-agent collaboration, which is a sort of
a system for enabling agents to work
together without human intervention.
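A toy sketch of what agent-to-agent workflow composition means in practice: agents advertise what they consume and produce, and a chain is assembled by matching types rather than by a human-scripted pipeline. All names here are invented; this is not Genesis's actual system:

```python
# Hypothetical sketch of agent-to-agent workflow composition:
# agents advertise capabilities, and a requesting agent chains
# them by matching output types to input types, with no
# human-scripted pipeline. All names are invented.

REGISTRY = []  # (agent_name, consumes, produces, fn)

def register(name, consumes, produces):
    def deco(fn):
        REGISTRY.append((name, consumes, produces, fn))
        return fn
    return deco

@register("scraper", consumes="url", produces="text")
def scrape(url):
    return f"raw text from {url}"

@register("summarizer", consumes="text", produces="summary")
def summarize(text):
    return text.upper()[:20]  # stand-in for a real summarization step

def compose(start_type, goal_type, value):
    """Greedy chaining: keep handing the artifact to any agent that
    consumes the current type, until an agent produces the goal."""
    current = start_type
    while current != goal_type:
        step = next((a for a in REGISTRY if a[1] == current), None)
        if step is None:
            raise LookupError(f"no agent consumes {current!r}")
        name, _, produces, fn = step
        value, current = fn(value), produces
    return value

result = compose("url", "summary", "https://example.com")
print(result)
```

The "cutting edge" part the speaker points at is exactly the `compose` step: today a human scripts that chain; the 2026 bet is that agents negotiate it among themselves.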
This is going to be one of the really
hot areas in 2026 where we're going to
start to see people say, I have agents
and I want them to self-compose
workflows. I don't want to have to
script the workflows for them. For now,
that remains something that's very
cutting edge, and I think that we'll start to see that move in the new
year. What else happened? So, there was
a big set of headlines. Again, OpenAI
loves headlines. OpenAI publishes
research identifying the core causes of
AI hallucinations. OpenAI attributes
them to pre-training processes that
prioritize word prediction over
truthfulness. And this was presented in
the headlines as novel and it was
presented as groundbreaking and OpenAI
thought leadership with OpenAI saying
they could see a path to closing out
hallucinations. I guess I'm aging myself here. One of the things that I've been writing about, and I don't want to pretend I'm the only one, lots of people have been talking about it for a long time, has been that when you prioritize in training a single-turn response, where the model must generate text, must be shown as helpful, must generate detailed information, and must be proactive, you get exactly what you see today. You see models that are
optimized for a single turn. You see
models that are optimized to give you a
response whether they know the answer or
not. And this leads to hallucinations.
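The incentive is simple expected-value arithmetic: under a binary, exact-match reward, "I don't know" always scores zero, while a confident guess scores its probability of being right, so guessing dominates. A toy version (the reward scheme is a deliberate simplification of real training setups):

```python
# Toy expected-reward calculation under a binary, exact-match
# reward: 1 for a correct answer, 0 for anything else, including
# "I don't know". A simplification of real training pipelines.

def expected_reward(policy: str, p_correct_guess: float) -> float:
    if policy == "abstain":
        return 0.0              # "I don't know" is never marked correct
    if policy == "guess":
        return p_correct_guess  # right with probability p, else 0
    raise ValueError(policy)

for p in (0.01, 0.2, 0.5):
    guess = expected_reward("guess", p)
    abstain = expected_reward("abstain", p)
    print(f"p={p:.2f}  guess={guess:.2f}  abstain={abstain:.2f}")
# Even a 1% chance of being right makes guessing the better policy,
# which is the incentive the research attributes to pre-training.
```
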
Big surprise. I don't know why this was considered novel. Like, if a model has the choice between telling you the truth, which is "I don't know," and telling you a really nicely crafted, good-PR-value, professional-sounding email with lots of numbers and details, and the model is rewarded in training for the latter, not the former, are you really shocked that it likes to hallucinate numbers? That's exactly what's going on. And OpenAI is presenting this as if it's, like, news. It's not news, guys. Like, this is how
we've trained models for a long time.
And part of why is that if you're
building a model for a billion people,
you have to think real seriously about
your engagement rates. If the model
starts saying "I don't know," or "no," or "this isn't correct," models like that, that don't keep you chatting, it becomes material for OpenAI's business at a certain point. And I don't want to say that OpenAI is unwilling to fix a hallucination problem because of the engagement rates on the business that they're doing. I have no evidence to say that that's exactly what is happening. But I
do want to say that the effect is real.
And I want to say that hallucination
root causes are not all that mysterious, the way it's been portrayed, and that it is actually more useful to think about hallucinations as a series of different classes of unwanted behaviors that can be addressed at both a technical tool level and at a system level. But we're not really talking about it like that. And ironically, that is how we should be talking about it if we are trying to address it at the rollout level for organizations and
leadership. I'm going to be writing more
on that topic I think later this
weekend. There's something
around addressing hallucinations as an
organizational problem that isn't being
said and thought about enough. For now,
don't believe everything you hear when
you see a headline like that. The
hallucination cause is well known and I
would be somewhat surprised if I saw a
substantial change in training regimes
because of the benefits of the current
training regimes: the benefits in engagement, and, frankly, the benefits in some of the things we do want. We want models that will give us a full, detailed response. Like, what if curtailing hallucinations comes at the cost of giving you full and detailed answers when you give the model the information it needs? Would we take that trade-off?
This gets at what Andrej Karpathy has
called out as a fundamental weakness in
LLM training. And maybe we'll end on the
philosophical note here, but one of the
things Andrej has called out that I think
is correct is that training is a blunt
reward signal. So if you say yes or no,
the only thing you can do is reward one
of those responses. It's a really blunt
reward signal. So if the model comes
back and says I don't know, you have to
either say that is a good answer or a
bad answer. There's no in between.
There's no nuance. You can't say why.
Similarly, if it comes back with a fully
hallucinated answer with lots of details
that's formatted well, that looks perfect, you can either say it's good or bad. You can't give it any nuance
there. And that is part of why I say
eliminating hallucinations may have
negative downstream consequences. Not
just for the engagement case, but also
for situations where you want the model
to be proactive, detailed, fill out full
pieces of information. And because we're
working with effectively blunt
instruments for training, Andrej's
pointed out, we have limited flexibility
to help models learn. And models
learning is actually one of the big
unanswered questions in AI. How do we
help models learn? How do they learn
after their release? But also, how do
they learn with much more nuance in
training? I'll leave you with that
question. I hope you've enjoyed the breakdown of the news of the week.