Learning Library

← Back to Library

Evaluating OpenAI’s New O3 and O4 Models

Key Points

  • The panelists—Chris Hay, Vyoma Gajjar, and John Willis—each shared their “preferred model,” ranging from GPT‑4.1 and the classic o4 to Gemini 2.5 and the newer o3/o4‑mini.
  • OpenAI’s recent launch of o3 and o4‑mini sparked enthusiastic reactions: Chris praised o3 for its richer personality and strong code‑refactoring suggestions, while noting o4‑mini’s speed for quick tasks like unit‑test generation.
  • The episode also previewed upcoming topics, including Gemini being deployed on‑premises, John’s recent blog posts on AI evaluation tools, and NVIDIA’s announcement of new chip‑fabrication factories in the United States.
  • Some Twitter commentators dismissed the o3/o4‑mini releases as merely incremental improvements, highlighting the ongoing debate over how “ground‑breaking” each new model generation truly is.

Sections

Full Transcript

# Evaluating OpenAI’s New O3 and O4 Models

**Source:** [https://www.youtube.com/watch?v=8e6StFBP0VM](https://www.youtube.com/watch?v=8e6StFBP0VM)
**Duration:** 00:41:47

## Sections

- [00:00:00](https://www.youtube.com/watch?v=8e6StFBP0VM&t=0s) **Choosing Favorite LLM Models** - In the opening of the Mixture of Experts podcast, host Tim Hwang asks guests Chris Hay, Vyoma Gajjar, and John Willis to name their preferred AI models, sparking a brief round‑robin before previewing upcoming discussions on Gemini on‑prem, AI evaluation tools, NVIDIA chip factories, and OpenAI announcements.
- [00:03:11](https://www.youtube.com/watch?v=8e6StFBP0VM&t=191s) **DevOps Founder Evaluates AI Model Benchmarks** - A DevOps pioneer compares o3‑mini and Gemini models using the SWE‑bench and Aider Polyglot benchmarks, emphasizing the need to stay current on AI tools to address client challenges.
- [00:06:14](https://www.youtube.com/watch?v=8e6StFBP0VM&t=374s) **Clarifying “Thinking in Images”** - A participant questions and unpacks the claim that AI models “think in images,” discussing how visual reasoning might work using examples like pivot tables and screenshots.
- [00:09:22](https://www.youtube.com/watch?v=8e6StFBP0VM&t=562s) **Open‑Source AI Lag and Catch‑Up** - The speaker predicts that closed‑source breakthroughs (e.g., GPT‑4.1) will be quickly matched by open‑weight models like DeepSeek within weeks, noting a recurring lag but hoping open models will eventually overtake proprietary ones.
- [00:12:28](https://www.youtube.com/watch?v=8e6StFBP0VM&t=748s) **Beyond Models: Ecosystem and Tooling** - The speaker argues that the future of AI solutions depends more on integrated toolchains, orchestration, and domain knowledge than on any single model.
- [00:15:34](https://www.youtube.com/watch?v=8e6StFBP0VM&t=934s) **Google Opens Gemini On‑Premises Deployment** - The speaker views Google's decision to let enterprises run Gemini models on‑prem via Vertex AI as a major shift that eases security concerns, counters cloud‑only rivals, and positions Google as a strong enterprise AI option.
- [00:18:41](https://www.youtube.com/watch?v=8e6StFBP0VM&t=1121s) **Security, Scale, and Market for Gemini Models** - The speaker debates the security of model weights on specialized chips, questions whether non‑mega firms can afford GPU resources for large Gemini models versus smaller variants, and muses on the broader market implications.
- [00:21:46](https://www.youtube.com/watch?v=8e6StFBP0VM&t=1306s) **AI Adoption Hurdles in Manufacturing** - The speaker highlights client concerns over data sovereignty, governance, latency, and IP protection when adopting scalable AI models, noting that open‑source smaller models aren't sufficient for large‑scale industrial needs.
- [00:24:57](https://www.youtube.com/watch?v=8e6StFBP0VM&t=1497s) **Auditable Evaluation of LLM Performance** - The speaker outlines how probabilistic metrics such as correctness and hallucination rates are measured and audited, emphasizing the shift toward specialized language models designed specifically to serve as reliable evaluators rather than inference engines.
- [00:27:59](https://www.youtube.com/watch?v=8e6StFBP0VM&t=1679s) **AI Governance Challenges in Enterprise** - The speaker warns that moving to LLM-driven solutions eliminates traditional rule‑based oversight, creating opacity and high‑risk manipulation, which necessitates robust guardrails and will likely trigger stricter government regulation in critical industries.
- [00:31:03](https://www.youtube.com/watch?v=8e6StFBP0VM&t=1863s) **Governance, Audits, and Brand Risk** - The speaker critiques the messy state of AI audits, advocates for structured automated governance—especially in banking’s three‑line‑defense model—and stresses that protecting brand reputation will drive faster, more effective evaluation practices.
- [00:34:09](https://www.youtube.com/watch?v=8e6StFBP0VM&t=2049s) **AI Audit Vulnerabilities and NVIDIA Chip Funding** - The speaker forecasts that AI evaluations and policies will become top‑level, probabilistic functions susceptible to prompt‑engineering attacks that could falsify compliance audits, and then highlights NVIDIA’s announcement of a $500 billion investment in U.S. Blackwell chip manufacturing.
- [00:37:14](https://www.youtube.com/watch?v=8e6StFBP0VM&t=2234s) **Emerging AI Jobs & Nvidia's Blackwell** - A speaker reflects on the steep learning curve, open‑source momentum, and partnership potential driving new data‑center and AI roles, while questioning how quickly Nvidia’s coveted Blackwell chip can be rolled out.
- [00:40:15](https://www.youtube.com/watch?v=8e6StFBP0VM&t=2415s) **Reshoring Manufacturing: Labor & Upskilling** - Panelists discuss the difficulties of moving high‑tech manufacturing to the United States, citing union resistance, cultural differences, the need for extensive employee upskilling, and reliance on government support.

## Full Transcript
0:00o3, o4, o4-mini, o4-mini high, GPT-4o, GPT-4.5. 0:05What model are you using? 0:07Chris Hay is a Distinguished Engineer and CTO of Customer Transformation. 0:10Uh, Chris, welcome back to the show. 0:12And, uh, what's your preferred model? 0:13Oh, you missed 4.1, Tim, so that's gonna be my model. 0:17I'm picking 4.1, the one Tim didn't pick. 0:20Very nice. 0:21Thank you Chris. 0:21Vyoma Gajjar is an AI Technical Solutions Architect. 0:24Vyoma welcome back to the show. 0:26Uh, your preferred model, please. 0:28Thank you. 0:28And I think it's OG o4. 0:30Nice. 0:31The classics. 0:32And joining us for the very first time is John Willis, who's an Author and 0:35Owner of Botchagalupe Technologies. 0:37Uh, John, great to have you on the show. 0:38What is your preferred model? 0:40Hey, Tim Gemini 2.5, oops, sorry. 0:42Uh, no, I actually, I. I, I think I was o3 but I think 4.1 is actually for 0:47coding is kind of my favorite right now. 0:49Nice. That's awesome. 0:50Well, all that and more on today's mixture of experts. 0:58I'm Tim Hwang and welcome to Mixture of Experts. 1:00Each week, MoE brings together a world-class crew of technical 1:03experts, wise crackers. 1:05I'm talking about you Chris, and industry veterans to discuss and debate the 1:08biggest news in artificial intelligence. 1:10As always, there is a lot to cover. 1:12We're gonna talk about Gemini being on premises. 1:14We're gonna talk about John's great blog posts on AI evaluation tools, and we're 1:18gonna talk about in NVIDIA opening up factories for chips in the U.S. But first 1:22I want to start with OpenAI announcing. 1:26So just this week they announced o3 and o4-mini , um, their kind 1:29of latest generation of their ongoing kind of class of models. 1:34And I guess maybe Chris, I'll throw it to you first. 1:36On a vibe check. 1:37These seem really good, like o3 seems amazing. 
1:41Um, I don't know if you agree with that or how you've kind of felt about it on 1:43a kind of initial pass about the models. 1:45Yeah, no, I've been having a lot of fun with those models last night. 1:49So o3 is really good. 1:52And one of the things I really appreciated about it as well is it's actually 1:56improved the personality, there's just a lot more to it. 1:59So things like being able to kind of make really good refactoring suggestions 2:04and how to improve the architecture of your code is actually coming 2:07back with some really good stuff. I have to say o4-mini at the moment, 2:11just for getting stuff done quickly. 2:13You know, I want to create some unit tests or I just wanna refactor some code. 2:16Then o4-mini is just doing great and it is super, super fast. 2:21So I'm impressed with the models. 2:23I'm loving it. 2:24And again, as I said at the beginning, 2:254.1 sitting in the kind of Codex CLI, loving that as well. 2:30So, uh, this is, this is a great week for models. 2:33John, it'd be good to bring you in. 2:34I mean, I think, you know, there's some grumpy people on Twitter. 2:37There always are grumpy people on Twitter from the peanut gallery, who 2:40for these announcements were like, 2:42this is just incremental. 2:43This is not like a big deal. 2:44There's no big new features. 2:46They're announcing this is just like a slight improvement and 2:49like what, where, you know, the, the argument was kind of OpenAI 2:51is like asleep at the wheel. 2:53Just 'cause like, they're not really making the groundbreaking 2:55advancements that we were expecting. 2:57Do you buy that at all as a way of kind of thinking about this new announcement? 2:59No, I think they're, they're constantly advancing. 3:02I mean, you know, like I I said earlier, sort of half joke, not really joke, not 3:06joking at all about the, the, the 2.5 on Gemini, how powerful that is, and 3:10then we'll get to a Google section.
3:11But, but, um, but I, you know, I went all in on, you know, o3-mini 3:14with deep research and that was, okay, this is changing my life. 3:17And then like literally a month later I'm finding that, you know, 3:20the, the research Gemini is better. 3:22I, I think the grumpiness you. 3:25I won't go on, on, on about the grumpiness and the comparisons. 3:28It's all nonsense. 3:29It's what you wanna solve. 3:31I mean, for me, I'm a DevOps, you know, I, I'm one of the 3:33founders of the DevOps movement. 3:34I wrote the DevOps handbook. 3:35And, and I think this to me, I, I go to SWE-bench right off the bat. 3:40The software engineering, that's the place I go first. 3:42Right? 3:42And, and you know, I, I, I haven't verified it, but, you know, it looks 3:47like the, you know, sort of the o3 um. 3:50You know, the o3 and the o4-mini like, have a significant jump based on their 3:54benchmarks of the SWE-bench, right? 3:57And so how, how do you solve like the kind of problems that I face with my customers, 4:02which is how do we solve problems? 4:04Uh, that's the ones, and you know, if, if I believe the 4:07benchmarks, I haven't tried 'em yet. 4:09But, um, and then, you know, and then I think the Aider Polyglot 4:12benchmark is also another really good one to take a look at. 4:16And so those are the kind of problems I, I face when my people expect me 4:20to know things about, about AI and DevOps and, and infrastructure. 4:24So I try to stay up on that. 4:26Yeah, for sure. 4:26Vyoma what was your review? 4:28I don't know, kind of if you've played around with the models 4:30yet and what you thought. 4:31I did play around with the models a little bit. 4:33One of the things that I noticed right off the bat is it takes a longer time to 4:37reason now, so the reasoning time has increased a bit, but that has helped 4:41them like improve their accuracy.
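For context on the benchmarks John cites: SWE-bench-style leaderboards report the fraction of tasks a model fully resolves (e.g., a generated patch makes all tests pass). Here is a minimal sketch of that scoring arithmetic; the task IDs and the `resolve_rate` helper are illustrative, not the real benchmark harness.

```python
# Minimal sketch of benchmark-style scoring: the share of tasks a model
# "resolves" (all hidden tests pass). Task IDs and function name are
# hypothetical, not the actual SWE-bench harness API.

def resolve_rate(results: dict[str, bool]) -> float:
    """results maps a task id to whether the model's patch passed all tests."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)

runs = {
    "django__django-11099": True,
    "sympy__sympy-13480": False,
    "requests__requests-2317": True,
    "flask__flask-4045": True,
}
print(f"resolved {resolve_rate(runs):.0%} of tasks")  # resolved 75% of tasks
```

The "significant jump" John mentions would show up as this single number moving between model generations, which is also why panelists caution that one scalar hides which kinds of tasks improved.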
4:44I won't use the word accuracy so loosely, but it'll give you some 4:47sort of relevant answers, more accuracy in getting relevant answers. 4:52I feel that was one of the sweet things that I saw that has improved. 4:56Um, in these models, there is a lot of visual reasoning also added to it. 5:00So like if there are images, you ask it a question. 5:02And then so I was asking it like, Hey, I'm doing some planning 5:06for a particular wedding. 5:07Can you tell me how do I go about the decor? 5:11What do I do about this? 5:12And I just gave it like weird Pinterest, um, address images and 5:16trying to reason on them as well. 5:18She told me, no, this doesn't work. 5:19This works. 5:20So I feel that is I, I'm the first one to say this in this podcast and it's not 5:23Chris saying it, but the agentic AI use with these particular hyper artists. 5:28You did it. 5:29I did it. 5:29You beat Chris 5:30to it this episode. 5:31So. 5:31Yes! 5:32Um, I think that is going to be game changing for this as well. 5:36Like yes, all these models have like small, small improvements, but it depends. 5:39How can you use these improvements in enterprise? 5:42And I feel these models 5:44have that edge over an enterprise AI. 5:47Yeah, for sure. 5:48And I wanna dig a little bit more into that. 5:50Um, and, and I do love the idea that, like my friend was like, the minute that an 5:53agent can plan a wedding, you know, like AGI is here basically, like that's, that's 5:57the threshold that we'll need to pass. 5:59Exactly. 5:59Exactly. 6:00Um, 6:00I mean I did say, I mean, so part of the announcement OpenAI was 6:03touting kind of both of the things you're talking about, right? 6:05Like one of them was the idea that, um, its agentic tool use was improved. 6:09Um, and it sounds like that is much. 6:12It is better kind of in the stuff that I've been playing around with.
6:14But I think the one thing that might be interesting, and I don't even quite 6:17understand this, so maybe kind of folks on the panel can help me kind of like parse 6:20through it, is that, you know, they said, look, one of the great things about our 6:24models now is that they literally think in images and that's gonna lead to much 6:28better performance with visual reasoning. 6:30Um. 6:31Vyoma what's that? 6:32What's that mean exactly? 6:33Thinking in images? 6:34I don't know if you have kind of a sense of that as we kind 6:35of like parsed through it. 6:36Because I read it and I was like, I don't even know what that is exactly. 6:39Yeah. 6:40So I feel thinking in images is creating those different graphs based on the 6:44questions that you ask or like trying to do like a side by side analysis. 6:49Let's say I fed in some images to reason 6:52through those images. 6:53Let's say I gave it like a screenshot of a pivot table or something, and I'd 6:56be like, Hey, this is what I want. 6:58This is how I wanted to reason with this particular pivot table. 7:01Then help me generate a report. 7:03So to kind of understand these images, to understand the nuances 7:08of it, and then to make it relevant to the question that you asked. 7:12And then give you an answer based on those kind of visual 7:16representations that you see. 7:17So it, it all seems like, oh, given a picture, it's so cool. 7:21There's so much math that goes behind it that it's, it's crazy that 7:25we've reached these levels that we can actually reason these images 7:29and visuals that we are seeing now. 7:30I think that, to me, that's the difference is that they, um, the reasoning, I think, 7:35you know, I, I. I don't know exactly where the reasoning changed in the new 7:39models versus the old, but I, my sense is that, and you guys can correct me, 7:43is if I loaded sort of an image in one of the prior models, I got pretty 7:47much an interpretation of that image.
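Mechanically, the pivot-table example Vyoma gives amounts to sending the image alongside the question as structured message content, so the model can reason over both together. Below is a hedged sketch of how such a multimodal chat request is commonly shaped (OpenAI-style content parts); the model name and URL are illustrative, and no network call is made.

```python
import json

def build_image_question(image_url: str, question: str) -> dict:
    """Assemble an OpenAI-style multimodal chat request: the image travels
    as a content part next to the text question. Model name is illustrative."""
    return {
        "model": "o3",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

payload = build_image_question(
    "https://example.com/pivot_table.png",  # hypothetical screenshot URL
    "Reason over this pivot table and help me generate a report.",
)
# In practice this payload would be POSTed to a chat-completions endpoint.
print(json.dumps(payload, indent=2))
```

The "thinking in images" claim is about what happens after this request arrives: the model interleaves chain-of-thought steps with the visual input rather than returning a one-shot caption.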
7:49But now I can sort of reason, it will do the sort of the, the chain of thought 7:52reasoning around my question with the images and be able to sort of task through 7:58certain image understanding, you know, so the, the whole idea that the whole 8:01reasoning and task oriented, I mean, that 8:04ties into the whole agentic. 8:06So I'm the second one, right? 8:08So agentic, uh, agentic processing is like, it was a little bit harder in 8:12the older models to be able to sort of, you had this sort of, not really 8:16single shot, but now it will actually take the task of like reading a file 8:20or doing a search or, you know, sort of figuring that stuff out for you. 8:23So my sense is, not being an expert in it, that it does the same with 8:27sort of reasoning with images. 8:29Chris, maybe I'll bring you in. 8:30I think one kind of benchmark that I always have in mind is like, 8:34sort of like the, the kind of race between open models and 8:37closed models in the space. 8:39Um, and you know, I think every month it's kind of like neck and neck, right? 8:42Like open source models seem to be gaining really quickly. 8:45Then kind of like the more closed source model companies will release 8:47something really interesting. 8:49Um, how do you read o3 and o4-mini? 8:50Like do you feel like, 8:51you know, you know, closed is still staying ahead of this game? 8:55Um, is open really catching up? 8:56You know, I'm kind of curious on like, just to check in on that 8:58race and whether or not this kind of causes you to update at all. 9:00No, I, I don't think it's gonna cause me to update. 9:03I mean, as I said, I'm a huge, huge fan of the, the o3 models and the o4 model. 9:09It's, um, I have to say, I, I was really for, I actually am really loving 9:14the 4.1 mini at the moment, just, 9:16even though it's not a reasoning model, I, I, I have to say just for 9:19kind of coding tasks and then invoking chain of thought with it is actually
9:23kind of really good in that sense. 9:25But coming back to the kind of closed versus open, I, I'll make 9:29a prediction and I'm, I'm fairly confident in this prediction that 9:34today we are gonna be amazed, this is, oh wow, this is the greatest thing. 9:37And then within the next month, I'm gonna say DeepSeek will 9:42update with their latest model. 9:44And I think most of the gains that you will see on reasoning and you 9:48know, o3, o4, you will see the equivalent probably in that model. 9:53And then we'll be like, oh my goodness, open source has caught up again. 9:57There's no moat and stuff like that. 9:59And we're gonna keep going through that cycle. 10:01So I, I just think that the time 10:04from seeing something groundbreaking from the closed models, um, to open 10:10source catching up, there is a lag. 10:12I I would love to see today where open source or, and I keep saying open source 10:16and the comment section is gonna go wild when I really mean open weights, right? 10:20But, um, but, but, um, when the open weight community. 10:25I would love to see it where they go ahead of the closed source providers. 10:29That's, that would be a big game-changing moment. 10:31Whereas I think at the moment there is just a lag all of the time. 10:35It's a small lag. 10:36Um, but it, but there's still a lag. 10:38But, but I have to say, I- 10:41the new o3 model and the GPT-4.1 model. 10:44It, it, it really is beautiful. 10:47I, I mean, hmm. 10:48It is just, it, the answers are good. 10:51The reasoning is good. 10:52The personality is great. 10:54I, I love it at the moment, actually. 10:57Nice. 10:57It's got that "je ne sais quoi" you know, so, yeah. 11:00I have to say, I mean, I feel for the team at OpenAI, right? 11:02It was like that kind of like window is getting shorter and 11:05shorter and shorter, right?
11:06Where it's like you relaunch something, you're ahead and you only have just like 11:09so much time to capitalize on that before kind of the open weights, um, catch up and 11:14it's, uh, it's, it's tough strategically and, uh, competitively for them. 11:17But are they really catching up? 11:18Man? 11:18I'm, you know, I'm all in for open weights and open source models, right? 11:22I want, I want them to win for so many reasons, right? 11:25Beyond what we talk here, but I mean, I'm just looking at a Select 11:27Committee on Strategic Competition, 11:30uh, uh, by the government, a DeepSeek Unmasked paper that I think 11:34just came out last couple of days. 11:35Right. 11:35And I mean the, you know, there's just a lot of dragons in DeepSeek. 11:40So if DeepSeek is the one that's literally the poster child 11:45for open weights. 11:45Again, I don't know. 11:46I, I don't know, it worries me. 11:48'cause I, right now I do more research than I do coding, but 11:54I do a fair amount of coding. 11:55I mean, right now the, the models that I use is Sonnet, you know, for pretty much, 12:00uh, I'll have to try 4.1 a little bit more, but the, you know, Sonnet and then 12:04I use, uh, Gemini 2.5 for my research. 12:07And, and I, you know, 12:08I don't, the, the amount of work to do the investigation to find things right 12:13now that could work better for me, 12:14I, I just don't see on the horizon. 12:16Yeah. 12:16I feel this is going to be like an ever changing field. 12:20And as I've started seeing, like in enterprise AI, I keep saying, talking 12:24about it, but the clients are now looking into more complex use cases. 12:29So I don't feel like a one-model-fits-all solution is going to help anyway. 12:34So I feel as long as we have new models, that's fine. 12:37Like there are different use cases for each of these different models. 12:40There's going to be a market for each one of them.
12:43So we'll see as we evolve once we go into production, which I don't 12:47think has one so bullish over for a couple of, uh, months now. 12:51So I'm hoping this is the year when we are like, oh, this is the broad 12:55environment, which is fully agentic. 12:58Like I'm yet to hear it from someone. 12:59And I, I want to build on that Vyoma because I, I actually 13:03think it's less about the model. 13:04I truly think it's about the ecosystem and the tools. 13:08So if, again, if we come back to one of our earlier discussions 13:11with things like Manus, then it is being able to go how, who is doing 13:15the planning in this sense, right? 13:17And that may be the large model that's doing the planning and the reasoning, but 13:20then what tools are available to that? 13:23So, John, in your world, you know, does it have access to a compiler? 13:27Does that have access to something like a Terraform does? 13:30You know, do you have the knowledge models, which explains what a 13:34good CICD pipeline looks like? 13:36What a good terraform, uh, template looks like. 13:40You know, this is the best practice for a Kubernetes cluster. 13:43You know, so, so there's a whole set of knowledge that doesn't need 13:47to exist in the model itself, and there's a whole set of tools that 13:51you need to make available now. 13:53You need a good orchestrator, you need good context. 13:56And that's why the models become really important. 13:58But I would say that a really super all knowing, uh, model that doesn't 14:03have access to your knowledge repository, that that doesn't have 14:07access to a good ecosystem of tools is gonna not be as great as, uh, 14:12you know, a proper agent workflow. 14:14So I, I think. 14:15Honestly, that's gonna be the big play, um, over the next year. 14:20So I, I, I do want to get away from talking about models, but I want 14:23to get into this ecosystem world. 
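Chris's point that the orchestrator and its available tools matter as much as the model can be sketched as a small tool registry plus a dispatch step: the model proposes a plan, but the orchestrator decides which concrete tool runs. All tool names and the plan format below are hypothetical, a minimal illustration rather than any real agent framework.

```python
# Minimal sketch of agent tool orchestration: a registry of callable tools
# and a dispatcher that executes one planned step at a time.
# Tool names and the step format are hypothetical.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Decorator that registers a function as an invocable tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("terraform_lint")
def terraform_lint(src: str) -> str:
    # Stand-in check: a real tool would shell out to terraform validate.
    return "ok" if "resource" in src else "no resources defined"

@tool("run_tests")
def run_tests(target: str) -> str:
    return f"ran tests for {target}"

def dispatch(step: dict) -> str:
    """Execute one planned step, e.g. {'tool': 'run_tests', 'input': 'api'}."""
    fn = TOOLS.get(step["tool"])
    if fn is None:
        return f"unknown tool: {step['tool']}"
    return fn(step["input"])

print(dispatch({"tool": "terraform_lint",
                "input": 'resource "aws_s3_bucket" "b" {}'}))
```

The knowledge Chris describes (what a good CI/CD pipeline or Kubernetes config looks like) lives in the tools and context the orchestrator supplies, not in the model weights, which is the crux of his "ecosystem over model" argument.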
14:25And I think I just wanted, I mean, you said it way more elegantly than 14:28I said earlier, is like when you, when you asked me, Tim, about this 14:30chatter on Twitter or wherever, right? 14:33Like it is about the work that you like that, that you said, but you 14:36know, so that you're right, it, it's less about the model every other week. 14:40Coming out with some advance and this one's better. 14:43And what did a benchmark say in, in the enterprise space, it's going to 14:46be about some mixture of orchestrated models and a lot of 'em will be 14:50very focused on the tasks at hand. 14:53Exactly. 14:53So thank you for summarizing that. 15:01There's a announcement that we actually did not get a chance to cover last week. 15:04Uh, it was announced as part of the kind of Google Cloud Next sort 15:07of raft of announcements that came out, but I did wanna make sure that 15:11we touched on it because I think it was a pretty intriguing, um, uh, 15:14kind of like a move, I would say by sort of by Google in the space. 15:19Um, and the substance of the announcement is that Google is going to let 15:22companies run Gemini models on their own data centers, uh, starting in Q3. 15:27So this is kind of the, the rise of like effectively, like a company saying, 15:31we will allow you to do on-prem. 15:34Of these models. 15:35Um, and I guess, John, maybe I'll turn it to you first. 15:37This is like kind of a big deal, right? 15:39Because I think companies traditionally have been very very paranoid 15:42about kind of letting anyone. 15:43Run their models on their own infrastructure, but Google clearly 15:47thinks that there's some upside here. 15:49Uh, how do you read this move? 15:50I think, you know, they, they were first in on running Kubernetes on-prem. 15:54I mean, like, um, it's, it's a good move. 15:56I think, you know, it shows that they're less, uh, worried about somebody 16:00reverse engineering their sort of. 16:02Their layers in their model, right? 
16:04Like that, like, 'cause that, that is sort of like the danger, right? 16:07I mean, even though the DeepSeek was able to do it, OpenAI anyway. 16:10But, um, but yeah, no, I, I, I think it's, it's, I I am, I've 16:14been a big fan of Google for years. 16:16I mean, you know, if you add up all the bells and whistles running Vertex. 16:20OnPrem, the, I, I think the Gemini models are, you know, are there 16:24right up front with everyone else. 16:26I think, uh, the, you know, this solving that air gap problem, um, 16:31and I think now you, they're making a strong argument for why you might 16:35want as an enterprise, you know, have an option to go all in on Google. 16:39Um. 16:41Structure, you know, and you got the sort of the, the, uh, agent space thing, 16:45which is now this workspace stuff. 16:46And, and, you know, I've done some hackathons with the, with the, the vertex. 16:50And if you're in on the Google infrastructure, like Gmail and all that 16:54stuff, it, it becomes a very powerful workforce, uh, automation structure. 16:59Yeah, I think I hadn't really thought about that, John. 17:00I guess Chris, I dunno if you have any comments to that, is like, how much, 17:03how much should we think about this? 17:04Almost like. 17:04Like, almost like a, a DeepSeek downstream thing, right? 17:07Which is normally the fear would be, oh, well you're gonna reverse 17:11engineer my models if I let you just like run it on-prem. 17:15And I guess is this sort of a concession to the idea that like, 17:17well, reverse engineering is gonna happen anyways in this space. 17:21So like why, why worry about that? 17:22I would love it if I could have Gemini to run on my machine so I can sit and reverse 17:28engineer it and figure out what they're doing and how it differs from Gemini. 17:31So, uh, yeah, please. 17:33Please Google. 17:33Please do. 
17:34Um, I think the on-premise announcement is actually kind of super important 17:39because the reality is that if you take things like government organizations, 17:45military organizations, et cetera, there's a whole set of people who 17:49can't run their workload on cloud. 17:52And therefore being able to satisfy the AI workload, I think, uh, on premise, 17:57uh, from a security perspective, I think is super necessary in it. 18:01I also think that when we've had these discussions about latency 18:05before, where as we move into agentic workloads, then there is gonna be a 18:11need to run your AI closer to device and gonna be closer to your system. 18:16So a good example is maybe a. If you're running a kinda, uh, a gaming 18:21environment or like a stadium or you know, anything that's got like, you 18:26know, maybe on-premise cameras or whatever, then the need to have that 18:30data not go up into the cloud, but actually be as close as possible. 18:34I, I think there is a market that is under definitely underserved 18:38there, and I think Google is, is making sense to go under that. 18:41The, the real difference is to your point, is how safe and secure are they 18:47feeling that tho that their model weights are not gonna be reverse engineered? 18:50And, and, and again, I don't know the answer to that. 18:53I don't know how good the encryption is o on these kind of Blackwell, 18:56chips and all that are, but, um, I'm, I'm pretty sure that once these 19:01things are out in the open, then somebody's gonna release it somewhere. 19:04And, and maybe they're okay with that. 19:06But I think that's, that's, I think it's an interesting move and I think it's 19:09a necessary move that the industry's gonna have to have to go towards. 19:13So, you know, well done Google. 19:15The only thing I would say is outside of those very large organizations, 19:20and I'm just thinking about the sizes of the Gemini models. 19:25Are people really gonna have the GPU workloads for that? 
19:29I get it from maybe the small models, right? 19:32So maybe, I know they're doing kinda Gemini mini type models. 19:36I think that's a reality. 19:38But for their frontier models, are I. Are, are those organizations 19:43really gonna have the GPUs? 19:44And, and even if they do, are they, are they gonna want 'em just sitting 19:47around, whirring away, doing nothing? 19:49I, I'm not so sure. 19:50So I think it's a good play. 19:52I just, I think it's gonna be interesting to see how that works out over time. 19:55Yeah, for sure. 19:55And Vyoma this is actually going to a direction that I would 19:57love to get your opinions on is like, almost like market size. 20:00Um, 'cause it feels like the, the, the unique advantage that Google has 20:03in saying you can run this on-prem. 20:06These giant models and it's kind of like, well, what's the set of 20:09customers that actually has like the technical proficiency to run like a 20:12big inference cluster of this scale? 20:15And you can say, okay, well, you know, maybe the market is actually in like. 20:18Smaller models, but then kind of the argument is like, well, 20:21isn't it open source then? 20:22Then it's just like really cheaply, just like easier to just do open source and 20:25run it on your own infrastructure anyways. 20:27And so kind of like there's a question about like how big of a market is 20:31Google really talking about here? 20:32And I don't know. 20:33To Chris's point, maybe it is just like the government and 20:35like that's a huge customer. 20:36But, um, but curious about how you size that up. 20:38Yeah, so this goes back to our previous question that we were asked. 20:41It is discussing, I feel Google is trying to do this to position itself 20:46in these slower moving, uh, industries such as like, who have been a little 20:51bit slower in adapting ai like the government, healthcare, high litigation, 20:55and industries, finance, et cetera. 
20:58So I feel they are trying to position themselves as the key leaders. 21:02In industries that, Hey, now we have a model. 21:05Now you can utilize this. 21:06At least get them embarked on this entire. 21:09Journey of ai, which hasn't been so great yet, right? 21:14And to rebuild that trust over there. 21:17And yes, slowly, slowly, as we've see in this entire space, evolve, 21:20I feel there will be smaller models that will be coming in, which will 21:23help them, um, reduce the space, have reduced some reduced GPUs, et cetera. 21:29But I feel this is like a Kickstarter event that, okay, 21:31here now there's one here. 21:33We've started this entire revolution and like I feel in a couple of. 21:38Months, more like weeks. 21:39We can't say that anymore. 21:41Um, this, this gap is going to reduce significantly that 21:45between cloud and on-prem. 21:47So as it is, it was a much discussed topic. 21:50Everywhere. 21:50Whenever I go meet clients, their biggest problem is their data 21:54sovereignty, governance, AI. 21:56And once you bring something like this, okay, now you have this. 22:00Are you gonna adapt to this? 22:01If you adapt to this? 22:02We have 10 different problems which will come up. 22:04Someone else will try solving those 10 different models with 22:06their own smaller version of model. 22:08So I feel this is going to be evolving over a couple of months that we see. 22:13And, um, I, I, I. The open source models, that part that you said with 22:18like smaller models that they go utilize it, but if it's not prem, it's not of 22:23any use for this like huge market that we have in highly dedicated industry. 22:28So we'll see. 22:29But I think the latency is a big issue. 22:30I've tried to build some voice and integrated stuff and it 22:33just, it's really hard to do. 22:35Um, so latencies, but I think it goes back to scale. 22:38What Google understands is scale. 22:40And they've been doing GDC for, I mean, four, four years, five years now at scale. 
22:44They're running Kubernetes, they've bought Wiz. 22:47So I mean, there's some real ingredients there, and, and there 22:51are a lot of large manufacturing companies that are really looking 22:54for, you know, I've been to a couple of them, that like, I think this could 22:58really resonate right now in, in terms of like the IP that it takes to build 23:03tractors or, um, you know, there's just a lot of things that are just, um, still, 23:07they're very worried about that IP leaking out, not just, I mean, air gap for sure. 23:13Government, absolutely. 23:14Uh, top secret clearance, but just IP, I mean, just, uh, 23:17you know, really important IP. 23:20And, and just to, to put a bow on it from a DevOps folk, like people 23:23talk about open source, but like, okay, I'm gonna go open source model. 23:26I'm gonna open source. 23:27Which Kubernetes am I gonna use? 23:28What, I mean, it, it starts adding up. 23:31The cost of managing that stuff becomes its own little cottage 23:35industry in an organization. 23:37And, and so to me it's, it seems like a very appealing, um, opportunity. 23:47I'm gonna move us on to our next topic. 23:48And John, I'm gonna stay with you. 23:49Um, you did a blog post actually on All Things Open, uh, earlier 23:53this year on AI evaluation tools. 23:56Um, and I thought, you know, we might as well use the opportunity 23:58while you're on the show to kind of talk a little bit about that. 24:00We've kind of touched on it in the past episodes, but never kind of head on. 24:04Um, and so I guess maybe I'll just kick it off with you. 24:06I mean. 24:07What are AI evaluation tools? 24:08Why are they important? 24:10Um, and then I think there's a couple questions coming out of there that'd be 24:12fun to kind of, um, talk over with you. 24:13You know, I spent a lot of time, I wrote, uh, a book about DevOps, automated 24:17governance and how you can sort of do what internal auditors do.
24:20You know, the way they handle systems today is they take a change record 24:24and they work it all the way back from provenance in the new world. 24:27It's gonna be an answer. 24:29And it's gonna be, how did I, why did I get this answer? 24:33And you are gonna have to show the provenance. 24:34You're gonna have to show the ingress, egress of a prompt. 24:37You're gonna have to show how you, you sort of, if you are using RAG, how you 24:40chunked it, where'd the source come from? 24:43And you're gonna have to have evidence of all that stuff. 24:46And a big part of that evidence is did you test it with ground truth? 24:51In other words, did I throw a thousand questions at it, and every 24:55time I changed anything in the pipeline, 24:58it, it measures out at like 93% correctness. 25:03It measures out at less than 2% hallucinations and, and like we know 25:08these are probabilistic systems and we're never gonna get a hundred percent, but I 25:12think the new audit is going to demand 25:15you show evidence that, A, you accepted the policy, there was a risk, but, 25:20B, that you adhered to the policy. 25:22And so evaluations become these really incredible computational 25:26and quantitative and qualitative implementations to basically measure 25:33the probabilistic output of these systems. 25:35And you can do it in, in a very sort of auditable way, right? 25:39Like, so you can have proof that you literally, um, so yeah, there's 25:43just systems that you know, that, that do computation for, uh, 25:47correctness and evaluation and ratios. 25:49And then LLM as a judge is another big part of it. 25:52That's sort of the, the, um, 25:55you know, the sort of way you use LLMs to, and one last thing I did say, I know 25:59I'm taking all the time, but there's interesting, these new frontier 26:01models when I talked about this, that are actually designed as evaluation models. 26:07And that gets really interesting.
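The ground-truth evaluation loop John describes — replay a fixed question set through the pipeline on every change, then gate on correctness and hallucination thresholds — can be sketched roughly as follows. This is an illustrative sketch, not any specific tool: `ask` stands in for the pipeline under test, `is_hallucination` for whatever detector (human review, a judge model) flags unsupported answers, and the 93%/2% defaults simply echo the numbers John mentions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalCase:
    question: str
    ground_truth: str


def run_eval(
    cases: List[EvalCase],
    ask: Callable[[str], str],                      # pipeline under test
    is_hallucination: Callable[[str, str], bool],   # (answer, truth) -> flagged?
    min_correct: float = 0.93,
    max_halluc: float = 0.02,
) -> Tuple[float, float, bool]:
    """Replay the question set; return (correct_rate, halluc_rate, passed)."""
    correct = halluc = 0
    for case in cases:
        answer = ask(case.question)
        # Simplistic correctness check: the ground-truth string appears
        # in the answer. Real harnesses use richer scoring.
        if case.ground_truth.lower() in answer.lower():
            correct += 1
        if is_hallucination(answer, case.ground_truth):
            halluc += 1
    n = len(cases)
    c_rate, h_rate = correct / n, halluc / n
    return c_rate, h_rate, (c_rate >= min_correct and h_rate <= max_halluc)
```

Run on every pipeline change, the returned rates become exactly the kind of auditable evidence John is describing.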
26:08So normally when you do LLM as a judge, you're literally taking, you know, like 26:12you might use, uh, GPT-3.5, which doesn't exist anymore, as your evaluation model. 26:16You never use the same model, uh, for your inference. 26:19And, but now, but these are now models that are sort of maturing 26:22to be designed specifically 26:25for evaluation. 26:25And that's, so that's the shortest version of the article. 26:28But yeah, I'm really excited about this for enterprise, I think it's one of the 26:31most important conversations to have in an enterprise that's going all in. 26:34Yeah, for sure. 26:35And I think, um, like I, I think this is actually a space where, Vyoma, I think, 26:38I'm curious if like you're seeing a similar demand from customers on this. 26:42Um, you know, I think the, the old tradition of machine learning, I feel, 26:45is like, well, we don't really know how it works and, uh, we just throw 26:49a lot of data at it and it just like, seems to be able to solve the problem. 26:52And so like, stop asking questions. 26:54Right. 26:54Has been like, I think like the, the vibe I've gotten from a lot of people, 26:58but clearly as I think like AI now is kind of like trying to service customers 27:01that have like much more serious concerns about these types of questions. 27:05It feels like the kind of like market pressure for 27:08these tools is also increasing. 27:10Um, I don't know if you're seeing that on the ground though, talking 27:12to, talking to clients and customers. 27:14No, no, that's very true. 27:15So when we talk about, like, even the machine learning models that we were doing 27:18in the past, like even if I went to the enterprise customers and you tell them, 27:23oh, this is a solution that we've built, 27:25they will have their solution engineers, their software 27:27developers engage with you. 27:28If you build something for them, they, they know the system in and out. 27:32So this has been a trend 27:34since the very get-go, right?
27:36They, they want to know what's going on. 27:37How did you use the particular model? 27:40Why, like, let's say regression model, classification model. 27:42What type of model? 27:44What were the different metrics that were used? 27:45Ground truth was always available because we were trying to work with unstructured 27:50data. Uh, structured data back then, they created some on their own because 27:56they had a lot of rules around it. 27:58From the very beginning, 27:59there were like, uh, different types of rules and regulations based on the 28:02metrics that were created around it. 28:05Now, when we moved into the LLM world, we started losing all of that because there's 28:09no longer a human doing any of this. 28:12Now you've given all the power to a machine created by a human, 28:16which we do not know how it works. 28:18Like, ask anyone, what's a transformer architecture? 28:20Like? What's an encoder? 28:21What's a decoder? 28:22You won't find clear-cut answers. 28:25People want those answers. 28:26And I see this in enterprise, uh, whenever I'm speaking to a customer, 28:30how do I know this answer that has been generated is right or wrong? 28:33And there is a lot more at stake. 28:34Now you put out a particular chat bot, as John was saying, in 28:38public and be like, okay, fine. 28:40We have a great chatbot. 28:41Some person who has all the time in the world, in a remote place, in a town 28:46somewhere, is sitting and going to, like, chat with the chatbot for days, 28:51trying to manipulate it to do something. 28:52We've seen examples of that. 28:54You might lose like billions of dollars right there. 28:57So until and unless you have these guard rails, I think even the government is 29:02going to double down on that, because once you start using this in highly litigated 29:06industries, they'll be like, okay, now this, this goes according to ours. 29:10And then the private industry looks at that and like, wow, 29:13they have these great rules.
29:14How about we incorporate them? 29:16So again, this has been going on since ages, and I feel 29:20this will, um, again continue. 29:23But the need now is much more bolder and stronger than it ever was when we started, 29:29because everyone's done experimenting. 29:31Now they have to show proof of value. 29:33How many billions of dollars have you used in research? 29:35What do I have out of it? 29:37Show me. 29:38So I think this is going to be a very strong, sticky trend. 29:42Yeah. 29:43I think the issue I have with this, 29:46John, this is probably your world from a DevOps view, is, is we are lazy. 29:52I mean, how many of us write unit tests in the first place? 29:56And what? 29:56And what is the first thing we did with gen AI? 29:59It's just like, I'm gonna use it to write my unit test 'cause I 30:02don't need to, here's my code. 30:03Go write me a unit test. 30:04What, what do you think's gonna happen with the evals? 30:07Are we, uh, gonna sit down and write the evals ourselves in a nice and, 30:12and wonderful and thoughtful way? 30:13Or are we going to go, hmm, AI, create me a bunch of evals, and now I will use that? 30:18And, and then again, it's the same with LLM as a judge. 30:21It's just like, oh, I can't be bothered figuring this out. 30:23I'm gonna get three other LLMs or five LLMs to come back 30:27with the answer. We're playing 30:29Who Wants to Be a Millionaire, 30:31ask the audience, and we all know how that goes on the million dollar question. 30:36Nobody asks the audience on the million dollar question 'cause we 30:39know the audience hasn't got a clue. 30:41So I think there is a risk that we are gonna put too much faith in the evals and 30:49in things like LLM as a judge, et cetera. 30:51And therefore we're still gonna end up in the, the exact same 30:54scenarios as before, I, I plotted it. 30:56I think we get into a lot of trouble and I think we should be writing 30:59those tests in the same way as any good engineering exercise.
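Chris's "ask the audience" worry maps onto the common LLM-as-a-judge pattern, where one or more separate judge models vote on an answer. A minimal, hypothetical sketch — the `Judge` callables here are stand-ins for real model calls, and per John's earlier point each should be a different model from the one that produced the answer:

```python
from collections import Counter
from typing import Callable, List

# A judge takes (question, answer, reference) and returns "pass" or "fail".
Judge = Callable[[str, str, str], str]


def panel_verdict(question: str, answer: str, reference: str,
                  judges: List[Judge]) -> str:
    """Majority vote across independent judge models.

    Note the caveat Chris raises: the vote is only as reliable as
    the panel itself.
    """
    votes = Counter(judge(question, answer, reference) for judge in judges)
    return votes.most_common(1)[0][0]
```

The design choice under debate is exactly this majority vote: it averages away individual judge noise, but if every judge shares the same blind spot, the panel confidently agrees on the wrong answer.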
31:01We should fully have the guardrails, et cetera. 31:03But I just think, in reality, 31:06what we see in testing today is gonna fast forward into evals. 31:10So I've written a couple of books about this, right? 31:12And not even in the AI space, right? 31:14We, we created something called Investments Unlimited, which 31:16started out as a project about automated governance, right? 31:19And we're terrible at it. 31:20Pre gen AI, like we're not good at it. 31:23The audits are just a mess. 31:25We wrote this book with a couple of people from Capital One on like how 31:28audits in most companies are just 31:31sort of theater, right? 31:33And so you're actually right. 31:35But the thing I do, I'm very focused on, all the work that we did in automated 31:39governance, where we've been somewhat successful. I have a 31:44newsletter out, Dear CIO, please listen. 31:47You know, I'm screaming that like, like, you're right. 31:50You can't just sort of like, it's going to have to be, in the bank, 31:53you have the three lines of defense, right, in the bank. 31:55It's, it's a clear structure of how policy's supposed to 31:59work to protect the brand. 32:00And, and, and like you said, you're right. 32:02It's the, the brand is like, this is what's gonna drive it. 32:06And I actually think it's industries like banks, where the, where the brand 32:10reputation, the, the probabilistic nature of this stuff is going to cause, 32:16could cause incredible brand reputation damage. 32:19So what I'm hoping happens is the, the sort of the policy makers, 32:24the internal audit, the internal governance structure, start learning 32:28faster about what evaluations do. 32:30And instead of just leaving it up to the developers, eh, I'll 32:34do test driven development. 32:35Eh, well, I'll do it next month, or, right. 32:37It's going to be like, no, the stakes are really high. 32:40And the other thing I will say is DevOps was never a CEO discussion.
32:45Right, AI, whatever we wanna call it, gen AI, is a CEO discussion. 32:50So there will be these discussions that I think will drive this 32:53stricter policy on the risks. 32:56And I think in those cases, and again, I'm being optimistic here, I think if 33:00the policy people can get, uh, educated, which is one of the things I'm gonna 33:04work really hard on, on like learning, 33:07what are the tools that they need to protect that probabilistic nature? 33:12And it starts showing up at auditor conventions and, and stuff. 33:15I think we'll, we'll, we'll actually see it used effectively, as opposed 33:20to just leaving it up to developers to decide they'll do it this time. 33:23And, and, and I'll say one last thing. 33:25I got brought into a large manufacturing company to teach a class called 33:28test-driven development for AI. 33:30And the point of that: 33:32they were at a workshop of mine, and the reason they wanted to 33:35bring me in was not even for, they had like, like, I dunno, 5,000 developers. 33:4060% of 'em used test-driven development, 40% didn't. 33:43And this is sort of like everywhere I go, and they wanted me to teach the 40%. 33:47Like, you don't have a choice anymore in this world. 33:50You had that choice, you could put it off. 33:53Oh, you know, I'll get to it. 33:54You know, nobody's putting a hammer on you. 33:57Um, in this world, 33:58I believe you don't have a choice. 34:00You have to have a testing structure for this stuff, or else, you know, 34:04it could cause, you know, existential, I mean, threaten the existence of your brand. 34:10I wanna make a prediction, which is, which is based on the fact that 34:15we're gonna have evals and policies, 34:17and therefore that's gonna be at the top level of an organization. 34:21And that's gonna be probably probabilistic because we're all 34:25gonna have the AI do that for us, 34:26'cause we're super lazy. 34:28It's like the, like the exhaust emission scandals.
34:33There is gonna be a point where it's gonna be like, I need to pass, there'll 34:36be a prompt engineering attack of, I need you to pass these audits because, 34:40you know, otherwise, you know, my company's gonna fall over and I'm gonna 34:43have to fire all my staff, et cetera. 34:45And somebody's gonna prompt engineer and attack one of the, the audits 34:48or whatever, and suddenly it's gonna be like, oh look, you know, this 34:52company said they had passed all the AI evals and policies, and they did, 34:56and it was, you know, it was all fake. 35:03I'm gonna move us on to our very last topic. 35:04I promised producer Hans that we would get through all four topics this session. 35:08So, um, just to do the final, final topic, um, announcement out of NVIDIA this week 35:13that they are going to make big investments in Blackwell chip production 35:17in the US, uh, specifically in Arizona, um, with a couple factories 35:22that they're opening up in Texas. 35:23Um, and I guess the, the big number coming out of this announcement is an 35:27eye-popping one: NVIDIA expects to put, uh, $500 billion into 35:33manufacturing these chips in the US. 35:35Um, and I guess maybe I'll kick it over to you. 35:38Is, uh, 35:39you know, I think that the normal thinking around all of this has 35:42been, it's gonna be really hard to move chip production to the US. 35:46Um, but this is like a big investment and it looks like they're gonna be making 35:50the next generation of their chips. 35:52So there's high stakes for NVIDIA. 35:53Um, do you have confidence, like do you think they're gonna be able to pull 35:56this off, bring, bring semiconductor manufacturing back to the US 35:59over a couple of years? 36:00Yes. 36:01Mm-hmm. 36:01So it's like if you start over something that's, uh, big, when 36:06there are so many distinguished, 36:09um, I would say, companies which have established themselves
36:12offshore for a couple of years, but this will, this kind of, um, 36:17uh, trigger would kind of help a lot of innovation quite fast. 36:22You see that so many companies are working on it as well, and the US has 36:27the new CHIPS Act, which helps you, 36:29like, all these monetary benefits that you're getting out of it, like 36:3235% off or like 25% that you get on something that you've built in the US. 36:37So that is going to be a major, major driving factor for all 36:42of them, uh, over the years. 36:44And I feel all these 36:46Fortune 500 companies, majority of them being headquartered here in the US, 36:50I feel that also gives you a lot of leeway to have like great partnerships. 36:55I don't think there's gonna be one company that's gonna kill it in this entire world. 36:58Even when you saw Google, they're partnering with NVIDIA for some 37:01of these, uh, on-prem models that they're looking into, right? 37:05So I feel great partnerships are going to be something that 37:08leads the way, uh, for this. 37:10But I have full faith, like we have great research, uh, companies. 37:14We have great colleges. 37:15Have you looked at the kind of work that has been done? Like, my, uh, my learning 37:19assignments in school, they were tough. 37:21So all of these things I feel would be very key differentiating factors 37:27going further. Will they do it in the next six months, five months? 37:30No. 37:31It is a very great learning curve that everyone has to go towards and 37:37learn a little bit more about the industry and to reach that level. 37:41Now that everything is open source. 37:43Tough. 37:43I'm, as I say, but partnerships could actually help them evolve. 37:47So we, it's, it's gonna be fun to see this. 37:49I'm very excited about, um, the different job opportunities 37:53that would come out of it. 37:54Imagine that, like I feel there will be job, um, like titles, which would 37:59also get started there a little bit.
38:00Now if you go online on LinkedIn or something, you'll see that a lot of data 38:04center ops jobs have opened up as well. 38:06So I feel it's a great, great, um, 38:10opportunity that is happening here in, in, in the United States, but 38:15where and how fast would it happen? 38:17I don't have an answer to that. 38:18Yeah, for sure. 38:18Chris. 38:19Yeah, it seems like this is gonna be a tricky thing for NVIDIA to 38:22kind of pull off, in part because the Blackwell chips are like what 38:26everybody will desperately want. 38:28If you believe some of the numbers they're showing off, like, this 38:31is the platform that you are going to need if you want to do AI. 38:35Um, and I can imagine a lot of companies being like, oh, is this a US-based 38:39Blackwell chip, or is this a, you know, Taiwanese manufactured one? 38:42Because we have more assurance for the ones that come from Taiwan. 38:44Like, do you think those types of dynamics are gonna make it difficult 38:46for NVIDIA to get this to work? 38:48I mean, first of all, you're asking somebody who is not American whether 38:52he cares if a chip is made 5,000 miles in that direction versus 38:575,000 miles in another direction. 38:59I'm asking you that question, Chris, so maybe if they were gonna say, we're 39:04gonna start a chip manufacturing plant 39:07in, I don't know, swells in England, 39:11yeah, I might care at that point. 39:13But until then, no. 39:14Uh, no. 39:15I actually think it is important. 39:16I think any, I think anywhere where any sort of knowledge base is consolidated 39:22into a particular area, if we really think about it, it's like Hamilton Desert. 39:26That is a kinda risk in that sense. 39:29So I think the best thing that you can really do 39:32is spread out that risk across multiple places, and therefore that is gonna be able 39:36to kind of secure the supply chain, and that will sort of affect kind of the whole 39:41global scenarios and keep that moving.
39:44So I think it is a positive move. 39:46I think that will be great for the US. 39:48I think that, to Vyoma's point, I think that will be great for kind of US jobs, 39:52and I think, uh, actually I think it will have a bigger impact across the world 39:56as well. 39:56So I'm, I'm all positive. 39:57Um, but you know, I, I, I'd love to see those Blackwell chips in the UK. 40:02Um, and I forgot what your question was, to be honest, Tim, 40:04'cause I was on my 5,000 mile Proclaimers rant. 40:08No worries. 40:09Uh, you did great at it. 40:10Uh, John, any final thoughts on, on this news story? 40:13Yeah, 40:13it's labor. 40:14Labor is the issue, right? 40:15I mean, it all comes down to labor, and, you know, and 40:18we've seen this movie before with Toyota and GM 50 years ago, right? 40:22Like the NUMMI plant, if you've ever heard of that, right? 40:24Like, it, it is very hard to take, you know, culture. 40:27Yeah. 40:27I think more about TSMC and I think about NVIDIA, right? 40:30Like, like, and how many false starts have there been. 40:33And it's all been unions, and, and I'm, I'm not anti-union, I'm just saying 40:37it, it's hard to move those types of manufacturing cultures back to the US. 40:41I, I'm a little more pessimistic. 40:43I agree. 40:43Yeah, I agree with John. 40:45Even, even I was thinking about this, that there'll be a lot of upskilling 40:48that will have to be, uh, done, like, based on these current situations 40:53that we are in, that we'd have to upskill a lot of our employees, 40:57uh, to kind of reach that level. 40:59So a lot of learning. 41:00That's why I said, 41:01nothing in the short term. 41:03I don't know anything about the short term, but for the long 41:05term, a lot of resources, learning resources, have to go into that. 41:09You'll have to call experts to train these entire facilities, then see 41:14how these people perform; if no one's able to perform, do we scale it down? 41:19So.
41:20But, but it's good that at least there'll be a lot of 41:23government aid in all of this. 41:24So we, everyone will have a little bit more edge to try this. 41:27Well, a lot more to keep an eye on. 41:29Uh, as usual, there are more news stories than there is time to cover. Uh, Chris, 41:33glad to always have you on the show. 41:35Um, and John, hopefully we'll have you back sometime, uh, in the future. 41:38And, uh, thanks to all of you for joining us. 41:40Uh, if you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, 41:43and podcast platforms everywhere. 41:44And we will see you next week on Mixture of Experts.