Claude 4.5 Opus: Efficient AI Model
Key Points
- The host frames the AI landscape as an “infinite game,” emphasizing a shift toward a creator‑centric ecosystem that can break the dominance of large Web 2 companies.
- “Mixture of Experts” brings together top AI thinkers—including IBM engineers and executives—to discuss broader strategic themes rather than just headline news.
- The episode’s focus is Anthropic’s newly released Claude 4.5 Opus model, highlighted for being roughly 50% more token‑efficient than its predecessor (Claude Opus 4.1) while maintaining high reasoning performance.
- Panelists recommend deploying Claude 4.5 Opus through Claude Code, noting its cost advantages and strong suitability for coding tasks.
- Early hands‑on impressions compare Claude 4.5 Opus favorably against recent rivals such as Google Gemini 3 Pro and OpenAI GPT‑5.1 Pro/Codex Max, suggesting it now sets a higher benchmark for AI coding assistants.
Sections
- Infinite Game, AI Marketplace Evolution - In a Thanksgiving episode of Mixture of Experts, host Tim Hwang and his panel explore the notion of AI as an endless, resource‑driven “Simon Sinek” infinite game that could break Web 2 monopolies, foster a creator‑centric ecosystem, and highlight Anthropic’s newly released Claude 4.5 Opus model.
- Rapid AI Model Releases and Pricing - The speaker examines the near‑simultaneous launch of several high‑performing AI models, highlighting improved price‑performance, larger context windows, and optimizations in the new 4.5 Opus compared to earlier versions.
- Enterprise Access via Cloud Providers - The speaker explains that offering the coding‑focused AI model through hyperscalers like Azure and AWS at affordable rates facilitates enterprise deployment, while also highlighting the model’s broader optimizations such as PowerPoint slide creation.
- AI Commerce Impact on Black Friday - Speakers debate whether emerging AI‑driven shopping tools will meaningfully disrupt holiday retail, concluding that the expected boost in automation and agentic browsing will likely be minimal compared to previous years.
- Agents Power E‑Commerce Returns - The speaker argues that 2025 will be the “year of agents” because automated, backend agents are already streamlining product returns for major retailers, driving real adoption in commerce.
- Debating the Year of AI Agents - Panelists argue whether the surge in AI tools like ChatGPT and Gemini signals a full transition to ubiquitous agent deployment or remains a transitional phase akin to the PC era, emphasizing enterprise integration through web search, tool‑calling, and service connectors.
- Predicting AI Agent Adoption Timeline - The speaker compares the historical rollout of LLMs—from research breakthroughs to consumer apps—to the emerging field of AI agents, debating whether agents will reach widespread use faster or slower than the four‑year lag seen with LLMs.
- Democratizing Agent Platforms - The speakers liken the future breakthrough in AI agents to Shopify’s democratizing impact, arguing that a simple, low‑friction solution will trigger rapid, widespread adoption, while highlighting the current tension between perfecting language‑to‑agent interfaces and building supporting deployment infrastructure.
- Deterministic Agent Execution Frameworks - The speaker argues that merely providing information to LLMs is inadequate without tool integration, stresses the need for deterministic, step‑by‑step execution to avoid skipped tasks, highlights the importance of production‑ready frameworks for deploying agents, and ponders which platforms will emerge as winners in the future agent ecosystem.
- Agentic AI Market Forecast - The speaker outlines a split between frontier AI firms pursuing agentic capabilities and cost‑efficiency offerings, predicting that success will belong to those who can deliver repeatable, turnkey agents, likening today’s fragmented agent building to the early days of AI model development.
- The Infinite Game of AI - A speaker contends that AI progress is an endless, resource‑constrained contest with no definitive victor, stressing the need for compact, intelligent models and a decentralized creator ecosystem to dismantle Web 2 monopolies.
Full Transcript

**Source:** [https://www.youtube.com/watch?v=SdNRWJ-oqjY](https://www.youtube.com/watch?v=SdNRWJ-oqjY) **Duration:** 00:41:21

- [00:00:00](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=0s) **Infinite Game, AI Marketplace Evolution**
- [00:03:38](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=218s) **Rapid AI Model Releases and Pricing**
- [00:07:35](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=455s) **Enterprise Access via Cloud Providers**
- [00:11:04](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=664s) **AI Commerce Impact on Black Friday**
- [00:14:15](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=855s) **Agents Power E‑Commerce Returns**
- [00:17:21](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=1041s) **Debating the Year of AI Agents**
- [00:20:49](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=1249s) **Predicting AI Agent Adoption Timeline**
- [00:26:16](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=1576s) **Democratizing Agent Platforms**
- [00:30:11](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=1811s) **Deterministic Agent Execution Frameworks**
- [00:35:05](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=2105s) **Agentic AI Market Forecast**
- [00:38:12](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=2292s) **The Infinite Game of AI**

## Full Transcript
I don't think this is a finite game. I
don't think there is a winner. I think
this is the classic Simon Sinek infinite
game. I think the players are going
to play until they run out of resources
and they can no longer play the game.
And I think we're going to win, right? I
think it opens up a creator ecosystem.
What I hope that it breaks up is all
these kind of web 2 massive companies
controlling everything and we can get a
more decentralized marketplace. All
that and more on today's Mixture of
Experts.
I'm Tim Hwang and welcome to Mixture of
Experts. Each week, MoE brings together
a panel of the smartest and most
charming thinkers in technology to
distill down what's important in
artificial intelligence. Joining us
today are three incredible panelists.
We've got Chris Hay, Distinguished
Engineer; Lauren McHugh Oende, Program
Director, AI Open Innovation; and Volkmar
Uhlig, VP, Core AI and watsonx AI. So,
this is our Thanksgiving episode and
we're going to change up the format a
little bit. Rather than ticking through
the news of the moment, we're going to
take a step back and have a focused
discussion about the bigger picture.
But, as always, we've got the headlines
with
Hi everyone, I'm Aili McConnon, a tech
news writer for IBM Think. As always,
I'm here to cover your top AI news of
the week. But instead of running through
a bunch of headlines today, we're
actually going to focus on one, the big
news of the week: Anthropic's new Claude
4.5 Opus model, which just dropped. And
to do this, I'm joined by our expert,
Mihai Criveti, Distinguished Engineer for
Agentic AI.
>> When I heard it was about the latest
model from Claude, I couldn't resist. So
happy to be here.
>> What would you say is the most important
thing that users need to know about
Anthropic's new Claude 4.5 opus?
>> I think it's just how efficient it is
with its tokens. It's 50% more efficient
than Claude Opus 4.1. So even though it's
a reasoning model, even though it can be
a fairly expensive model, it's cheaper,
and it also consumes 50% fewer tokens
when reasoning. So it's one of the
most efficient models out there. So
don't be afraid to use it. Give it a
try. I think the best way to use it is
through Claude Code, leveraging those
capabilities, and it really performs very
well for how token-efficient it is.
>> Mihai, what are some of your initial
reactions to the model after playing
with it?
>> Yeah, it's all kind of fresh,
because I think it all happened like 21
hours ago. So, this is all fairly new.
But I've already been using it. I've
been using it with both, you know, the
desktop application and with Claude Code,
and I can say this is by far the best
model for coding, and the bar has already
been set quite high. As you know,
Google has released Gemini 3 Pro quite
recently, I think two or three days
ago. OpenAI released GPT-5.1 Pro,
which has great reasoning capabilities,
but they've also released GPT-5.1 Codex
Max, which makes their Codex, I would
say, agentic platform really good.
I suspect, at least from my initial
testing, that it was on par or even
better than Claude Code with the
previous models, Opus 4.1 or Sonnet 4.5.
But now with Opus 4.5, I believe
Anthropic has regained the lead in terms
of the best model for coding. I'm still doing
some initial testing on it, but it's
performing really well.
>> That's super
interesting. And you mentioned this
comes, you know, shortly after the
release of Gemini 3 and we've had some
other big models. You know, what do you
make of the timing of Anthropic's
release? Obviously, it's been a busy
fall for big releases of coding agents.
>> This can't be a coincidence. I'm
wondering if these vendors just have
these models ready to go. Um, they just
might not have the best, I would say,
price performance, or they're waiting
for another announcement from their
competitors before they put them out to
market, because the timing was really
good. I mean, within the span of
three days we got three world-leading
models, all for code, all
outperforming each other in various
benchmarks, which is quite
interesting.
>> And you mentioned three
strong-performing new agents
being released in quick
succession. Is there any aspect of
Claude 4.5 Opus that is different?
Obviously it sounds like it has
slightly superior performance, but
what makes this release
different, if it is, from your perspective?
>> I think it's the pricing as well. They're
able to reach much better price
performance, or number of tokens used,
than the previous Opus 4.1. I was at
some point just using Opus
4.1 for the planning or the more
complex tasks and using Sonnet 4.5 or the
previous models for the actual work, because
they have, you know, a very large
context window; they were cheaper, they
were faster. I think they've been making
some optimizations in terms of things
like pricing and performance as well for
Opus 4.5, and it feels like you get a bit
more bang for your buck than the
previous versions, you know, than
Opus 4.1.
>> And can you talk a little bit more about
how they've achieved that
lower pricing? Are there
innovations in how they're
approaching things that have helped them
to do that?
>> I'm not sure it's necessarily to do with
innovation; maybe it just has to do with
more availability of the hardware. You
know, they've recently announced some
very, very strong partnerships with both
Microsoft Azure and with Google to
use, you know, more GPUs, more TPUs.
I think part of it just has to do with
having more availability of the
infrastructure and being able to reach a
bit further and say, hey, we're putting
some of our best models out there as the
default. I was actually quite surprised,
and maybe even shocked, that when I
opened up
Claude Code, it recommended 4.5 Opus
as the default, which is somewhat
unusual. It might say something like,
"Hey, we're going to use 4.1," or "we're
going to use 4.5 for thinking and
then we're going to transition into 4.1,"
but outright they've launched a new
version of Claude Code and it used 4.5
Opus as the default.
>> And I guess it's hard to predict, but
how long does Anthropic have the lead?
You know, we've had it for what, 48 hours? Will
we have something new by um by next week
or you know, does this set itself apart
such that you'll be using it yourself at
least as your top choice for at least
the coming weeks?
>> The way I see it, I'm
using all three models. I'm using Gemini
for a lot of my deep research, my
research, my advisory. I'm using Codex,
sorry, Claude Code with Opus for
writing code or for, you know, writing
test cases, and I'm using Codex with
GPT for reviews or for anything else
after. So I'm actually using all three,
but I would say by number of tokens
it's still the Anthropic models
I'm using the most tokens with.
They're writing the vast majority of the
code for me because they perform the
best for this particular use case. I
don't see it necessarily as a
situation where, if I had a different use
case, I would still use the models
from Anthropic; if it was, you know,
summarization or content generation or
creative thinking or creative writing,
maybe I would lean more towards GPT-5.1
Pro. But at least in the area of
code, and it could also be a combination with
what they put in the Claude Code tooling,
it still seems to outperform
Codex, at least for my use cases or
from my personal experience.
>> Do you think, for that specific area of
coding, does this have enterprise
application or usefulness? Or do
you think that's part of the play in
trying to bring the cost down, to make it
more enterprise-friendly? How do you see that?
>> Yeah, I think
the strategy of making this available
through the various hyperscalers at, I
would say, a reasonable cost is going
to help with enterprise deployments,
because many of the enterprises are
never going to consume the models
directly from the provider. So, you know,
if you're using models from OpenAI,
you're going to consume them likely
on Azure from Microsoft, not necessarily
directly from ChatGPT. Same thing
goes with Claude: you're going to consume
it through maybe AWS Bedrock, but there
has been somewhat of a limiting choice
for enterprises. With its availability
through Microsoft Azure, for example,
this has really opened it up to
enterprise customers and
consumers.
>> Is there anything else about the model
um you know that's worth worth talking
about that we haven't covered yet that
sort of strikes you as interesting?
Yeah, I think clearly they've
optimized it for agents, they've
optimized it for computer use, they've
optimized it for coding, but I like the
fact they've also optimized it for
things like building PowerPoint slides,
which was maybe something that,
you know, even Microsoft was looking at
these models for, for use in Office 365
or, you know, PowerPoint generation,
slide generation. So, they're not
looking just at software development use
cases. They're now starting to tackle a
lot of other enterprise use cases, you
know, being the best model for generating
a PowerPoint slide, or generating a Word
document, or generating and working with,
you know, XML, or working with the schema
required to build those documents. So
I'm pleased and happy to see
that models are being optimized for
these enterprise use cases.
>> I'd be happy to have a model to do my
PowerPoint slides as well.
>> Yeah, 100%,
>> to optimize. Thank you, Mihai, so much for
joining our conversation. And now we're
going to return to our special
Thanksgiving episode. Happy holidays
everybody.
Thanksgiving, I think, is like a really
good time to be talking about agents
because, of course, agents have been
very much hyped in 2025, but Agentic
Commerce has been one of the things in
agents that people have been excited
about. And you know, this week is going
to feature Thanksgiving, but also
importantly, Black Friday, which is one
of the biggest shopping moments of the
year. And so, I guess, maybe Chris, I'll
kick it to you first. You know, do you
think this week is going to be a breakout
moment for agents in agentic commerce?
You know, why or why not?
>> No, I don't think it will be. I think
we're probably another year away from
that. Why not? I think all of
the ingredients are getting in place. So
if you think about what OpenAI's done,
they're now bringing on board the
ability to shop in their channel for
commerce products, and they've partnered
with, you know, Shopify,
etc. But there are so many commerce
retailers that they've not onboarded
yet. So I think that's really early, and
it is US-only at the moment. And then
Google has released their agent
commerce protocol, and again that's
really early at the moment. And
agentic browsers haven't quite
taken off yet. So I just think we're
about a year away from that. Now,
where I do think it's going to
become relevant is utilizing web search
and deep researchers from within ChatGPT
to find the products that you want.
That is going to be big, and that is
disrupting retailers. But I don't
see a massive effect on Black Friday
this year.
>> And what's sort of
interesting, and I'd love to kind of
parse that out a little bit more, is, I
guess, Chris, you've listed a couple of
key components, right? It's almost like
the agentic browser is not quite there,
the partnerships are not quite there from a
business standpoint. I guess, Lauren, do
you kind of agree with this assessment?
Do you think this week is
going to be big for agentic commerce? I
guess, you know, Chris is almost saying
there is some, insofar as
people are using it to find products, but
it's not obviously what we were
promised in the exciting
early days of 2025.
>> Yeah, my feeling is it's also not going
to be so different from last year. So,
the you know, the protocols that Chris
talked about will help in automating the
actual checkout. Once you're using
ChatGPT and you find the thing you want, you
can automate the checkout. But I'm not
sure that was really ever the biggest
problem. You know, I didn't have a big
problem putting in my credit card
manually once I get to the link. They
spent a lot of time making it easy to
spend money on the internet.
>> Yeah.
>> I mean, we've had automated checkout for
a while. It was very hacky. You know,
that's why it's hard to get concert
tickets is because it is possible to
build browser automation to buy things
automatically. So, I don't see a big
revolution coming from the simple act of
being able to check out once you're
in the AI application. And I do think
too that even the product research
capabilities are a bit underwhelming.
You know, when you know you're looking
for something specific like size,
dimensions or a style or something, it's
not always easy to find that
just through the kind of interface that
we have now. And I think a lot more work
could be done on both training AI models
on e-commerce-relevant information. So,
you know, what are the input-output pairs:
input, this was the customer intention;
output, this is what they ultimately
selected. And, you know, allow the AI model
to build that pattern recognition of,
okay, that intention really
meant this size of thing or this
configuration. So I think there's
definitely work that could be done on
that front of having the models
themselves working better for
e-commerce. And then I think there's also,
you know, when you fit those models into
an agentic pattern: how do you
prompt it? How do you build in the
steps of that process? Like,
first, start by identifying the
retailers you want to look in, then
get the information you need on
their products, then compare them. That's
a whole flow that a general-purpose
chatbot, you know, ChatGPT
or whatever, pick your system, is
not made specifically for. So I
think if we had more workflows that were
built specifically for that, the
performance of those could go,
you know, through the roof: when
you have an intention, you can get
a specific link to that product right
away. So I think that's really where I'd
like to see more improvements.
>> Yeah, definitely. Volkmar, I'd love to
bring you in because I think your angle
of this is really interesting. When we
talk about agents, obviously we tend to
talk about like higher up in the stack,
right? It's like the application's not
quite there. Even like the business
partnerships are not quite there to get
this to work. Is there a hardware
limiter to the world of agents really
taking off particularly in commerce but
I guess otherwise or is this not like
almost not even in the picture?
>> So I
would take a completely different stand,
right? I think it is actually the year of
agents, and the reason is very simple.
If you look at Black Friday, 15 to 20% of
the stuff gets returned, right? And so I
think the agents are not the consumer-facing
agents; the agents are
actually the back end, and I think this
is where the true adoption happens:
stuff, you know, people are returning. It's
the same after Christmas; the
statistics are unclear,
somewhere between 15 and 25%. It depends
on the product category. So if you look
at Amazon today already, you know, if
you want to return something, it used to
be, you know, you click five
buttons and then they're like, okay, good,
ship it back, or no, we reject it and you
had to make a phone call. Now that's all
done through agentic workloads. And so I
think the big retailers,
probably not, I mean, Shopify at some
point will offer it as well, but the big
retailers are already in that motion
of actually optimizing that backend
flow. I do not know what they are doing
when the product hits, you know, their
shipping center; if they have
agentic workloads there, I'm sure they do.
But that first customer touch point,
effectively doing the return, I
think that's where the majority of
the labor is on their side, because the
front end is very optimized. So we
just don't see it as a consumer, but we
see it indirectly, because actually
returning is easier.
>> Yeah, definitely, and I think this is kind
of a pattern I did want to talk about.
I mean, just zooming out from e-commerce,
right, or, you know, buying stuff
online: I think it's almost easy as
a consumer to be like, agents are
the dog that didn't bark in 2025, because,
yeah, most of the people I know are not
using agents every single day for all
sorts of things. But it does seem like on
the back end, in the enterprise, there is
a lot of agent activity. And so it's sort
of interesting that the
public face and public experience of
agents is very underwhelming,
whereas, you know, Volkmar, there's
stuff running under the hood for
returns, which is very much
identified. And so we have this kind
of split screen that's happening in
agents that might actually fool us about
just how far this thing is going.
>> And I think that I mean if you look at
programming agents it's pretty
complicated you know what you need to do
and you need to coerce it into doing the
right thing and so I think the consumer
will always uh consume agents indirectly
through products and so it's not like
you know I mean we have like these apps
on the phone where I can automate stuff.
I don't know I I have one automation on
my iPhone which is like when I'm driving
close to our community open the gate.
Okay, that's the only automation I have
out of all the automations I could
build. And so humans typically like you
know they want to have a packaged
product which just solves the problem.
And I think the beauty of what ChatGPT
did is giving you that one line
everybody knows how to use from Google
over 20 years. And it's really
easy to consume and now they're building
similar to Google all these capabilities
in. So I think that's how we will as a
human we will consume agents. Um but
then in the enterprise no you take your
business process and where every every
place you have a human you can actually
try to put an agent and so I think we
will see adoption indirectly but not
directly. Now, is it the year of the
agents? I think we are still in the PC
phase at some companies. And so, in that
sense, I just want to have a contrarian
opinion. I think, to a certain extent,
it is probably the year of the agent
PCs, let's call it that way.
>> Okay, yeah, pilot agents.
>> Yeah, the pilot
agents. Yes.
>> Yeah. Chris, go.
>> Yeah. No, it's the year of the
agents. You know, I disagree
with Volkmar. It's the year of the agents.
So,
>> you don't need to caveat that. You
say, yeah,
>> it just is, right? I mean, we need to
think about this for a second, right? So, if we
look at what has happened with ChatGPT,
right? And we'll start from there
and then move outwards.
Integrated in ChatGPT, Claude, Gemini, you've
got web search capabilities, which is
tool calling. You've now got the
model catalog, so you can hook up things
like your Jira, and pretty
much anybody who's got a service on
the internet you can hook up as a
connector now, and that is basically tool
calling. Everybody's offering a deep
researcher, which is agent behavior.
And then probably the biggest star of
them all is going to be the coding
agents, right? That's just gone crazy,
especially things like Claude Code, if
you think about things like Lovable, etc.
Everybody's using Codex, coding agents,
to get their work done, and the
biggest thing that's made the difference
there is giving access to tools. So I
think agents are here, and as I said at
the beginning of the year, the year of the super
agent, the fact is that with
planning and reasoning these agents have
become really capable. So I
think it can still feel PC-like,
because everything's maybe not agent,
agent, agent in the way that you think,
but we are all pretty much using agents
every day. We're just not thinking about
them in that way.
>> Yeah. And I guess they're not woven
together in kind of the cohesive
experience that we've been promised,
right? Like, Chris, you began by saying,
well, it's not like Black Friday
agent commerce is really going to
be happening everywhere, because all
these pieces are still missing. They're
there, but they just haven't
been orchestrated in a certain
sense. Lauren, I always joke, you
know, the agentic consumer demo is
always like, you need to book a trip,
and it's like, push the button and the trip
is booked. And I guess kind of what
Volkmar is saying is that that
isn't happening, and might take a
really long time to happen. Do you think
it eventually will? Will we get to
the much more consumer-y
agentic experience, right? I
think that is the source of all these
splashy videos and startups that
people are working on. Or,
I mean, Volkmar, what I
heard you saying is almost a little bit
that the future may not actually look a
whole lot like that, just because of all
the things you need to package, and so
the agents might always be kind
of a little bit in the background. But
I'm curious about how far
toward the consumer you think the agent
experience will go.
>> Yeah, I think the trajectory of just
LLMs standalone is a really interesting
one to compare this to. So, LLMs: we had,
you know, 2017, the Transformers paper.
2018 was the year that we got BERT
and the year that we got GPT-1,
and then 2022 was when we got those
things available to the end consumer in
a very, very easy way, like in the form of
a web app or a mobile app. So I feel
like where we are with agents is maybe
that 2018: not purely
research-paper level, but still not
2022, not in the hands of every
single person. You know, we have our
GPT-1 and BERT kind of demos and things to
look at, and then I think the big
question is, will it take four years to
get into the hands of everyone like it
did with LLMs? You know, there's
definitely reason to think that, across
the board, these timelines are
accelerating. So maybe it could be less
than four years. We have way more
attention and investment in this
technology than we did then; there
was just low awareness among
the community of investors
and people who were going to nudge this
along back in 2018 with LLMs. So could
it be faster because of that, or could it
be longer, because maybe it's going
to turn out to be a lot more complicated
than getting LLMs into production and
into everyone's hands? So
>> Yeah, I do like the idea that this background hype on AI makes all downstream AI things happen faster, because everybody's paying attention to it now. And Lauren, one thing I did want to ask you about: a big part of this acceleration is whether or not it's easy for people to develop agentic platforms, tools, and applications. Do you want to give us a flavor of the state of the developer ecosystem right now? Because I feel like that's a critical thing. I mean, Volkmar said a moment ago that getting these to work still takes a lot of work, and I feel like that in some ways limits our progress, just because the number of organizations and people who can actually do this is small. One way you increase progress is you just make it easier for people to develop for it. So I'm interested in how you see the developer ecosystem around this evolving.
>> I think it's a really, really fun time to be a developer if you want to try and experiment, and you can do that no-code. There are things like LangFlow, where building an agent is visual, drag and drop. That's super cool; it helps you not waste a lot of time coding something where, ultimately, the data just isn't there or the LLM just doesn't understand. All the way to the pro-code side, there's LangChain, LangGraph, CrewAI, AutoGen, Semantic Kernel. You have your choice of things: some are easier and more abstracted to use, and some give you full control if you want it. So if you want to try, you have all the tools to do that; that should never be the problem. But if you want to actually deploy it and take it out of a very tightly controlled environment with a very precisely specified use case, which is probably "book a trip," like you said, if you ever want to expand beyond that, actually have it hosted somewhere, somewhere you could invite your friends to try it, that's where it immediately gets very complicated, and there are far fewer obvious options for what you're going to use. Right now, if you want to deploy an agent and have it hosted somewhere, you have to figure out where to host the agent logic itself, which is not really an LLM-type workload, and then a separate environment to host the actual inference, and then patch those two things together. So it's really not ideal. I think actually scaling up, sharing, and hosting what you build is the hard part.
>> I think it's also one of
the inhibitors, right? If you look at it right now, we don't have this packaged, all-happy solution, and that's the entry barrier. We're not at the Shopify level, where a mom-and-pop shop can say, hey, I want to have an agent and it should deal with something. There are some projects we have at IBM where we take the flows and the business description and convert that straight from English into, say, a LangFlow. When we're at the point where you can actually use English to describe what problem you want to automate, without knowing anything about programming, I think then you get it to the masses, and then you can do it on a cell phone, right? It's like, hey, when I come home, I want the lights to be on, rather than building the automation yourself. Right now the interface is basically a baby-programmer interface, for people who can program, and that's why nobody uses it. But there's a logic there, and I can describe that logic in English. Right now you need to be very explicit, but I think the models can already fill in the gaps; they're smart enough for that. If we can get to a point similar to what we're doing right now with English-to-code, if we can get English-to-agent, then we're at a point where it's mass-consumable. Right now the interfaces are still built for programmers, not for consumers.
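The pro-code frameworks mentioned here (LangChain, CrewAI, AutoGen, and the rest) all abstract some version of the same decide-act-observe loop. A minimal sketch of that loop in plain Python, with a stub standing in for the model call; the tool names, stub logic, and dictionary format are illustrative assumptions, not any framework's actual API:

```python
from typing import Callable

# Illustrative tool registry; real frameworks let you register functions
# together with schemas the model can read.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"top result for '{q}'",
}

def stub_model(task: str, observations: list[str]) -> dict:
    """Stand-in for an LLM call: first requests a search, then finishes."""
    if not observations:
        return {"action": "tool", "tool": "search", "input": task}
    return {"action": "finish", "answer": observations[-1]}

def run_agent(task: str, max_steps: int = 5) -> str:
    """The loop itself: decide -> act -> observe, until finish or budget."""
    observations: list[str] = []
    for _ in range(max_steps):
        decision = stub_model(task, observations)
        if decision["action"] == "finish":
            return decision["answer"]
        # Execute the chosen tool and feed the result back as an observation.
        result = TOOLS[decision["tool"]](decision["input"])
        observations.append(result)
    return "step budget exhausted"

print(run_agent("book a trip to Lisbon"))  # → top result for 'book a trip to Lisbon'
```

The hosting split described above maps directly onto this sketch: `run_agent` is ordinary application code, while whatever serves `stub_model` in production is the separate inference workload you would host elsewhere and patch together.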
>> Yeah, that's right. And I feel like that vision almost short-circuits it, which is: do you even need a developer ecosystem for a whole set of applications? Which I think is pretty interesting.
>> I think it's pretty obvious. I always use Shopify, but Shopify was this, if you look at the 2000s, it was like, oh my god, you can run a web server on the internet, that's amazing, so I can build a billion-dollar business. And then Shopify came along and, in effect, democratized this. We're not yet at that point; it's still high-tech, it's not democratized. But it's just a question of time before someone wraps it and says, okay, I'll make it really easy. Once you have that easiness, once the complexity goes down by a factor of 10 or 100, then everybody will use it, because otherwise you die. And I think there will be an integration into these existing types of commerce applications. So the moment someone figures this out, it will just spread like wildfire. But I think that pivotal moment hasn't happened. The Shopify moment for agents hasn't happened.
>> Yeah. There's almost a tension between these two pathways. One of them, Volkmar, is what you're talking about, which is language-to-agent. If we got that really good and really powerful, then you almost don't need to build a lot of the deployment infrastructure that, Lauren, you're talking about, where we've got this prototype we're building and now it's got to be on some kind of rails for us to make it more available. There's a vision, Volkmar, where the consumer just types in what they want and then it basically happens. Chris, maybe to bring you into the conversation: going to what Lauren is saying, right now there are lots of ways of prototyping an agent, but the minute you want to do anything more complicated, or to scale it, there's just this gap in the space. Do you have a sense of what's necessary to mature that right now? I guess we're still waiting on the companies and platforms that are going to make that happen.
>> Yeah, I think so. I mean, taking things, to your point, from POC and MVP to scale is a hard problem, because consumers do crazy things, right? So you start to have to ask: am I putting the LLM right in front of the consumer? And if you are, then you need to guardrail it, and that could be things like guard models, or running deterministic flows in conjunction with the AI to keep it on track. To Volkmar's point about text-to-plans: if you look at something like Claude Code, or Cursor, or Windsurf, almost all of these things have a built-in planner. When you ask a question, the first thing that happens for anything complex is it goes to the planning module, and then the model is kept to the plan. You see that with Manus, which we talked about early in the year, same sort of thing: you ask for a task, the planning agent kicks in, creates the plan, and then the agents execute to the plan. And there's a good reason that exists, which is that if you give an LLM agent a big list of tools, who knows what tool it's going to pick, right? My favorite one at the moment for this is the Kimi K2 model. I love the Kimi K2 model. It can call 200, 300 tools; it can do long sequential runs of tool calls and a massive number of tools. But you know what it's like? You give it a tool, it's gonna call it, baby. Every tool it's got, it's like, I will do it this way, I will do it that way. It's a phenomenal model, but it goes off the rails because it can't keep itself on track. And then even when you're executing to the plan, quite often the model will either use its own memory or not even bother updating the progress. It'll be like, "Oh, no, no, no, I know the answer to this," and just answer it, as opposed to: no, I need you to use the tool, the information you've got isn't enough. And it's like, no, no, I know this, I know this. And even when it does the task, when you're following a plan you want to go: executed step, executed step, executed step. If you're not deterministic and you leave the model on its own, it will skip steps in the plan or not even update it. So, to that point about frameworks: when you want to get to production, that's where those sorts of frameworks become really important. But the reality is, you're then back to a developer mindset to be able to put those frameworks in place to deploy. They're not out of the box. I think when we see this become a mass thing, those frameworks are either just going to be part of the platform and ecosystem you deploy your code onto, or it's going to be solved at the model level.
>> So in the last few minutes, I want to shift a little. We've been talking about what technically needs to happen for 2026 to be the real year of the agent. I'm interested in winners and losers, and in platforms, right? I guess the question is: are the winners in agent land, from a platform standpoint, going to be the winners in AI in general? Is it going to be OpenAI and Anthropic that end up dominating the agentic ecosystem? Will it be some of the cloud players that really end up doing this? I don't know if anyone here has strong priors on who's well positioned to be the major platform for this space.
>> I think there are two questions to answer. One is: what model, or what model zoo, do I need to use to actually get good results? What Chris just said is that these things go off the rails, so you need to babysit them into giving you the right answer. I had a case where I tried to program something, and it had an API call, and the API call didn't work, so in the end the model just decided: oh, I'll just stub it and call my own function, and, look, I'm done, it works. And it didn't do anything anymore. The solution is: don't do anything, and then I'm good. Success, congratulations. So there's the whole question of how we manage the model, and that's a hard problem in itself. Right now, the state of the world is that you need these frontier models, because otherwise the reasoning capabilities just aren't strong enough. I think probably next year we'll see people building dedicated planning models: you focus on one thing, get the planning right, and then of course have the models underneath execute the plan and not go off track. Right now we haven't done that, so the frontier models are really the only place where it can happen, but of course with humongous cost attached. So we will see smaller models being specialized for planning. The second question is how and where you execute this, and I think that's a really good question. My belief, and this is also where I'm taking our product, is that AI is everywhere; there is no place where AI is not. The idea that we just put a bunch of H100s or H200s in the data center and that's where all the AI will happen, that's just not true. We will see pervasive application: it will happen on your cell phone, it will happen in the data center. So the real trick here is: who can make those agents cost-effective? Because right now, in business scenarios in particular, the work is done by labor; there's a person currently doing it by hand, and what we're hoping is that agents replace the people doing it by hand, so those people can do better things. The other one is that we want agents in the hands of people where, right now, the work doesn't get done at all, or gets done poorly. So we want more choices. And that one is really a cost-optimization problem. I think there's an industry at the bottom of this: we need to run that capacity, that infrastructure, efficiently enough that we bring down the cost of these agents by 10 or 100x, and if we hit that, then it will be pervasive. Right now we're using it primarily for high-value tasks that are incredibly labor-intensive, or that are very, very controlled: I can say, I have thousands of people doing this, but I can put an agent behind it, because it's a confined enough problem space that I can supervise and watch it. The moment these things get more powerful and we bring the cost down, I think then it will be a much more pervasive application.
>> Yeah, that's right. I like seeing the market sort of dividing: maybe the existing frontier AI model companies going more agentic is one part of the market, and then there's a whole cost-efficiency universe that emerges. The frontier model companies might get into that too, but it maybe looks like a very different kind of market and a very different kind of ecosystem. Lauren, I'm curious how you divide up the future agentic market. Is it one model to rule them all? There are many ways this could play out, and I'm interested in how you forecast here.
>> Yeah, I
think whoever can make something repeatable will win, because it really feels like this moment of agents right now is like traditional AI ten years ago, where it was really cool that you could build an AI model to do anything, but you had to do it from scratch. If you wanted it to predict education outcomes, you had to find the data, train the model just for that, refine it, and then package it up and use it. And if you wanted it to do something else, you had to start over. It was a whole end-to-end process every time. That's kind of what agent building is right now, and it's even more painful because it's not just code-based, it's language-based. So much of that rebuilding is prompting it, figuring out how to nudge it in certain directions, getting it to use tools sometimes and not other times, and to use the tools in better ways. The breakthrough with traditional AI was foundation models: we trained bigger, better models because we could, we had more data and more compute, and then that one model could do different things because it knew a little bit of everything. I think if we similarly had some concept of foundation agents, it could work similarly, and kind of reduce that friction of having to build from scratch every single time.
>> Yeah, for sure. And you think the ones best positioned to win there would be the existing leaders, right? They take their model, polish it off, and then it's the foundation agentic model, basically.
>> And I don't even think it would be a model at this point; it would be orchestration of multiple models, plus other constraints put on that.
>> I really don't know if it would be the existing leaders, or if it's going to be some dark horse that initially builds one agent to do one specific thing, but then takes the pieces of that, whatever percent of the code, and uses them to do a second thing, and then a third thing. That was kind of the AWS story, right? They were building for themselves to do something specific initially, but then a lot of that cloud infrastructure could be used for other things beyond that. So I think there's a scenario where someone just commits to a use case, and initially they're kind of looked down upon because they're using AI to do just one thing, but do it well. Then they work out the pattern to expand to other things and eventually build something that's more repeatable and more of a platform.
>> Yeah, that's very rich. I never really thought about it as: what's the specific agentic problem that, if you solve it, unlocks the largest number of subsequent agentic problems? It's kind of interesting to think about. Is it the travel-planning one? What is that use case?
So, Chris, do you want to give us a final thought here before we close up?
>> I
don't think this is a finite game. I don't think there is a winner. I think this is the classic Simon Sinek infinite game: the players are going to play until they run out of resources and can no longer play the game. And I think that's what's going to happen, right? A lot of the technology and the techniques are known, well known across the world, and the limiting factor is resources. But in an agentic world, the models need to get smaller and smarter, and in the future be able to fit on a chip. And therefore I just don't think there is a winner in this scenario. So who do I think is going to win? I think we're going to win. I think it opens up a creator ecosystem. What I hope it breaks up is all these massive Web 2 companies controlling everything, so we can get a more well-rounded marketplace. That's what I believe in. And the biggest thing, if I think about what's going to happen in '26 and '27: you remember the Rick Rubin episode, where I was frantically Googling who Rick Rubin was so I could give an intelligent answer to your question, Tim? I'm obsessed with Rick Rubin at the moment, because I think composition is where we're going. I think '26 and '27 are going to be about marketplaces, but also about being producers. You're going to say, "Okay, I've got this model over here, and I've got my piece of data, and I've got my brand and my style, and I've got these five tools, and I'm going to combine them together into my ecosystem and create something new and beautiful, and that's going to be my product." So I hope that's what happens: that it's not this one model or whatever that's the winner. That's a depressing future. What I'm hoping for is a vibrant, amazing ecosystem and marketplace where everybody's got a chance to use AI to improve their lives, personalize it to them, and create their own company's products and data without the limitations we have today. So we're going to be the winners. But these model providers, they're going to come and go, right? We saw that this year. Who were Moonshot and Kimi? Ask that question six months ago. And if we go back to last year, who was DeepSeek? Same sort of thing. New model providers are going to come in. And remember when we were super excited about Manus? I expect them to come back at some point. People are going to come in and out, and that's fine, it's okay. Who knows for the future, but it's going to be fun.
>> Nice. Well, on that hopeful note, I'm going to let you all get to your impending holidays. Volkmar, Lauren, Chris, thanks for joining us. And thanks to you, listeners, for joining us. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. And we'll see you next week on Mixture of Experts.