Claude 4.5 Opus: Efficient AI Model
Key Points
- The host frames the AI landscape as an “infinite game,” emphasizing a shift toward a creator‑centric ecosystem that can break the dominance of large Web 2 companies.
- “Mixture of Experts” brings together top AI thinkers—including IBM engineers and executives—to discuss broader strategic themes rather than just headline news.
- The episode’s focus is Anthropic’s newly released Claude 4.5 Opus model, highlighted for being roughly 50% more token‑efficient than its predecessor (Claude Opus 4.1) while maintaining high reasoning performance.
- Panelists recommend deploying Claude 4.5 Opus through Claude Code, noting its cost advantages and strong suitability for coding tasks.
- Early hands‑on impressions compare Claude 4.5 Opus favorably against recent rivals such as Google Gemini 3 Pro and OpenAI GPT‑5.1 Pro/Codex Max, suggesting it now sets a higher benchmark for AI coding assistants.
Sections
- Infinite Game, AI Marketplace Evolution - In a Thanksgiving episode of Mixture of Experts, host Tim Hwang and his panel explore the notion of AI as an endless, resource‑driven “Simon Sinek” infinite game that could break Web 2 monopolies, foster a creator‑centric ecosystem, and highlight Anthropic’s newly released Claude 4.5 Opus model.
- Rapid AI Model Releases and Pricing - The speaker examines the near‑simultaneous launch of several high‑performing AI models, highlighting improved price‑performance, larger context windows, and optimizations in the new 4.5 Opus compared to earlier versions.
- Enterprise Access via Cloud Providers - The speaker explains that offering the coding‑focused AI model through hyperscalers like Azure and AWS at affordable rates facilitates enterprise deployment, while also highlighting the model’s broader optimizations such as PowerPoint slide creation.
- AI Commerce Impact on Black Friday - Speakers debate whether emerging AI‑driven shopping tools will meaningfully disrupt holiday retail, concluding that the expected boost in automation and agentic browsing will likely be minimal compared to previous years.
- Agents Power E‑Commerce Returns - The speaker argues that 2025 will be the “year of agents” because automated, backend agents are already streamlining product returns for major retailers, driving real adoption in commerce.
- Debating the Year of AI Agents - Panelists argue whether the surge in AI tools like ChatGPT and Gemini signals a full transition to ubiquitous agent deployment or remains a transitional phase akin to the PC era, emphasizing enterprise integration through web search, tool‑calling, and service connectors.
- Predicting AI Agent Adoption Timeline - The speaker compares the historical rollout of LLMs—from research breakthroughs to consumer apps—to the emerging field of AI agents, debating whether agents will reach widespread use faster or slower than the four‑year lag seen with LLMs.
- Democratizing Agent Platforms - The speakers liken the future breakthrough in AI agents to Shopify’s democratizing impact, arguing that a simple, low‑friction solution will trigger rapid, widespread adoption, while highlighting the current tension between perfecting language‑to‑agent interfaces and building supporting deployment infrastructure.
- Deterministic Agent Execution Frameworks - The speaker argues that merely providing information to LLMs is inadequate without tool integration, stresses the need for deterministic, step‑by‑step execution to avoid skipped tasks, highlights the importance of production‑ready frameworks for deploying agents, and ponders which platforms will emerge as winners in the future agent ecosystem.
- Agentic AI Market Forecast - The speaker outlines a split between frontier AI firms pursuing agentic capabilities and cost‑efficiency offerings, predicting that success will belong to those who can deliver repeatable, turnkey agents, likening today’s fragmented agent building to the early days of AI model development.
- The Infinite Game of AI - A speaker contends that AI progress is an endless, resource‑constrained contest with no definitive victor, stressing the need for compact, intelligent models and a decentralized creator ecosystem to dismantle Web 2 monopolies.
Full Transcript

**Source:** [https://www.youtube.com/watch?v=SdNRWJ-oqjY](https://www.youtube.com/watch?v=SdNRWJ-oqjY) **Duration:** 00:41:21

- [00:00:00](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=0s) **Infinite Game, AI Marketplace Evolution**
- [00:03:38](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=218s) **Rapid AI Model Releases and Pricing**
- [00:07:35](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=455s) **Enterprise Access via Cloud Providers**
- [00:11:04](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=664s) **AI Commerce Impact on Black Friday**
- [00:14:15](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=855s) **Agents Power E‑Commerce Returns**
- [00:17:21](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=1041s) **Debating the Year of AI Agents**
- [00:20:49](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=1249s) **Predicting AI Agent Adoption Timeline**
- [00:26:16](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=1576s) **Democratizing Agent Platforms**
- [00:30:11](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=1811s) **Deterministic Agent Execution Frameworks**
- [00:35:05](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=2105s) **Agentic AI Market Forecast**
- [00:38:12](https://www.youtube.com/watch?v=SdNRWJ-oqjY&t=2292s) **The Infinite Game of AI**

## Full Transcript
I don't think this is a finite game. I
don't think there is a winner. I think
this is the classic Simon Sinek infinite
game. I think the players are going
to play until they run out of resources
and they can no longer play the game.
And I think we're going to win, right? I
think it opens up a creator ecosystem.
What I hope that it breaks up is all
these kind of web 2 massive companies
controlling everything and we can get a
more decentralized marketplace. All
that and more on today's Mixture of
Experts.
I'm Tim Hwang and welcome to Mixture of
Experts. Each week, MoE brings together
a panel of the smartest and most
charming thinkers in technology to
distill down what's important in
artificial intelligence. Joining us
today are three incredible panelists.
We've got Chris Hay, Distinguished
Engineer; Lauren McHugh Oende, Program
Director, AI Open Innovation; and Volkmar
Uhlig, VP, Core AI and watsonx AI. So,
this is our Thanksgiving episode and
we're going to change up the format a
little bit. Rather than ticking through
the news of the moment, we're going to
take a step back and have a focused
discussion about the bigger picture.
But, as always, we've got the headlines
with
Hi everyone, I'm Aili McConnon, a tech
news writer for IBM Think. As always,
I'm here to cover your top AI news of
the week. But instead of running through
a bunch of headlines today, we're
actually going to focus on one, the big
news of the week: Anthropic's new Claude
4.5 Opus model, which just dropped. And
to do this, I'm joined by our expert,
Mihai Criveti, Distinguished Engineer for
Agentic AI.
>> When I heard it was about the latest
model from Claude, I couldn't resist. So
happy to be here.
>> What would you say is the most important
thing that users need to know about
Anthropic's new Claude 4.5 opus?
>> I think it's just how efficient it is
with its tokens. It's 50% more efficient
than Claude Opus 4.1. So even though it's
a reasoning model, even though it can be
a fairly expensive model, it's cheaper,
and it also consumes 50% fewer tokens
when reasoning. So it's one of the
most efficient models out there. So
don't be afraid to use it. Give it a
try. I think the best way to use it is
through Claude Code, leveraging those
capabilities, and it really performs very
well for how token-efficient it is.
>> Mihai, what are some of your initial
reactions to the model after playing
with it?
>> Yeah, it's all kind of fresh,
because I think it all happened like 21
hours ago. So, this is all fairly new.
But I've already been using it. I've
been using it with both, you know, the
desktop application and with Claude Code,
and I can say this is by far the best
model for coding, and the bar has already
been set quite high. As you know,
Google has released Gemini 3 Pro quite
recently, I think two or three days
ago. OpenAI released GPT-5.1 Pro,
which has great reasoning capabilities,
but they've also released GPT-5.1 Codex
Max, which makes their Codex, I would
say, agentic platform really good.
I suspect, at least from my initial
testing, that it was on par or even
better than Claude Code with the
previous models, Opus 4.1 or Sonnet 4.5.
But now with Opus 4.5, I believe
Anthropic has regained the lead in terms
of the best model for coding. I'm still doing
some initial testing on it, but it's
performing really well.
>> That's super
interesting. And you mentioned this
comes, you know, shortly after the
release of Gemini 3 and we've had some
other big models. You know, what do you
make of the timing of Anthropic's
release? Obviously, it's been a busy
fall for big releases of coding agents.
>> This can't be a coincidence. I'm
wondering if these vendors just have
these models ready to go. Um, they just
might not have the best, I would say,
price performance, or they're waiting
for another announcement from their
competitors before they put them out to
market, because the timing was really
good. I mean, within the span of
three days we got three world-leading
models, all for code, all
outperforming each other in various
benchmarks, which is quite
interesting.
>> And you mentioned three
strong-performing new agents
being released in quick
succession. Is there any aspect of
Claude 4.5 Opus that is different?
Obviously it sounds like it has
slightly superior performance, but
what makes this release
different, if it is, from your perspective?
>> I think it's the pricing as well. They're
able to reach much better price
performance, or number of tokens used,
than the previous Opus 4.1. I was at
some point just using Opus
4.1 for the planning or the more
complex tasks and using Sonnet 4.5 or the
previous models for the actual work, because
they have, you know, a very large
context window; they were cheaper, they
were faster. I think they've been making
some optimizations in terms of things
like pricing and performance as well for
Opus 4.5, and it feels like you get a bit
more bang for your buck than the
previous versions, you know, than
Opus 4.1.
>> And can you talk a little bit more about
how they've achieved that
lower pricing? Are there
innovations in how they're
approaching things that have helped them
to do that?
>> I'm not sure it's necessarily to do with
innovation; maybe it just has to do with
more availability of the hardware. You
know, they've recently announced some
very, very strong partnerships with both
Microsoft Azure and with Google to
use, you know, more GPUs, more TPUs.
I think part of it just has to do with
having more availability of the
infrastructure and being able to reach a
bit further and say, hey, we're putting
some of our best models out there as the
default. I was actually quite surprised,
and maybe even shocked, that when I
opened up
Claude Code, it recommended 4.5 Opus
as the default, which is somewhat
unusual. It might say something like,
"Hey, we're going to use 4.1," or "we're
going to use 4.5 for thinking and
then we're going to transition into 4.1,"
but outright they've launched a new
version of Claude Code and it used 4.5
Opus as the default.
>> And I guess it's hard to predict, but
how long does Anthropic have the lead?
You know, we've had it for what, 48 hours? Will
we have something new by um by next week
or you know, does this set itself apart
such that you'll be using it yourself at
least as your top choice for at least
the coming weeks?
>> The way I see it, I'm
using all three models. I'm using Gemini
for a lot of my deep research, my
research, my advisory. I'm using Codex,
sorry, Claude Code with Opus for
writing code or for, you know, writing
test cases, and I'm using Codex with
GPT for reviews or for anything else
after. So I'm actually using all three,
but I would say by number of tokens
it's still the Anthropic models
I'm using the most tokens with.
They're writing the vast majority of the
code for me because they perform the
best for this particular use case. I
don't see it necessarily as a
situation where, if I had a different use
case, I would still use the models
from Anthropic; if it was, you know,
summarization or content generation or
creative thinking or creative writing,
maybe I would lean more towards GPT-5.1
Pro. But at least in the area of
code, and it could also be a combination with
what they put in the Claude Code tooling,
it still seems to outperform
Codex, at least for my use cases or
from my personal experience.
>> Do you think, for that specific area of
coding, does this have enterprise
application or usefulness? Or do
you think that's part of the play in
trying to bring the cost down, to make it
more enterprise-friendly? How do you see that?
>> Yeah, I think
the strategy of making this available
through the various hyperscalers at, I
would say, a reasonable cost is going
to help with enterprise deployments,
because many of the enterprises are
never going to consume the models
directly from the provider. So, you know,
if you're using models from OpenAI,
you're going to consume them likely
on Azure from Microsoft, not necessarily
directly from ChatGPT. Same thing
goes with Claude: you're going to consume
it through maybe AWS Bedrock, but there
has been somewhat of a limiting choice
for enterprises. With its availability
through Microsoft Azure, for example,
this has really opened it up to
enterprise customers and
consumers.
>> Is there anything else about the model
um you know that's worth worth talking
about that we haven't covered yet that
sort of strikes you as interesting?
Yeah, I think clearly they've
optimized it for agents, they've
optimized it for computer use, they've
optimized it for coding, but I like the
fact they've also optimized it for
things like building PowerPoint slides,
which was maybe something that,
you know, even Microsoft was looking at
these models for, for use in Office 365
or, you know, PowerPoint generation,
slide generation. So, they're not
looking just at software development use
cases. They're now starting to tackle a
lot of other enterprise use cases, you
know, being the best model for generating
a PowerPoint slide, or generating a Word
document, or generating and working with,
you know, XML, or working with the schema
required to build those documents. So
I'm pleased and happy to see
that models are being optimized for
these enterprise use cases.
>> I'd be happy to have a model to do my
PowerPoint slides as well.
>> Yeah, 100%,
>> to optimize. Thank you, Mihai, so much for
joining our conversation. And now we're
going to return to our special
Thanksgiving episode. Happy holidays
everybody.
Thanksgiving, I think, is like a really
good time to be talking about agents
because, of course, agents have been
very much hyped in 2025, but Agentic
Commerce has been one of the things in
agents that people have been excited
about. And you know, this week is going
to feature Thanksgiving, but also
importantly, Black Friday, which is one
of the biggest shopping moments of the
year. And so, I guess, maybe Chris, I'll
kick it to you first. You know, do you
think this week is going to be a breakout
moment for agents in agentic commerce?
You know, why or why not?
>> No, I don't think it will be. I think
we're probably another year away from
that. Why not? I think all of
the ingredients are getting in place. So
if you think about what OpenAI's done,
they're now bringing on board the
ability to shop in their channel for
commerce products, and they've partnered
with, you know, Shopify,
etc. But there are so many commerce
retailers that they've not onboarded
yet. So I think that's really early, and
it is US-only at the moment. And then
Google has released their agent
commerce protocol, and again that's
really early at the moment. And
agentic browsers haven't quite
taken off yet. So I just think we're
about a year away from that. Now,
where I do think it's going to
become relevant is utilizing web search
and deep researchers from within ChatGPT
to find the products that you want.
That is going to be big, and that is
disrupting retailers. But I don't
see a massive effect on Black Friday
this year.
>> And what's sort of
interesting, and I'd love to kind of
parse that out a little bit more, is, I
guess, Chris, you've listed a couple of
key components, right? It's almost like
the agentic browser is not quite there,
the partnerships are not quite there from a
business standpoint. I guess, Lauren, do
you kind of agree with this assessment?
Do you think this week is
going to be big for agentic commerce? I
guess, you know, Chris is almost saying
there is some, insofar as
people are using it to find products, but
it's not obviously what we were
promised in the exciting
early days of 2025.
>> Yeah, my feeling is it's also not going
to be so different from last year. So,
the you know, the protocols that Chris
talked about will help in automating the
actual checkout. Once you're using
ChatGPT and you find the thing you want, you
can automate the checkout. But I'm not
sure that was really ever the biggest
problem. You know, I didn't have a big
problem putting in my credit card
manually once I get to the link. They
spent a lot of time making it easy to
spend money on the internet.
>> Yeah.
>> I mean, we've had automated checkout for
a while. It was very hacky. You know,
that's why it's hard to get concert
tickets is because it is possible to
build browser automation to buy things
automatically. So, I don't see a big
revolution coming from the simple act of
being able to check out once you're
in the AI application. And I do think
too that even the product research
capabilities are a bit underwhelming.
You know, when you know you're looking
for something specific like size,
dimensions or a style or something, it's
not always easy to find that
just through the kind of interface that
we have now. And I think a lot more work
could be done on both training AI models
on e-commerce-relevant information. So,
you know, what are the input-output pairs:
input, this was the customer intention;
output, this is what they ultimately
selected. And, you know, allow the AI model
to build that pattern recognition of,
okay, that intention really
meant this size of thing or this
configuration. So I think there's
definitely work that could be done on
that front of having the models
themselves working better for
e-commerce. And then I think there's also,
you know, when you fit those models into
an agentic pattern: how do you
prompt it? How do you build in the
steps of that process? Like,
first, start by identifying the
retailers you want to look in, then
get the information you need on
their products, then compare them. That's
a whole flow that a general-purpose
chatbot, you know, ChatGPT
or whatever, pick your system, is
not made specifically for. So I
think if we had more workflows that were
built specifically for that, the
performance of those could go,
you know, through the roof: when
you have an intention, you can get
a specific link to that product right
away. So I think that's really where I'd
like to see more improvements.
>> Yeah, definitely. Volkmar, I'd love to
bring you in because I think your angle
of this is really interesting. When we
talk about agents, obviously we tend to
talk about like higher up in the stack,
right? It's like the application's not
quite there. Even like the business
partnerships are not quite there to get
this to work. Is there a hardware
limiter to the world of agents really
taking off particularly in commerce but
I guess otherwise or is this not like
almost not even in the picture?
>> So I
would take a completely different stand,
right? I think it is actually the year of
agents, and the reason is very simple.
If you look at Black Friday, 15 to 20% of
the stuff gets returned, right? And so I
think the agents are not the consumer-facing
agents; the agents are
actually the back end, and I think this
is where the true adoption happens:
stuff, you know, people are returning. It's
the same after Christmas; the
statistics are unclear,
somewhere between 15 and 25%. It depends
on the product category. So if you look
at Amazon today already, you know, if
you want to return something, it used to
be, you know, you click five
buttons and then they're like, okay, good,
ship it back, or no, we reject it and you
had to make a phone call. Now that's all
done through agentic workloads. And so I
think the big retailers,
probably not, I mean, Shopify at some
point will offer it as well, but the big
retailers are already in that motion
of actually optimizing that backend
flow. I do not know what they are doing
when the product hits, you know, their
shipping center; if they have
agentic workloads there, I'm sure they do.
But that first customer touch point,
effectively doing the return, I
think that's where the majority of
the labor is on their side, because the
front end is very optimized. So we
just don't see it as a consumer, but we
see it indirectly, because actually
returning is easier.
>> Yeah, definitely, and I think this is kind
of a pattern I did want to talk about.
I mean, just zooming out from e-commerce,
right, or, you know, buying stuff
online: I think it's almost easy as
a consumer to be like, agents are
the dog that didn't bark in 2025, because,
yeah, most of the people I know are not
using agents every single day for all
sorts of things. But it does seem like on
the back end, in the enterprise, there is
a lot of agent activity. And so it's sort
of interesting that the
public face and public experience of
agents is very underwhelming,
whereas, you know, Volkmar, there's
stuff running under the hood for
returns, which is very much
identified. And so we have this kind
of split screen that's happening in
agents that might actually fool us about
just how far this thing is going.
>> And I think that I mean if you look at
programming agents it's pretty
complicated you know what you need to do
and you need to coerce it into doing the
right thing and so I think the consumer
will always uh consume agents indirectly
through products and so it's not like
you know I mean we have like these apps
on the phone where I can automate stuff.
I don't know I I have one automation on
my iPhone which is like when I'm driving
close to our community open the gate.
Okay, that's the only automation I have
out of all the automations I could
build. And so humans typically like you
know they want to have a packaged
product which just solves the problem.
And I think the beauty of what ChatGPT
did is giving you that one line
everybody knows how to use from Google
over 20 years. And it's really
easy to consume and now they're building
similar to Google all these capabilities
in. So I think that's how we will as a
human we will consume agents. Um but
then in the enterprise no you take your
business process and where every every
place you have a human you can actually
try to put an agent and so I think we
will see adoption indirectly but not
directly. Now, is it the year of the
agents? I think we are still in the PC
phase at some companies. And so, in that
sense, I just want to have a contrarian
opinion. I think, to a certain extent,
it is probably the year of the agent
PCs, let's call it that way.
>> Okay, yeah, pilot agents.
>> Yeah, the pilot
agents. Yes.
>> Yeah. Chris, go.
>> Yeah. No, it's the year of the
agents. You know, I disagree
with Volkmar. It's the year of the agents.
So,
>> you don't need to caveat that. You
say, yeah,
>> it just is, right? I mean, we need to
think about this for a second, right? So, if we
look at what has happened with ChatGPT,
right? And we'll start from there
and then move outwards.
Integrated in ChatGPT, Claude, Gemini, you've
got web search capabilities, which is
tool calling. You've now got the
model catalog, so you can hook up things
like your Jira, and pretty
much anybody who's got a service on
the internet you can hook up as a
connector now, and that is basically tool
calling. Everybody's offering a deep
researcher, which is agent behavior.
And then probably the biggest star of
them all is going to be the coding
agents, right? That's just gone crazy,
especially things like Claude Code, if
you think about things like Lovable, etc.
Everybody's using Codex, coding agents,
to get their work done, and the
biggest thing that's made the difference
there is giving access to tools. So I
think agents are here, and as I said at
the beginning of the year, the year of the super
agent, the fact is that with
planning and reasoning these agents have
become really capable. So I
think it can still feel PC-like,
because everything's maybe not agent,
agent, agent in the way that you think,
but we are all pretty much using agents
every day. We're just not thinking about
them in that way.
>> Yeah. And I guess they're not woven
together in kind of the cohesive
experience that we've been promised,
right? Like, Chris, you began by saying,
well, it's not like Black Friday
agent commerce is really going to
be happening everywhere, because all
these pieces are still missing. They're
there, but they just haven't
been orchestrated in a certain
sense. Lauren, I always joke, you
know, the agentic consumer demo is
always like, you need to book a trip,
and it's like, push the button and the trip
is booked. And I guess kind of what
Volkmar is saying is that that
isn't happening, and might take a
really long time to happen. Do you think
it eventually will? Will we get to
the much more consumer-y
agentic experience, right? I
think that is the source of all these
splashy videos and startups that
people are working on. Or,
I mean, Volkmar, what I
heard you saying is almost a little bit
that the future may not actually look a
whole lot like that, just because of all
the things you need to package, and so
the agents might always be kind
of a little bit in the background. But
I'm curious about how far
toward the consumer you think the agent
experience will go.
>> Yeah, I think the trajectory of just
LLMs standalone is a really interesting
one to compare this to. So, LLMs: we had,
you know, 2017, the Transformers paper.
2018 was the year that we got BERT
and the year that we got GPT-1,
and then 2022 was when we got those
things available to the end consumer in
a very, very easy way, like in the form of
a web app or a mobile app. So I feel
like where we are with agents is maybe
that 2018: not purely
research-paper level, but still not
2022, not in the hands of every
single person. You know, we have our
GPT-1 and BERT kind of demos and things to
look at, and then I think the big
question is, will it take four years to
get into the hands of everyone like it
did with LLMs? You know, there's
definitely reason to think that, across
the board, these timelines are
accelerating. So maybe it could be less
than four years. We have way more
attention and investment in this
technology than we did then; there
was just low awareness among
the community of investors
and people who were going to nudge this
along back in 2018 with LLMs. So could
it be faster because of that, or could it
be longer, because maybe it's going
to turn out to be a lot more complicated
than getting LLMs into production and
into everyone's hands? So
>> Yeah, I do like the idea that this background hype on AI makes all downstream AI things happen faster, because everybody's paying attention to it now. And Lauren, one thing I did want to ask you about: a big part of this acceleration is whether or not it's easy for people to develop agentic platforms, tools, and applications. Do you want to give us a flavor of the state of the developer ecosystem right now? Because I feel like that's a critical thing. I mean, Volkmar said a moment ago that getting these to work still takes a lot of work, and I feel like that in some ways limits our progress, just because the number of organizations and people who can actually do this is small. One way you increase progress is you just make it easier for people to develop for it. So I'm interested in how you see the developer ecosystem around this evolving.
>> I think it's a really, really fun time to be a developer if you want to try and experiment, and you can do that no-code. There are things like LangFlow, where building an agent is visual, drag and drop. That's super cool; it helps you not waste a lot of time coding something where, ultimately, the data just isn't there or the LLM just doesn't understand. All the way to the pro-code side, there's LangChain, LangGraph, CrewAI, AutoGen, Semantic Kernel. You have your choice of things: some are easier and more abstracted to use, and some give you full control if you want it. So if you want to try, you have all the tools to do that; that should never be the problem. But if you want to actually deploy it and take it out of a very tightly controlled environment with a very precisely specified use case, which is probably "book a trip," like you said, if you ever want to expand beyond that, actually have it hosted somewhere, somewhere you could invite your friends to try it, that's where it immediately gets very complicated, and there are far fewer obvious options for what you're going to use. Right now, if you want to deploy an agent and have it hosted somewhere, you have to figure out where to host the agent logic itself, which is not really an LLM-type workload, and then a separate environment to host the actual inference, and then patch those two things together. So it's really not ideal. I think actually scaling up, sharing, and hosting what you build is the hard part.
>> I think it's also one of
the inhibitors, right? If you look at it right now, we don't have this packaged, all-happy solution, and that's the entry barrier. We're not at the Shopify level, where a mom-and-pop shop can say, hey, I want to have an agent and it should deal with something. There are some projects we have at IBM where we take the flows and the business description and convert that straight from English into, say, a LangFlow. When we're at the point where you can actually use English to describe what problem you want to automate, without knowing anything about programming, I think then you get it to the masses, and then you can do it on a cell phone, right? It's like, hey, when I come home, I want the lights to be on, rather than building the automation yourself. Right now the interface is basically a baby-programmer interface, for people who can program, and that's why nobody uses it. But there's a logic there, and I can describe that logic in English. Right now you need to be very explicit, but I think the models can already fill in the gaps; they're smart enough for that. If we can get to a point similar to what we're doing right now with English-to-code, if we can get English-to-agent, then we're at a point where it's mass-consumable. Right now the interfaces are still built for programmers, not for consumers.
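The pro-code frameworks mentioned here (LangChain, CrewAI, AutoGen, and the rest) all abstract some version of the same decide-act-observe loop. A minimal sketch of that loop in plain Python, with a stub standing in for the model call; the tool names, stub logic, and dictionary format are illustrative assumptions, not any framework's actual API:

```python
from typing import Callable

# Illustrative tool registry; real frameworks let you register functions
# together with schemas the model can read.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"top result for '{q}'",
}

def stub_model(task: str, observations: list[str]) -> dict:
    """Stand-in for an LLM call: first requests a search, then finishes."""
    if not observations:
        return {"action": "tool", "tool": "search", "input": task}
    return {"action": "finish", "answer": observations[-1]}

def run_agent(task: str, max_steps: int = 5) -> str:
    """The loop itself: decide -> act -> observe, until finish or budget."""
    observations: list[str] = []
    for _ in range(max_steps):
        decision = stub_model(task, observations)
        if decision["action"] == "finish":
            return decision["answer"]
        # Execute the chosen tool and feed the result back as an observation.
        result = TOOLS[decision["tool"]](decision["input"])
        observations.append(result)
    return "step budget exhausted"

print(run_agent("book a trip to Lisbon"))  # → top result for 'book a trip to Lisbon'
```

The hosting split described above maps directly onto this sketch: `run_agent` is ordinary application code, while whatever serves `stub_model` in production is the separate inference workload you would host elsewhere and patch together.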
>> Yeah, that's right. And I feel like that vision almost short-circuits it, which is: do you even need a developer ecosystem for a whole set of applications? Which I think is pretty interesting.
>> I think it's pretty obvious. I always use Shopify, but Shopify was this, if you look at the 2000s, it was like, oh my god, you can run a web server on the internet, that's amazing, so I can build a billion-dollar business. And then Shopify came along and, in effect, democratized this. We're not yet at that point; it's still high-tech, it's not democratized. But it's just a question of time before someone wraps it and says, okay, I'll make it really easy. Once you have that easiness, once the complexity goes down by a factor of 10 or 100, then everybody will use it, because otherwise you die. And I think there will be an integration into these existing types of commerce applications. So the moment someone figures this out, it will just spread like wildfire. But I think that pivotal moment hasn't happened. The Shopify moment for agents hasn't happened.
>> Yeah. There's almost a tension between these two pathways. One of them, Volkmar, is what you're talking about, which is language-to-agent. If we got that really good and really powerful, then you almost don't need to build a lot of the deployment infrastructure that, Lauren, you're talking about, where we've got this prototype we're building and now it's got to be on some kind of rails for us to make it more available. There's a vision, Volkmar, where the consumer just types in what they want and then it basically happens. Chris, maybe to bring you into the conversation: going to what Lauren is saying, right now there are lots of ways of prototyping an agent, but the minute you want to do anything more complicated, or to scale it, there's just this gap in the space. Do you have a sense of what's necessary to mature that right now? I guess we're still waiting on the companies and platforms that are going to make that happen.
>> Yeah, I think so. I mean, taking things, to your point, from POC and MVP to scale is a hard problem, because consumers do crazy things, right? So you start to have to ask: am I putting the LLM right in front of the consumer? And if you are, then you need to guardrail it, and that could be things like guard models, or running deterministic flows in conjunction with the AI to keep it on track. To Volkmar's point about text-to-plans: if you look at something like Claude Code, or Cursor, or Windsurf, almost all of these things have a built-in planner. When you ask a question, the first thing that happens for anything complex is it goes to the planning module, and then the model is kept to the plan. You see that with Manus, which we talked about early in the year, same sort of thing: you ask for a task, the planning agent kicks in, creates the plan, and then the agents execute to the plan. And there's a good reason that exists, which is that if you give an LLM agent a big list of tools, who knows what tool it's going to pick, right? My favorite one at the moment for this is the Kimi K2 model. I love the Kimi K2 model. It can call 200, 300 tools; it can do long sequential runs of tool calls and a massive number of tools. But you know what it's like? You give it a tool, it's gonna call it, baby. Every tool it's got, it's like, I will do it this way, I will do it that way. It's a phenomenal model, but it goes off the rails because it can't keep itself on track. And then even when you're executing to the plan, quite often the model will either use its own memory or not even bother updating the progress. It'll be like, "Oh, no, no, no, I know the answer to this," and just answer it, as opposed to: no, I need you to use the tool, the information you've got isn't enough. And it's like, no, no, I know this, I know this. And even when it does the task, when you're following a plan you want to go: executed step, executed step, executed step. If you're not deterministic and you leave the model on its own, it will skip steps in the plan or not even update it. So, to that point about frameworks: when you want to get to production, that's where those sorts of frameworks become really important. But the reality is, you're then back to a developer mindset to be able to put those frameworks in place to deploy. They're not out of the box. I think when we see this become a mass thing, those frameworks are either just going to be part of the platform and ecosystem you deploy your code onto, or it's going to be solved at the model level.
>> So in the last few minutes, I want to shift a little. We've been talking about what technically needs to happen for 2026 to be the real year of the agent. I'm interested in winners and losers, and in platforms, right? I guess the question is: are the winners in agent land, from a platform standpoint, going to be the winners in AI in general? Is it going to be OpenAI and Anthropic that end up dominating the agentic ecosystem? Will it be some of the cloud players that really end up doing this? I don't know if anyone here has strong priors on who's well positioned to be the major platform for this space.
>> I think there are two questions to answer. One is: what model, or what model zoo, do I need to use to actually get good results? What Chris just said is that these things go off the rails, so you need to babysit them into giving you the right answer. I had a case where I tried to program something, and it had an API call, and the API call didn't work, so in the end the model just decided: oh, I'll just stub it and call my own function, and, look, I'm done, it works. And it didn't do anything anymore. The solution is: don't do anything, and then I'm good. Success, congratulations. So there's the whole question of how we manage the model, and that's a hard problem in itself. Right now, the state of the world is that you need these frontier models, because otherwise the reasoning capabilities just aren't strong enough. I think probably next year we'll see people building dedicated planning models: you focus on one thing, get the planning right, and then of course have the models underneath execute the plan and not go off track. Right now we haven't done that, so the frontier models are really the only place where it can happen, but of course with humongous cost attached. So we will see smaller models being specialized for planning. The second question is how and where you execute this, and I think that's a really good question. My belief, and this is also where I'm taking our product, is that AI is everywhere; there is no place where AI is not. The idea that we just put a bunch of H100s or H200s in the data center and that's where all the AI will happen, that's just not true. We will see pervasive application: it will happen on your cell phone, it will happen in the data center. So the real trick here is: who can make those agents cost-effective? Because right now, in business scenarios in particular, the work is done by labor; there's a person currently doing it by hand, and what we're hoping is that agents replace the people doing it by hand, so those people can do better things. The other one is that we want agents in the hands of people where, right now, the work doesn't get done at all, or gets done poorly. So we want more choices. And that one is really a cost-optimization problem. I think there's an industry at the bottom of this: we need to run that capacity, that infrastructure, efficiently enough that we bring down the cost of these agents by 10 or 100x, and if we hit that, then it will be pervasive. Right now we're using it primarily for high-value tasks that are incredibly labor-intensive, or that are very, very controlled: I can say, I have thousands of people doing this, but I can put an agent behind it, because it's a confined enough problem space that I can supervise and watch it. The moment these things get more powerful and we bring the cost down, I think then it will be a much more pervasive application.
>> Yeah, that's right. I like seeing the market sort of dividing: maybe the existing frontier AI model companies going more agentic is one part of the market, and then there's a whole cost-efficiency universe that emerges. The frontier model companies might get into that too, but it maybe looks like a very different kind of market and a very different kind of ecosystem. Lauren, I'm curious how you divide up the future agentic market. Is it one model to rule them all? There are many ways this could play out, and I'm interested in how you forecast here.
>> Yeah, I
think whoever can make something repeatable will win, because it really feels like this moment of agents right now is like traditional AI ten years ago, where it was really cool that you could build an AI model to do anything, but you had to do it from scratch. If you wanted it to predict education outcomes, you had to find the data, train the model just for that, refine it, and then package it up and use it. And if you wanted it to do something else, you had to start over. It was a whole end-to-end process every time. That's kind of what agent building is right now, and it's even more painful because it's not just code-based, it's language-based. So much of that rebuilding is prompting it, figuring out how to nudge it in certain directions, getting it to use tools sometimes and not other times, and to use the tools in better ways. The breakthrough with traditional AI was foundation models: we trained bigger, better models because we could, we had more data and more compute, and then that one model could do different things because it knew a little bit of everything. I think if we similarly had some concept of foundation agents, it could work similarly, and kind of reduce that friction of having to build from scratch every single time.
>> Yeah, for sure. And you think the ones best positioned to win there would be the existing leaders, right? They take their model, polish it off, and then it's the foundation agentic model, basically.
>> And I don't even think it would be a model at this point; it would be orchestration of multiple models, plus other constraints put on that.
>> I really don't know if it would be the existing leaders, or if it's going to be some dark horse that initially builds one agent to do one specific thing, but then takes the pieces of that, whatever percent of the code, and uses them to do a second thing, and then a third thing. That was kind of the AWS story, right? They were building for themselves to do something specific initially, but then a lot of that cloud infrastructure could be used for other things beyond that. So I think there's a scenario where someone just commits to a use case, and initially they're kind of looked down upon because they're using AI to do just one thing, but do it well. Then they work out the pattern to expand to other things and eventually build something that's more repeatable and more of a platform.
>> Yeah, that's very rich. I never really thought about it as: what's the specific agentic problem that, if you solve it, unlocks the largest number of subsequent agentic problems? It's kind of interesting to think about. Is it the travel-planning one? What is that use case?
So, Chris, do you want to give us a final thought here before we close up?
>> I
don't think this is a finite game. I don't think there is a winner. I think this is the classic Simon Sinek infinite game: the players are going to play until they run out of resources and can no longer play the game. And I think that's what's going to happen, right? A lot of the technology and the techniques are known, well known across the world, and the limiting factor is resources. But in an agentic world, the models need to get smaller and smarter, and in the future be able to fit on a chip. And therefore I just don't think there is a winner in this scenario. So who do I think is going to win? I think we're going to win. I think it opens up a creator ecosystem. What I hope it breaks up is all these massive Web 2 companies controlling everything, so we can get a more well-rounded marketplace. That's what I believe in. And the biggest thing, if I think about what's going to happen in '26 and '27: you remember the Rick Rubin episode, where I was frantically Googling who Rick Rubin was so I could give an intelligent answer to your question, Tim? I'm obsessed with Rick Rubin at the moment, because I think composition is where we're going. I think '26 and '27 are going to be about marketplaces, but also about being producers. You're going to say, "Okay, I've got this model over here, and I've got my piece of data, and I've got my brand and my style, and I've got these five tools, and I'm going to combine them together into my ecosystem and create something new and beautiful, and that's going to be my product." So I hope that's what happens: that it's not this one model or whatever that's the winner. That's a depressing future. What I'm hoping for is a vibrant, amazing ecosystem and marketplace where everybody's got a chance to use AI to improve their lives, personalize it to them, and create their own company's products and data without the limitations we have today. So we're going to be the winners. But these model providers, they're going to come and go, right? We saw that this year. Who were Moonshot and Kimi? Ask that question six months ago. And if we go back to last year, who was DeepSeek? Same sort of thing. New model providers are going to come in. And remember when we were super excited about Manus? I expect them to come back at some point. People are going to come in and out, and that's fine, it's okay. Who knows for the future, but it's going to be fun.
>> Nice. Well, on that hopeful note, I'm going to let you all get to your impending holidays. Volkmar, Lauren, Chris, thanks for joining us. And thanks to you, listeners, for joining us. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. And we'll see you next week on Mixture of Experts.