# AI Takes Over Hollywood

**Source:** [https://www.youtube.com/watch?v=rNk3OuUj1UA](https://www.youtube.com/watch?v=rNk3OuUj1UA)
**Duration:** 00:41:55

## Summary

- The panel speculates that by 2030 most summer blockbusters will be fully computer‑generated, with mixed hopes that traditional filmmaking—especially directors like Tarantino—will still survive.
- Guests Marina Danilevsky, Abraham Daniels, and Gabe Goodhart share contrasting views: Marina is upbeat, Abraham worries about losing real actors, and Gabe hopes AI‑generated animation still involves practical effects like bodysuits.
- Host Tim Hwang introduces “Mixture of Experts,” previewing topics such as the “end of Stack Overflow,” a new project called llm‑d, and Microsoft’s NLWeb release.
- Google I/O’s headline AI announcements are highlighted, including a $250 “AI Ultra” subscription tier and the launch of VEO 3, a text‑to‑video (and now audio) generation model.
- Abraham expresses skepticism that high‑quality, AI‑generated video can replace live‑action filmmaking, citing current limitations on video length and overall realism.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=0s) **AI‑Generated Summer Blockbusters 2030** - Experts humorously predict that by 2030 most summer blockbuster films will be entirely computer‑generated, while yearning for some live‑action elements amid a broader discussion of AI news.
- [00:03:07](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=187s) **Evaluating LLM Subscription Value** - The speaker questions the practicality and consumer appeal of high‑priced LLM subscriptions amid emerging open‑source alternatives and a challenging market environment.
- [00:06:13](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=373s) **Open Source Unbundling the AI Market** - The speaker compares the emergence of open‑source AI models and frameworks to early streaming services that dismantled bundled software, suggesting a competitive split between paid, bundled solutions and open alternatives will define the industry's future.
- [00:09:22](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=562s) **Search as Baseline, Google’s Momentum** - The speaker argues that reliable search is a fundamental requirement for any agentic AI framework and wonders whether recent events like Google I/O signal that Google is narrowing the competitive AI gap.
- [00:12:23](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=743s) **AI Threats to Stack Overflow** - The speaker cautions that generative AI could undermine Stack Overflow by replacing nuanced human expertise with homogenized answers and concentrating fresh knowledge in proprietary tools, and proposes anonymized AI‑generated contributions to preserve the platform’s knowledge base.
- [00:15:30](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=930s) **AI Content Shift & Collaboration** - The speaker notes how AI tools are siphoning traffic from traditional content creators, argues for shared high‑quality AI‑generated answers, and envisions subscription‑based platforms that curate expert‑driven responses.
- [00:18:41](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=1121s) **AI Threat to Stack Overflow** - The participants debate how rapidly improving coding AIs are diminishing Stack Overflow traffic—accelerating a pre‑existing decline—and prompting developers to migrate to Discord, WhatsApp, and similar chat platforms, raising questions about the future relevance of the site and its SEO impact.
- [00:21:48](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=1308s) **Introducing LLM-D: Kubernetes Inference Stack** - The speakers unveil the open‑source, Kubernetes‑native llm‑d platform for distributed LLM inference and argue that as models become commoditized, developers will select them based on ecosystem support and performance‑to‑cost ratios.
- [00:24:51](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=1491s) **Prefix Caching and Request Routing** - The speaker explains how reusing pre‑computed token prefixes and intelligently routing requests to servers that already hold those prefixes minimizes redundant computation and maximizes GPU utilization, a strategy aimed at improving LLM serving efficiency especially for constrained enterprise environments.
- [00:28:03](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=1683s) **Red Hat's Open‑Source Support Play** - The speaker argues that open‑sourcing complex LLM technology creates a market for Red Hat to monetize by offering enterprise support, mirroring the proven business model of Kubernetes.
- [00:31:09](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=1869s) **Unified Agent Protocol for Web** - The speaker debates conversational interfaces becoming dominant, explains the MCP server concept that makes website content discoverable to agents via a standardized data handshake, and stresses that a unified protocol—not just chat—will enable search, scraping, and actions across the web.
- [00:34:10](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=2050s) **Toward Bidirectional AI Content Interaction** - The speaker argues that while a simple chat‑window UI and a unidirectional MCP protocol can conveniently expose site content to AI agents, true value requires a two‑way interaction model that lets sites act not just as data providers but as interactive AI applications.
- [00:37:14](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=2234s) **Anti-Agent Strategies and Open Protocols** - The speakers debate emerging anti‑agent tactics, the need for incentives that encourage creators to share content with AI, and whether open standards such as HTTP or a new MCP/Tim Context protocol will become dominant over walled‑garden alternatives.
- [00:40:16](https://www.youtube.com/watch?v=rNk3OuUj1UA&t=2416s) **Coalescing Web Protocol Evolution** - The speaker contends that emerging web protocols will succeed only if they combine solid creator‑side technology with reliable, user‑friendly consumption—potentially aided by AI and driven by engineers’ impatience with flaky implementations.

## Full Transcript
It's 2030. Are the majority of summer blockbuster films
entirely computer generated?
Marina Danilevsky is a Senior Research Scientist.
Marina, welcome back.
What's your prediction?
Summer blockbuster films, maybe; submissions to Cannes, no.
Okay,
Cool. Sounds great.
Abraham Daniels is a Senior Technical Product Manager for Granite. Uh, Abraham.
Welcome back.
We haven't seen you in a while.
Welcome back to the show.
Uh, what's your prediction?
Uh, I really, really hope not.
Um, I'm hoping that Tarantino still makes movies and he only
does film, so cross my fingers.
Keeping it strong.
And Gabe Goodhart is Chief Architect, AI Open Innovation.
Gabe, welcome back! Movies in 2030 -
what do you think?
Uh, I only watch animated movies with my kids at this point.
So yes, those ones are computer generated.
My hope is that some of the ones that I don't get to watch are, uh, at least have
somebody wearing a bodysuit behind them.
All
right, all that and more on today's Mixture of Experts.
I am Tim Hwang, and welcome to Mixture of Experts.
Each week, MoE brings together the sharpest team of researchers,
engineers, and product leaders that you'll find anywhere in podcasting
to discuss and debate the biggest news in artificial intelligence.
As always, we have a lot to talk about.
We've got, um, uh, the end of Stack Overflow.
We're gonna talk a little bit about, uh, a new project called llm-d, a new release
from Microsoft called NLWeb. But first I want to start by talking about Google
I/O. So Google I/O, if you don't know, is Google's annual developer conference.
Uh, it happened this week and there was a raft of announcements, uh, to be expected.
It was basically AI, AI, AI.
I dunno if you've seen the supercut of, uh, Sundar Pichai, the CEO of Google,
just saying AI at a previous I/O, um, but this year's I/O is actually no exception.
Um, and, uh, perhaps one of the biggest announcements that I want to
get into first with the panelists.
Is, uh, the announcement of a $250 AI Ultra Plan, which now kind of
joins the Anthropic plan and the OpenAI plan in terms of these like
very highly priced subscriptions.
Um, and then with that, the launch of, uh, VEO 3 which is
their video generation model.
And, uh, a lot of people have been having a lot of chatter about it.
It's able to generate kind of text to video and then interestingly
can do, uh, audio now as well.
Um, and so I guess, uh, Abraham, maybe we'll start with you.
Um, you had a little bit of skepticism, or you hoped, at the very least, you
know, that Hollywood would stay true to its roots and keep doing,
you know, movies with real people.
Um, curious about this though.
I mean, this technology seems really good and, and it feels like
if anything is in the cross hairs, it's basically movie production.
Yeah.
So in terms of the movie production aspect of it, yeah, I'm not super convinced.
Just a couple reasons.
One, in terms of the actual length of video you can generate, um, it
may be high quality, but in terms of actually being able to, you know,
cobble together an hour and a half movie via, you know, a very specific
prompt, like I don't buy that quite yet.
Um, but I do think it'll be, you know, really cool in terms of
adding to special effects or being
able to enhance certain, you know, scenes or features within movies that
I think it can really play a part in.
Um, in terms of the, the price tag for the actual, you know, using their models.
Um, I, I think it's kind of interesting given, you know, you've got this, uh, you
know, open source community that's really starting to, you know, catch wind in terms
of a lot of the major developers kind of, you know, loading their
models into, you know, Apache 2 or MIT licensing.
So, uh, I'm curious to see how that actually pans out.
Uh, I wish I had more information on how, uh, OpenAI was kind of, you
know, surfacing with their $200 per month subscription, but, um, you know,
I, I guess everybody's kind of, kind of calling for dollars right now.
It's a, it's a tough market to make money in right now in LLM development.
So, you know, it's, uh, it's kind of a winner take all.
Yeah, for sure.
And Marina, I had kind of an interesting question.
I was describing very excitedly to my partner last night.
I was like, oh, look at this new Veo 3 thing.
And, uh, she had this great response, which was like, what is this for?
What are you gonna use this for?
Why would you pay $200 for this per month or $250 for this one, for this per month?
Is there really a consumer angle here, or is this kind of just like a fun toy?
Like it feels like there's a mismatch between how much you would pay for,
you know, one of these features, which presumably I think is like one of the
features why you'd wanna buy their bundle and, and I guess what you'd even
just use it for on a day-to-day basis.
So putting my economist hat on for a second.
I think there's something very nuanced to the fact that you can't
really get these things separately.
You have to get them in a bundle.
So some of them, these features immediately are useful right now,
and you could use them right now.
Others you might just be playing around with.
But again, to improve these things like Veo and whatever Google needs
data.
They need data.
They need data.
They need data.
They're gonna get it from you playing around with stuff while
actually making immediate use of the things that are more advanced.
So yes, to your point, it's a little bit hard to make money right now in LLM
space, but also the fact that they're bundling several technologies that
are at different degrees of readiness.
It's kind of clever.
I gotta say, because that's exactly it. You're gonna bring people
in with something that's already working pretty well, and then
that way you're going to get an ecosystem of people, and that's,
that's gonna work to their advantage.
Yeah. That bootstrapping is really interesting.
I think another dynamic, um, and Gabe, curious, you have any thoughts about
kind of like this pricing war that we're seeing now across all these
companies is, I guess in effect people are gonna have to choose one, right?
Like I think is where some of this is going.
If they want all the features and so this kind of market for people who are
willing to pay significant money on a month to month basis for these services
almost will end up being a little bit, kind of like you have to choose one.
Um, and, and yeah.
I dunno, curious about how you think about that.
Yeah, no, I mean, uh, to lean into what Marina said it, it's
a little bit like cable bundling wars and cable providers, right?
Like you've got different companies each competing to be the one with
the best bundle of capabilities, and you gotta have an anchor show, or
in this case feature that somehow differentiates itself from the pack.
And then, you know, you hang all the rest of your shows off of that and hopefully
a couple of them catch fire and, uh, you know, bring in some more eyeballs.
But it's, uh, it'll be really interesting to see where this
bundling thing, uh, sits.
And then when slash if the equivalent of, you know, the streaming
revolution, uh, comes around and starts to decompose things into a la carte.
Um, whether that's may, maybe open source models and open source frameworks are the
equivalent of, you know, early streaming services that take apart the bundling.
Um, but it's all happening at the same time this time rather than,
you know, an entrenched industry.
So, uh, I think it's gonna be really interesting to see
how those two sides of the coin shake out.
Well, one of those sides of the coin is, is bitterly
fighting with its peers, right?
So in the bundling market, in the paid for market, the peers are gonna be
fighting, uh, and then the open source is gonna be the alternative to the whole
paradigm, I guess.
Yeah, that's right.
And do you wanna go into that a little bit more?
I mean, I know you spend a lot of time thinking about open here.
Yeah.
Obviously, I guess in, in my title it kind of states my bias
here, but, um, yeah, that's right.
I mean, like ultimately you think that open's gonna win, but is this
sort of like the, you know, the exhaust vent on the Death Star,
like open source is gonna be the thing that really kind of...
No, I, you know, I think there, there are roles to be played for
um, put it together yourself.
And I think, you know, we've seen this in development
for a long time.
This is not new to AI.
Um, I think back to early Visual Studio days where you had to pay for expensive
subscriptions or at least expensive boxes of software with a CD inside to
install a good IDE on your computer.
Um, and then, you know, eventually open source either caught up or
surpassed them, and we found that the tools themselves were not
what people wanted to pay for.
So I think software development in general has always sort of been, um,
a game of searching for where there is value that is worth paying for.
And a lot of times the things that initially seem like extremely
high value propositions eventually migrate to a commodity that people
expect to be able to get for free.
Uh, and then the value moves somewhere else.
So I think in our space, uh, I'll, I'll win the mixture of experts game.
I think we're gonna see that shifting in the agent's direction.
Um, and uh, I think, I think we'll start to see, probably,
vendored proprietary agents that people are willing to pay for, that get you
a higher entry point into the stack.
Uh, and then the model itself is gonna be a bit more commodity, but that's my
prognosis going forward.
Yeah, and I really do want to talk about, I mean that was the kind of second
set of things coming out of I/O that obviously announced a lot of things.
But the other big thing was, well, we now really want search to be more agentic.
You know, Gemini is gonna have agent mode that you can turn on.
And I guess, Abraham, I'm curious about your response to that.
You know, maybe to put a finer point on it: is search the killer,
you know, agent, uh, capability?
Like, I think that's kind of what was being offered by Google here is to
say, look, we do search really well.
If we can do that in a more agentic way, then that's, that's
the killer in the agent space.
But I think you also see a lot of other companies kind of competing for this.
So kind of curious, uh, what you think.
Yeah.
Um, I mean, it's a great question.
I'm just kind of thinking about, you know, all the organizations that kind of
already offer search either, like, natively or as part of their agent capabilities,
you know, with, uh, ChatGPT and Perplexity AI, where it really
is not something that's net new.
Um, with respect to agent capabilities, I, I think it's kind of gonna be table
stakes where you, at the base minimum, you have to be able to support search
as part of your agent framework.
So it may not necessarily be something that is, you know,
uh, you know, bleeding edge.
Uh, but I think it's really like, in order to be able to come to the table, um, and,
you know, be a player as part of the agentic framework,
search has to be one of the baseline, uh, you know,
capabilities that you can support.
And on top of that, you know, you can kind of decide what makes the
most sense for your user base.
But I think search really is basically like a starting point:
if you can't support that at the bare minimum, then, um,
you know, I think it just kind of raises some flags.
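Abraham's "table stakes" point maps onto how most agent frameworks are built: the framework is, at minimum, a registry of callable tools, and search is usually the first entry. The sketch below is a generic illustration, not any particular framework's API; all names here (`Tool`, `REGISTRY`, `fake_search`) are made up for the example.

```python
"""Illustrative sketch of search as a baseline agent capability."""

from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]  # takes a query, returns a result string


def fake_search(query: str) -> str:
    # Stand-in for a real web-search backend.
    return f"top results for: {query}"


REGISTRY: dict[str, Tool] = {}


def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool


# The baseline capability: without this entry, the agent can't
# "come to the table" on fresh information at all.
register(Tool("search", "Look up current information on the web", fake_search))

if __name__ == "__main__":
    # An agent loop would dispatch model-chosen tool calls through the registry.
    print(REGISTRY["search"].run("Google I/O announcements"))
```

Everything layered on top (the "decide what makes the most sense for your user base" part) is just more entries in the same registry.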
Mm-hmm.
Yeah,
for sure.
Um, Marina, I hate, I know you hate this kind of question, but
I'm gonna ask it to you anyways.
Is, um, is Google suddenly kind of catching up in this race?
Uh, I know the mood of the conversation, you know, as of last year was like,
oh, they've fallen terribly behind.
Like, one of the things I love about AI is like anyone who is
up will be down in a few months.
Anyone who is down will be up in a few months.
Again, like, I dunno, Google is suddenly back on the board again,
and so I'm curious: in the kind of battle for market share for this
whole scope of tools, what should we take from I/O?
Is I/O kind of a sign of strength from Google, or do you feel
like they're still not quite getting it?
I think what's interesting with Google actually, it's how much
they're leaning into multimodality.
Like two thirds of their announcements are about
something that has to do with modalities besides text.
If you go and look like we touched on video, but look at all the stuff
that they're doing with Project Astra and with the fact that you're gonna
be able to search with what's on your camera and all these other things.
So, um, I think that Google's definitely in a better position this year than
they were last year, but again
compare public perception with what's probably actually going on under the hood.
They're, they're doing a pretty good job of saying, oh, don't worry guys.
We, we still have things going on, but if even after all the talking
we're doing right now, everybody's still using Google to search, whether
you're in AI mode or not AI mode.
And so they're continuing to get the data and get the data and, you know,
this is like my favorite topic ever.
So they're still gonna continue to be pretty ahead in
data on which to train any kind of new models.
So you're not just having people only interact with ChatGPT, you're
having people interact with Google.
Generally it's, it's a rich thing and I think there they
continue to have an advantage.
So piggybacking on that, uh, this is actually a great
segue to our next segment.
I wanna talk a little bit about Stack Overflow, which, uh, as many of
you may know, is a much loved forum for technical questions and answers.
Founded in 2008, it has become really like a pillar of
being a technical person, uh, online.
And the bad news is that the website is dying.
Traffic has been dropping and has been dropping in particular,
um, arguably because of AI.
Um, whereas in the past you would, uh, have to go to this website to
kind of look up the answer to your question on a, a coding issue.
Um, a lot of that's being replaced now by, um, autocomplete, right, code generation.
So I wanna talk a little bit about like what that means and what it means
for traffic on the web as a whole.
So I guess, maybe, Marina, to give you kind of a concrete question:
do you buy that, sort of, AI is killing Stack Overflow? And if so,
you know, does it pose a danger to even bigger places like, like Google?
So the Stack Overflow story does make me sad, um, because I think
we've all used it to quite a decent degree, and it's really hard to,
um, not realize that you can't replace human expertise on these kinds
of more nuanced questions with just what got autogenerated.
Which means, once again, regression towards the mean,
regression towards homogeneous answers.
But what's really terrifying is that, like, look, software engineering
always moves really, really, really quickly.
Things are out of date immediately.
You need more people asking about what is the newest
thing, what is the latest thing?
If everyone's asking Cursor and Cursor fixes it and you say,
oh yeah, great, that was good.
Who's got all that fresh data?
Now only Cursor has it, or whoever it is that you're using.
So the market share ends up being really
important here.
So something that I would wish for people to do is to put AI to
good work and say, Hey, I accepted that answer you just gave me.
Why don't you make an anonymized version of my code and post it on the internet?
And now we keep Stack Overflow going so that other people can still have this
data, can still use this data, keep it going, you know, in a really
frictionless way, as a repository of knowledge.
There's often more than one answer to these kinds of questions, like you
really wanna continue to have these kinds of barriers broken down.
I understand ease. I understand access. I understand it's right there.
And it's nice to you and it doesn't have the moderators telling you
this has been answered before.
Why are you dumb?
Sure.
But it's, uh, it's short-term gains for a real long-term loss.
Um, and so I really hope that we can not fall down that hole.
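Marina's suggestion (accept the AI's answer, have it anonymize your code, and post the result publicly) could be wired up as a small post-acceptance hook. The sketch below is purely illustrative: `anonymize_snippet` is a hypothetical helper, not anything the panel describes, and this naive tokenize-based approach renames every identifier, including builtins, where a real tool would whitelist them.

```python
"""Hypothetical sketch of Marina's anonymize-and-share idea: scrub
user-specific names and string literals from an accepted AI-generated
snippet before posting it to a shared Q&A archive."""

import io
import keyword
import tokenize


def anonymize_snippet(code: str) -> str:
    """Replace user-defined identifiers with generic names and blank out
    string literals, keeping keywords and code structure intact.
    Naive: also renames builtins; a real tool would whitelist them."""
    mapping: dict[str, str] = {}
    out = []
    for tok_type, tok_str, *_ in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok_type == tokenize.NAME and not keyword.iskeyword(tok_str):
            # Assign each new identifier a stable generic name.
            mapping.setdefault(tok_str, f"var{len(mapping)}")
            tok_str = mapping[tok_str]
        elif tok_type == tokenize.STRING:
            tok_str = '"..."'  # strip potentially identifying literals
        out.append((tok_type, tok_str))
    return tokenize.untokenize(out)


if __name__ == "__main__":
    snippet = 'user_email = fetch_profile("alice@example.com")\n'
    print(anonymize_snippet(snippet))
```

The "post it on the internet" half of the idea would then be a single upload of the scrubbed question/answer pair to whatever shared archive emerges.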
That's right.
And I don't know, I mean, I also like the angry mods.
Like I feel like that's a key part of the experience is to feel, feel
the burn of someone just being very angry about your question.
Gabe.
So I think, I mean, Marina's describing something, which I think is like
potentially really important, right?
Which is like, well, there are ways of architecting AI systems so they can
feed back into a human system, right?
But I think here, you know, I guess a cynic would say, well, someone like Cursor
has no interest in doing that at all.
Right.
And I guess I'm kind of curious, if you feel like this is a good
approach, how we can change that, right?
Like, what do we do?
It, it's interesting.
I, I had exactly the same sort of paradigm idea in my head when I read
about the demise of Stack Overflow.
Um, which is that I think right now, you know, I think we talked in the last
episode I was on about the, the state of Wikipedia transitioning to bot scraping.
Um, and I think, you know, even some of the ones we're gonna touch later on in the
episode are all about how does the content on the internet transition to an AI first
world where the primary consumer is AI.
And I think in this case, um, from a purely consumption
standpoint, yeah, it's great.
Like I don't need to spend time searching.
I can just get the code snippet directly into my editor.
Um, or worst case, uh, a concise version with you know, no need to scroll
through the comments to actually get, you know, a very clear representation
of, of what I want in a, in a chat context or something like that.
So from a user perspective, there's a clear win here, which
makes it in some ways a no-brainer that this is going to happen.
Like, I don't think we can stop it with that convenience, but I think
it's a one-way street right now.
Right.
Um, I was actually talking with someone, um, who used to run a food
blog and essentially her traffic is dead now because everyone just
asked ChatGPT for the recipes.
Um.
And I think we're seeing this sort of, the state is all the
data is going into the models.
To some degree, like we talked about with Wikipedia, the state
of affairs is still valid for recency and for RAG-type use cases.
Um, but it's really changing the shape of who's gonna use the data.
And I think we'll probably see the shape of the data creation changing.
And what I would love to see is exactly what you propose, Marina:
this kind of a collaborative effort where, um, when an AI usage
performs well, that well-performing
content then somehow, somewhere gets shared.
And I think, um, you know, you said someone like Cursor has no
interest in that, and I think they probably don't have interest in,
necessarily, um, sharing it publicly, but I bet that they might have
interest in trying to become the new Stack Overflow, where they
actually have the ability to subscribe to shared answers.
And then if you subscribe to shared answers, you can get,
uh, you know, well-thought-out results: conversations had by
other experts with their AI bots, accessible in your experience.
Right.
So there's probably a play to be made there for a vendored solution
and then hopefully an open solution that comes along and does a similar
thing, but with an actual, you know, fully open ecosystem that could look
something like an agent running
client side, you know, I know for example, the Continue team, uh, keeps all of their
code open source for the client side.
So you could imagine a plugin there that actually hosts this in some kind of a
neutral vendor, third party type of space.
Right?
Um, so I think, you know, just like we're seeing with vendored AI versus
open source AI, um, we'll probably see something similar hopefully
emerge around, you know, vendored AI content sharing
versus open AI content sharing.
Yeah, no, I think that's great. And I think the comparison to the recipe website is so interesting, 'cause one thing I find particularly perverse about this is how a bunch of sites had to construct themselves in a particular way to be sustainable, right? So the classic thing with the recipe site is, it's got the long narrative thing and all of these ads, and you know the reason you're doing this: you're trying to make a living being a recipe blogger, and you need to increase time on site and you wanna increase engagement. But it's exactly that kind of stuff that has made them very vulnerable to, say, a chatbot where you just get the recipe. And so there's kind of this weird cycle of the economics where, you know, all the decisions you made early on are now making your industry particularly vulnerable to what's happening.
Exactly. And I think there's gonna be a shift in creating content that both appears well for AI, so the SEO-for-AI type of experience, and then also creating content that hopefully can be monetized through AI, so that the content creators, uh, the experts in whatever field it is, whether it's experts on Stack Overflow or chefs, can actually make a living here, or actually have value placed on their contribution.
Tim, that was my comment.
I don't know how many episodes ago.
Do you remember?
SEO, it's gonna be deeply affected by all this.
That's right, you called it. SEO, you're ahead of the game. Here we go. Um, Abraham, can I play like tech bro jerk for a bit, right?
Like, I feel like there's one argument, which is, look, these models are getting so good at coding that in a few years, why do we even need Stack Overflow anymore?
Right?
Like we're, we're past the world of Stack Overflow.
Codegen is gonna just be able to happen in the future. And so, like, it's very sad, you know? But I guess the tech bro view is like, isn't the technology making stuff like Stack Overflow kind of obsolete?
Do you buy that?
Um, well, yes and no. I think there's kind of two sides to it. Like, when I read the article that you shared, if you notice, from the peak of Covid to, you know, the introduction of ChatGPT, it was a pretty big drop. And obviously ChatGPT kind of accelerated the decline of Stack Overflow traffic, um, but I think it was already, like, you know, something that was in progress. And kind of digging a little bit deeper, I saw that, you know, a lot of people moving from Stack Overflow were going to Discord channels to be able to have conversations, or WhatsApp groups.
So I think it really signaled that it wasn't necessarily, um, the immediacy that ChatGPT could provide, but more so a dialogue that you can have in terms of navigating the problem, as opposed to, you know, posting a problem, having a solution answered at a certain point in time, and then, if you had another question to follow up on, there was an issue with time to value in terms of using Stack Overflow.
So I think it's less of a, you know, "who cares, LLMs are gonna remove the need for anything like this," and more, how do we find ways for developers or software engineers to have more natural engagement when they're trying to navigate a problem?
Um, I think it's less about being able to code; again, I'm not an engineer, so I say this lightly, but being able to code is being democratized relatively quickly. I think it's actually understanding the strategy behind what you're actually coding that is a lot more valuable right now. And that takes a dialogue between yourself and, whether it's an LLM or another individual in your space. And I think that's gonna be a really key, um, driver for whatever becomes the next, you know, catalyst or focal point for how we have a forum for these kinds of conversations.
So, um, yeah.
So, from a tech bro perspective, I get it. Yes, it just makes it easier. But then from an actual, you know, user perspective, I think it's more about, I want to be able to engage with somebody as I'm, you know, driving these projects.
Yeah, for sure.
Yeah.
And I think that there is something there, you know, again, with all the jokes aside about Stack Overflow being occasionally sort of an unfriendly place: actually, part of the idea is that you're kind of communicating with others and solving a problem, and there may be some value that we are losing in that transition, um, which I think is sort of interesting. I guess the future may be that you're arguing with your, uh, codegen tool on a particular implementation, who knows?
Yeah.
Codegen was like, did you read the documentation?
Geez.
I'm gonna move us on to our, uh, next topic.
Um, a project launched with a number of collaborators from a couple different companies. An open source project called llm-d. And, uh, I wanna start the segment just by reading the description of llm-d. So llm-d "is a Kubernetes native distributed inference serving stack, a well-lit path for anyone to serve large language models at scale with the fastest time to value and competitive performance per dollar for most models across most hardware accelerators".
Gabe, what is this?
What does it do if you don't know anything about Kubernetes or you dunno
anything about distributed inference?
Like what, what, what is this?
Why should we care?
Yeah.
Okay.
Um, so going back to, uh, something I said earlier: you know, I do believe that we're gonna approach a space where the models themselves are commoditized, and individual models have some strengths, uh, over other models, but ultimately the model you choose is gonna have a lot to do with the ecosystem you can choose it with. Um, and if you are a model provider, the thing that makes you attractive is your price tag, uh, and the thing that drives your price tag is the ratio of performance to dollars.
Um, and so there's a really big divide between, you know, sort of the open source developer tinkerers that wanna be able to load models on their laptop or connect to an API service and run occasional queries ("I'm not worried about rate limiting and throughput 'cause I'm just dabbling, I'm just messing around here"), and the people who are actually running the models, who really, really care that all of the GPUs that they spent millions of dollars to buy are actually getting used all the time.
Um, I thought one of the things that the technical article about llm-d spelled out really well is that traffic for LLMs is very different from the assumptions that a lot of the internet is based on, right? So the internet is based on the assumption that most requests are roughly the same shape and size. Um, you know, sure, you sometimes download big files, but by and large you've got small requests with small responses, and so you can do something pretty naive: just spray 'em around to a bunch of horizontal replicas of your website, and your website's the same no matter which server you hit.
And, um, cool. Great! Problem solved. Round-robin load balancing for the win.
Um, but, uh, LLMs are very, very different, right? And we've talked about different use cases for LLMs, whether it's a huge amount of context for RAG-type scenarios, or a huge amount of output for, um, thinking-type scenarios. Um, they all look and behave differently.
And, um, another hugely important part is prefix caching, right? So, to get briefly technical here: you know, all of these models are autoregressive, which means they compute up to a certain point in the token sequence and then compute the next one. And the math for the next one is based on all of the math that was computed for the previous ones. Um, and this is great if you have one instance of your server running, right? Uh, because you've already got all that math pre-computed, it's just sitting there in memory; you can just do whatever the little delta is to get the next token. But if you happen to somehow land on a different server that has not pre-computed all of that, you gotta go back to the beginning and start over again. And that's a really wasteful operation.
So the thing that the llm-d team has really, really focused on is the routing of requests, to make sure you're maximally taking advantage of all of the stuff that's already in memory, those common prefixes, um, and then also maximally taking advantage of, you know, saturating those GPUs based on the shape, uh, and the expected output length of the requests. Um, which, you know, is really technical and nitty-gritty in the details.
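The prefix-cache-aware routing described here can be sketched in a few lines. To be clear, this is an illustrative toy, not llm-d's actual code: the block size, hashing scheme, and replica names are all invented, but the core idea is real, requests sharing a prompt prefix should land on the replica that already holds that prefix's cached computation instead of being sprayed round-robin.

```python
import hashlib

BLOCK = 16  # tokens per cache block (illustrative size, not llm-d's)

class PrefixRouter:
    """Toy prefix-aware router: reuse a replica that has seen this prefix."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.rr = 0           # round-robin cursor for cold prefixes
        self.block_owner = {} # hash of a block-aligned prefix -> replica

    def route(self, tokens):
        # Hash each block-aligned prefix; the longest already-seen prefix wins.
        chosen = None
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            h = hashlib.sha256(repr(tokens[:end]).encode()).hexdigest()
            if h in self.block_owner:
                chosen = self.block_owner[h]
        if chosen is None:
            # Cold prefix: fall back to plain round-robin.
            chosen = self.replicas[self.rr % len(self.replicas)]
            self.rr += 1
        # Record ownership of all prefix blocks for future requests.
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            h = hashlib.sha256(repr(tokens[:end]).encode()).hexdigest()
            self.block_owner[h] = chosen
        return chosen

router = PrefixRouter(["gpu-0", "gpu-1", "gpu-2"])
system_prompt = list(range(32))  # a shared 32-token system prompt
a = router.route(system_prompt + [100, 101])
b = router.route(system_prompt + [200, 201])  # same prefix -> same replica
print(a, b)
```

Two requests that share the system prompt land on the same replica, so the second one can reuse the pre-computed prefix instead of starting over from the beginning.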
But what it's ultimately gonna do, for providers of LLMs, and this is not just hyperscalers, to be clear, right? Hyperscalers are gonna build their own stacks. This is targeted at enterprises that have constrained environments, where they wanna approve and manage and run their own models. Um, this is gonna give those enterprises the ability to actually run models that fit their business needs, uh, at a cost that is actually approachable to, you know, adopt AI inside their space. Right. So that's the real target of a tool like this.
Yeah, for sure.
And Marina, I guess one question that's worth asking is, you know, like Gabe was saying, right, the hyperscalers are gonna do this in-house. But what's being described here, I mean, this is a lot of work, right? To get all of the routing optimized to maximize GPU usage, and the end result is that you save people a lot of money, right? Ultimately, um, why is this getting released in an open way? I think there's another set of questions, like, what's the open source play here? 'Cause this would seem like the kind of thing that you'd want to keep in-house, secret, proprietary.
I mean, at the scale at which they would like this to work, you can't keep it in-house. This is almost like having to re-figure out how you want to do work, in the way that when we realized how to handle databases more efficiently, you wanted a lot of basic ways of how you represent data, how you handle transactions, how you handle collisions, and things of that nature. There was not just one company that was like, "Nope, we're going to own databases, you can't do it." Not if you actually truly want something that is that widely adopted.
So in this case, I'm gonna hark back to what Gabe said before about agents, you know, being a new thing. I don't know what "agents" means either, and nobody knows what "agents" means, but it does mean complexity, and it does mean lots of things having to do lots of actions and make lots of choices. So now we're really getting back into database transactions, but for the gen AI world. And so this is really interesting and important work, to almost have a new set of standards across the board for everyone, no matter what particular agentic flow you are or are not using for your own particular use case. So, um, yeah, that's my perspective on it.
To build on Marina's point, the complexity of llm-d, you know, open sourcing it really leads to a market, like an opportunity, for Red Hat to be able to provide support. So when you talk about, from a commercial enterprise, how do you actually make money off of this? It's right in Red Hat's wheelhouse: an open source technology gets widely adopted, and your commercial strategy is really to be able to provide support for it. So I think, given the consortium of organizations that are on board, from a commercial to, you know, an educational standpoint, I think that's really the play here in terms of, okay, well, why do you wanna open source this? Well, because the complexity of it and the adoption of it will really drive support back to Red Hat.
Yeah, a hundred percent. And I think, you know, again, this business model is also not new. Uh, I mean, how many technologies out there do people have as the cornerstone of their business where the play-around-with-it version exists on your laptop, and scaling it up to production usage is complex enough that you either need to hire somebody that has already done it and use it as a service, or you need to hire somebody to help you do it yourself? I think that's exactly the same thing here, right? I mean, it's the reason Kubernetes itself has traction. Kubernetes is open source, um, but there's not a lot of people probably out there, well, certainly homelabbers might be running their own Kubernetes cluster, but when it comes time to running your business, you're generally gonna, you know, buy a cluster from, uh, somebody, or, uh, if you are, you know, a privacy-sensitive or otherwise constrained industry, you're gonna run it on-prem, with either a big team in-house or with support from a company like Red Hat, um, that does this for a living.
Yeah.
One of the things I love about this, and I guess it's true of the AI space in general, is, like, you know, the technology itself is so weird and so cutting edge, and, you know, people are like, "we're building a machine god." And then you're like, actually, the business model's B2B SaaS, or, actually, the business model is like Red Hat's, you know? It's just like, it turns out the way we monetize and build businesses around these technologies is in some ways the same game we've always played, you know? Which I think is very interesting.
I'm gonna move us on to our last topic.
Um, Microsoft, uh, did a fun little release that I do want to kind of talk about before we wrap up the episode. It's a project called NLWeb. Um, and it's an open project that, and again, I'm doing a lot of quoting this episode, but there's good quotes. So, quote, "turn your website into an AI app, allowing users to query the contents of the site directly using natural language, just like with an AI assistant or Copilot". Um, and it also comes with a little thing where you can set up your website as a Model Context Protocol server, so that agents can interact with your site.
And I guess, maybe, Marina, I'll kick it back to you. Um, this is kind of a fun project, 'cause it envisions a version of the internet where everything is just talking; like, you're just talking to every single website. And indeed, websites can talk to one another in natural language. Um, and it seems to be in some ways a bet that these conversational interfaces become ubiquitous, in a way that is almost a little bit funny. It's just like, oh yeah, I'm gonna go have a conversation with Yelp, and then I'm gonna go over there and have a conversation with, you know, Twitter, or whatever.
Um.
Are we about to see conversational interfaces take over? I know we've debated it a little bit, about just how dominant that paradigm is gonna become for interacting with AI. I'm curious if you kind of buy that as a vision for where the web is going.
I am gonna also quote something from what they wrote, which is: "every instance is also an MCP server, allowing websites to make their content discoverable and accessible to agents and other participants in the MCP ecosystem". And that was the thing that I caught onto, not the "you're talking to your website". It's: your website is now something that is going to be discoverable by agents and agentic flows that are gonna have one particular protocol to get stuff and information from your website. And that is the data. Hello once again, data, and a standard in how to handshake. That is where I sort of zoned in, and, um, I believe it was, uh, sort of intended more in that direction.
So, like, is it fun to have that kind of a website? Yeah. But I think the deeper thing is not that we wanna have a conversational interface with everything. It's that we want this to be a way that the applications you build on top, again, to search and to scrape and to take actions and whatever, have a unified protocol. That was my perspective on this.
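The "query the site, get structured data back" pattern being described can be sketched as a toy. To be clear, this is not NLWeb's or MCP's actual API; the page contents and the keyword scorer are invented here (a real deployment would put a model behind the endpoint), but it shows the shape of a site answering a natural-language question with machine-consumable output.

```python
# Hypothetical site content, keyed by path.
SITE_PAGES = {
    "/menu": "Our menu features wood-fired pizza, fresh pasta, and tiramisu.",
    "/hours": "We are open Tuesday through Sunday, 11am to 10pm.",
    "/contact": "Reach us at 555-0123 or visit 42 Main Street.",
}

def ask_site(question: str) -> dict:
    """Answer a natural-language question with the most relevant page.

    A toy word-overlap scorer stands in for the LLM a real system would use.
    """
    q_words = set(question.lower().split())

    def score(text: str) -> int:
        return len(q_words & set(text.lower().split()))

    best_path = max(SITE_PAGES, key=lambda p: score(SITE_PAGES[p]))
    # Shaped like a JSON response an agent or chat widget could consume.
    return {"question": question,
            "source": best_path,
            "answer": SITE_PAGES[best_path]}

print(ask_site("what time are you open"))
```

The point is the contract, not the scoring: the caller sends natural language and gets back structured data with a source path, which is equally usable by a chat window and by another agent.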
That's right. And I think that's the direction we could take it. I mean, MCP, right, could be the new thing that just becomes standard, built into every resource on the web. Um, you know, I guess I'm curious how you size that up: are we maybe headed in a direction where everybody eventually wants to make themselves very indexable, if you will, to agents?
Um, so, uh, full transparency: I may not be the best person to ask the question, but just kind of after a few conversations about MCP with some people here in research, um, you know, there's a couple different protocols out there. If this is an agent play, I mean, I think there's room for, you know, agent-to-agent or IBM's agent protocol to be able to step in there and, you know, play a factor. Um, I might have to take a step back here and just better understand, you know, from Gabe or one of the team members: do you foresee this as being actually the de facto protocol for, uh, agents, or for models being used across websites?
Yeah, and I think, to maybe do a twist on that for Gabe: we were talking a little bit about the economics of where attention flows and all that. It kind of feels like, if people don't get that right, I dunno, I might just say, I'm not making my site easily interactable with agents. Like, I don't want them touching my stuff, you know?
Yeah. Like, I don't want them getting my data, you know?
That's a great point, yeah. I think you might see one of two things, like, you know, uh, the MCP equivalent of robots.txt: "don't use this for agents." Um, but I think that's exactly right. I think this is an attempt to put a stake in the ground about standardizing the transition to an AI-first web.
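The hypothetical "robots.txt for agents" mentioned above could look something like this sketch. No such standard exists; the agents.txt file name and its agent:/disallow: directives are invented here, loosely mirroring robots.txt syntax, just to make the opt-out idea concrete.

```python
# Made-up policy file a site might serve at /agents.txt.
AGENTS_TXT = """\
agent: *
disallow: /checkout
disallow: /account

agent: recipe-bot
disallow: /
"""

def parse_policy(text):
    """Parse into {agent_name: [disallowed path prefixes]}."""
    policy, current = {}, None
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "agent":
            current = value
            policy.setdefault(current, [])
        elif key == "disallow" and current is not None:
            policy[current].append(value)
    return policy

def allowed(policy, agent, path):
    """An agent may fetch a path unless its rules (or the '*' rules) match."""
    rules = policy.get(agent, []) + policy.get("*", [])
    return not any(path.startswith(prefix) for prefix in rules)

policy = parse_policy(AGENTS_TXT)
print(allowed(policy, "search-bot", "/menu"))      # True
print(allowed(policy, "search-bot", "/checkout"))  # False
print(allowed(policy, "recipe-bot", "/menu"))      # False
```

Like robots.txt, this would be purely advisory: it only works if agent frameworks agree to check it, which is exactly the adoption question discussed below.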
Right.
I think, um, saying, hey, this is a better user experience, you get a little chat window and now you don't have to go search around through navigations, like, that's genuinely useful. Also, at the same time, you're exposing the same indexability to agents, um, and it's basically back to the same problem we talked about with Stack Overflow: it makes these things eminently consumable by AI. The question is, is there any way to give back, or to sort of make it a two-directional street?
I think, um, you know, and Abe, to your point about the multiple different protocols out there, I actually see this usage as exactly the right usage for MCP, relative to the other protocols that are out there. Um, because this is unidirectional; it's like an HTTP server, right? You know, you're gonna see mcp:// instead of https://, right? This is going to be like, "go grab, uh, my information from my site," for the sake of an agent rather than for the sake of a web browser. And I think that's actually, to me, a very sensible technology direction for exposing content to something, whether that something is a user interface or an agent.
But I do think the notion of turning a site into an AI app is more nuanced and interesting to me, because that speaks to not just being a content provider, but also being an interactor. Uh, and when you start having that site then also want to go take action against, say, other sites, now you start getting into agent-to-agent, or an agent communication protocol, or something that actually has to allow the interaction: "that came to me; now I'm gonna go choose to take some other action." It's that, um, having agency, shall we say, that will make the site in fact actually take action, and not just sort of respond to queries.
I think that's the delta in my head between the different protocol groupings, and MCP makes a lot of sense as sort of a first step to turn a website from a pile of HTML into something that can be directly exposed to a consumer that has AI, a model, on the other end.
Yeah, definitely. I mean, it presages, like, a really interesting world where there's the human web that we all interact with, and then there'll be this agent web that you kind of maybe never really see, that's going on below the surface. And you actually may have sites that are not even human readable, right? They're just, like, agent protocol.
Yep, absolutely. Or the other way around, right? And to your question at the top of this, your twist on the top of this, I think it's gonna be an economics question, right? Like, at some point, um, we will either see that there is a reasonable business model where folks creating that content can get some return on their investment, um, or we won't, at which point, you know, either people will explicitly not expose MCP servers, or they'll add obfuscation to the source code of their website so that it is terrible for crawling, right? Like, a whole bunch of gorpy extra tags in there that now the models have to learn to ignore.
Yeah. I mean, the rise of, like, anti-agent technology is also something we're about to see, right? Which is like, just get your agent lost in this maze that it never gets out of.
Exactly. Exactly. Go send them down some malformed XML, and good luck.
Right.
So it'll be really interesting to see. And, you know, as a collaborator by nature, I'd love to see something where there actually is a good incentive for, um, folks to make content available to AI and get some return on that.
And I think that would make for a great user experience downstream.
But as we know, you know, this is gonna be the wild west of the internet.
Like there will be combativeness and you know, people looking
for the best advantage.
So, uh, it'll be really interesting to see where that swings.
Yeah, for sure.
So, Marina, maybe a last thought from you. Um, you know, I always think a little bit about how HTTP, and open protocols in general, are this kind of weird miracle that everybody just agrees to, to kind of be interoperable with one another. Um, are you bullish on MCP becoming a uniform standard across the board? Because, I mean, with what Gabe was saying, you could imagine a world where you say, look, it's gonna be the Tim Context Protocol, and actually only sites that correspond to the Tim Context Protocol will have agents that'll talk to one another. Um, and so it feels like there's a lot of incentives to kind of break away and create these more walled-garden-type experiences. Um, I don't know. Do you think open wins here, or is it anyone's guess?
I mean, to some extent it's hard to predict the future, and I agree with how Gabe described MCP as a protocol. It's not the only protocol out there, right? We've got FTP, we've got all these protocols for sharing different information in different ways, and very often it really does start because a small enough group of people that are really deeply in the middle of it get real annoyed and say, "guys, we're just gonna agree on something." And because of that, they can actually get somewhere. Everybody else says, yeah, okay, great, we're gonna adopt this protocol as well. Um, we need these protocols, because otherwise you cannot continue to grow. And especially in the world of AI and agents, scale is more important than ever.
HTTP, HTTPS, all of that right now is just, like, serving a website. MCP, I don't know that it's gonna be all that useful unless you have enough people adopting it. So you really are gonna have to have, uh, ways to drive adoption. I agree with Gabe that everything's an economics question at the end of the day: are you getting value from it, and are you gonna contribute or not?
There's a lot of security questions here, again. If you're gonna go and talk to a different website, are you gonna come back with, like, poison-pill actions to take on your own website? This all gets a little terrifying pretty quickly. Um, but I think that, yes, either this or a version of this is gonna have to be adopted, because you can't have proper scaling in the world without the infrastructure, the tubes, underneath it. So yeah, we're gonna have some version of this.
Absolutely.
Well, and I think the flip side of that, too, is if you look at browser technology: browsers are really good at handling really bad output, right?
Yeah. They have to be.
Um, we're finally at a space where there probably is some, you know, coalescence around the actual source code of webpages that gets served up to the browser so it can render them, but, like, man, handling malformed XML, you know?
Yeah. It took some time.
PHP, JavaScript, every other piece of web technology that's ever been thought of as a way to revolutionize, you know, what a server is hosting that gets eventually rendered in your browser.
So it's a two-sided coin: one, there's some coalescence on the creator side, and then two, there's gotta be really good technology that is very good at handling edge cases and failure cases on the consumer side. So, you know, whether we see MCP as the one and only protocol to win them all, or whether we see a handful of them that persist and eventually shake out, or whether we just see a bunch of, like, mediocre implementations of different protocols that all, you know, drop the trailing curly brace on their JSON just for fun...
Maybe the most likely scenario.
You know, uh, well, if the token doesn't get generated for the last curly brace, so be it, right? So, um, I think it's gonna be a push and pull, and ultimately it all comes down to people, or, you know, people in conjunction with AI, writing software that can make this stuff usable.
Right. And so I think we'll see some emergence of coalescence on the consumption side as well. Probably. Um, so it'll all be driven by, uh, people tolerating, you know, gorpy experiences for only so long, and eventually they'll get annoyed and do something about it.
Yeah. Everything is driven by engineers getting real annoyed and going and writing some software.
Yeah.
Well, that's all the time we have for today. Uh, Gabe, Abraham, Marina, thanks for coming on the show. Always great to have you. Uh, and, uh, Abraham, Gabe, you should come by more often. It's good to see you. Um, and finally, uh, thanks for joining us, listeners. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere, and we will see you all next week on Mixture of Experts.