LlamaCon Unveils Developer‑Friendly Llama API
Key Points
- The panel reflects on AI hype that didn’t pan out, noting that technologies like Kolmogorov‑Arnold Networks and certain “pin” innovations have proven less impactful than expected.
- Experts highlight a sharp decline in “intelligence per dollar,” indicating that the cost efficiency of AI has worsened despite broader hype.
- A call‑to‑action from J.P. Morgan and a surge of activity in China’s AI market signal new strategic pressures and opportunities for AI governance and investment.
- Meta’s first LlamaCon introduced the Llama API, a unified developer platform that combines closed‑source power with open‑source flexibility, providing centralized fine‑tuning, evaluation, and hosting to simplify enterprise use of Llama models.
Sections
- MoE One-Year Anniversary Recap - The podcast celebrates its first year by reuniting original guests to dissect overhyped AI trends, discuss new developments such as LlamaCon and Chinese AI market activity, and reflect on the evolving AI landscape.
- Meta's Open-Source Llama Hub Debate - The speakers discuss whether Meta's release of open‑source Llama models and a centralized hub indicates strategic strength by cultivating an ecosystem or a defensive maneuver, highlighting the value of a standardized stack and the growing importance of fine‑tuning personalized models.
- Tiny Prompt Card Advances Guardrails - Speaker notes a new 22‑million‑parameter prompt‑card model, references the recent GuardBench benchmark where Granite Guardian tops the leaderboard, and emphasizes how such lightweight models could boost AI safety through layered guardrails.
- Upcoming Llama Model Landscape - The speaker outlines two forthcoming Llama models—a compact 8‑billion‑parameter version and a massive, yet‑to‑be‑practical behemoth—while noting challenges in model distillation, multi‑agent orchestration, and expressing optimism about open‑source collaborations with partners like IBM and Box.
- Deliberate vs Instant Answer Mode - The speaker argues that AI should toggle a “thinking” phase—suppressing internal deliberation for simple factual look‑ups while activating it for logical or math problems—to provide appropriate, timely responses.
- Selective Brain Activation, AI Specialization - The speaker likens the myth of using only part of the brain to AI systems that activate only relevant components, arguing that modular expert mixtures can reduce computational load and outperform monolithic models, thereby reshaping common assumptions about AI competition.
- Sovereign AI and Edge Deployment - The speaker highlights the drive for nations to build independent AI supply chains, the rise of efficient open‑source mixture‑of‑experts models like Qwen‑3 that rival major providers, and the push to run these models at the edge on IBM Z hardware under Apache 2.0 licenses.
- Future GPU Surplus & Global AI - The speaker worries about excess GPUs from waning demand, proposes repurposing them, and highlights worldwide AI breakthroughs such as Korea’s open‑source speech‑to‑text model, emphasizing a shift beyond US‑China competition.
- Governance and Security in SaaS Deployments - The speaker stresses the importance of robust security governance when scaling SaaS solutions, especially for regulated industries, noting the rush to embed agents, evolving standards, and the shift from experimental to production environments.
- Governance vs Agent Proliferation - The speaker contends that scaling AI—particularly auditable, smaller speech models—requires rigorous governance in 2025, warning that relying solely on creating more agents to counteract risks is an insufficient solution.
- Anniversary Episode: Reflections and Shout‑outs - The hosts celebrate MoE’s first‑year milestone, recap the debut show, commend the Robust Intelligence team’s Cisco model, resolve a debate, and preview nostalgic clips from the past season.
- Futuristic Wearable Market Speculation - The hosts debate the untapped value of an R1 device tucked away in a garage, compare it to Ray‑Ban glasses and smartwatches as phone‑augmenting accessories, argue the market isn’t ready for true wearables, and briefly celebrate a point about a pager before noting a mysterious chatbot.
- First “Agents” Mention Competition - Participants joke about a leaderboard to determine who first used the term “agents” on MoE, linking the word’s debut to early discussions of agentic flows and tool‑augmented GPT models.
Full Transcript
# LlamaCon Unveils Developer‑Friendly Llama API **Source:** [https://www.youtube.com/watch?v=yeutlpU13YM](https://www.youtube.com/watch?v=yeutlpU13YM) **Duration:** 00:39:38 ## Summary - The panel reflects on AI hype that didn’t pan out, noting that technologies like Kolmogorov‑Arnold Networks and certain “pin” innovations have proven less impactful than expected. - Experts highlight a sharp decline in “intelligence per dollar,” indicating that the cost efficiency of AI has worsened despite broader hype. - A call‑to‑action from J.P. Morgan and a surge of activity in China’s AI market signal new strategic pressures and opportunities for AI governance and investment. - Meta’s first LlamaCon introduced the Llama API, a unified developer platform that combines closed‑source power with open‑source flexibility, providing centralized fine‑tuning, evaluation, and hosting to simplify enterprise use of Llama models. ## Sections - [00:00:00](https://www.youtube.com/watch?v=yeutlpU13YM&t=0s) **MoE One-Year Anniversary Recap** - The podcast celebrates its first year by reuniting original guests to dissect overhyped AI trends, discuss new developments such as LlamaCon and Chinese AI market activity, and reflect on the evolving AI landscape. - [00:03:03](https://www.youtube.com/watch?v=yeutlpU13YM&t=183s) **Meta's Open-Source Llama Hub Debate** - The speakers discuss whether Meta's release of open‑source Llama models and a centralized hub indicates strategic strength by cultivating an ecosystem or a defensive maneuver, highlighting the value of a standardized stack and the growing importance of fine‑tuning personalized models. - [00:06:06](https://www.youtube.com/watch?v=yeutlpU13YM&t=366s) **Tiny Prompt Card Advances Guardrails** - Speaker notes a new 22‑million‑parameter prompt‑card model, references the recent GuardBench benchmark where Granite Guardian tops the leaderboard, and emphasizes how such lightweight models could boost AI safety through layered guardrails. - [00:09:12](https://www.youtube.com/watch?v=yeutlpU13YM&t=552s) **Upcoming Llama Model Landscape** - The speaker outlines two forthcoming Llama models—a compact 8‑billion‑parameter version and a massive, yet‑to‑be‑practical behemoth—while noting challenges in model distillation, multi‑agent orchestration, and expressing optimism about open‑source collaborations with partners like IBM and Box. - [00:12:16](https://www.youtube.com/watch?v=yeutlpU13YM&t=736s) **Deliberate vs Instant Answer Mode** - The speaker argues that AI should toggle a “thinking” phase—suppressing internal deliberation for simple factual look‑ups while activating it for logical or math problems—to provide appropriate, timely responses. - [00:15:24](https://www.youtube.com/watch?v=yeutlpU13YM&t=924s) **Selective Brain Activation, AI Specialization** - The speaker likens the myth of using only part of the brain to AI systems that activate only relevant components, arguing that modular expert mixtures can reduce computational load and outperform monolithic models, thereby reshaping common assumptions about AI competition. - [00:18:40](https://www.youtube.com/watch?v=yeutlpU13YM&t=1120s) **Sovereign AI and Edge Deployment** - The speaker highlights the drive for nations to build independent AI supply chains, the rise of efficient open‑source mixture‑of‑experts models like Qwen‑3 that rival major providers, and the push to run these models at the edge on IBM Z hardware under Apache 2.0 licenses. - [00:21:42](https://www.youtube.com/watch?v=yeutlpU13YM&t=1302s) **Future GPU Surplus & Global AI** - The speaker worries about excess GPUs from waning demand, proposes repurposing them, and highlights worldwide AI breakthroughs such as Korea’s open‑source speech‑to‑text model, emphasizing a shift beyond US‑China competition. - [00:24:48](https://www.youtube.com/watch?v=yeutlpU13YM&t=1488s) **Governance and Security in SaaS Deployments** - The speaker stresses the importance of robust security governance when scaling SaaS solutions, especially for regulated industries, noting the rush to embed agents, evolving standards, and the shift from experimental to production environments. - [00:27:56](https://www.youtube.com/watch?v=yeutlpU13YM&t=1676s) **Governance vs Agent Proliferation** - The speaker contends that scaling AI—particularly auditable, smaller speech models—requires rigorous governance in 2025, warning that relying solely on creating more agents to counteract risks is an insufficient solution. - [00:31:01](https://www.youtube.com/watch?v=yeutlpU13YM&t=1861s) **Anniversary Episode: Reflections and Shout‑outs** - The hosts celebrate MoE’s first‑year milestone, recap the debut show, commend the Robust Intelligence team’s Cisco model, resolve a debate, and preview nostalgic clips from the past season. - [00:34:04](https://www.youtube.com/watch?v=yeutlpU13YM&t=2044s) **Futuristic Wearable Market Speculation** - The hosts debate the untapped value of an R1 device tucked away in a garage, compare it to Ray‑Ban glasses and smartwatches as phone‑augmenting accessories, argue the market isn’t ready for true wearables, and briefly celebrate a point about a pager before noting a mysterious chatbot. - [00:37:06](https://www.youtube.com/watch?v=yeutlpU13YM&t=2226s) **First “Agents” Mention Competition** - Participants joke about a leaderboard to determine who first used the term “agents” on MoE, linking the word’s debut to early discussions of agentic flows and tool‑augmented GPT models. ## Full Transcript
I wanna go back one year, it's May, 2024 again.
What's the biggest thing in AI that turns out to be not that big of a deal?
Kush Varshney is an IBM fellow, uh, on AI governance.
Kush, welcome back to the show.
Uh, what do you think?
Uh, Kolmogorov-Arnold Networks.
Got it.
That's a good one.
Shobhit Varshney, Head of Data and AI for the Americas.
Shobhit.
The cost of AI, I think the intelligence per dollar has plummeted significantly.
Absolutely.
And last but not least is Chris Hay, Distinguished Engineer and
CTO of Customer Transformation chris, what do you think those
stupid pin things we got all excited about last year.
All that and more on today's Mixture of Experts.
I'm Tim Hwang and welcome to Mixture of Experts.
Each week, MoE brings together the smartest and I think the most good
looking crew in all of podcasting to discuss and debate the biggest
news in artificial intelligence.
And this is a big episode.
Today we're officially celebrating our one year anniversary of MoE.
We brought together the original crew from MoE episode one to join us. All-star Crew.
We're gonna do a look back, uh, cover a call to action from J.P. Morgan, a new
wave of action in the Chinese AI market.
But first, I really wanted to cover all the latest from LlamaCon.
So I believe this was the first event, uh, officially the first LlamaCon
that Meta has run, focusing on its work in the open source space and
around the Llama class of models.
Um, I think a lot of announcements to cover here, but I think Shobhit the
first one that I was really intrigued to get your take on was they announced
this thing called Llama API, and it's a developer platform that quote will
bring together the best of closed source with open source flexibility.
Um, and so for our listeners who might be less familiar with
this, like what have they done and why is it kind of a big deal?
I always, in my opinion, I think it's kind of a big deal.
Yeah.
So today, uh, in the current state, if uh, an enterprise needs to go play around
with Llama models, you go to one of your hyperscaler partners and say you're
gonna use their version of the studio.
Their way of fine tuning it and whatever the hyperscalers are, are producing.
And then once you're done with that model, it's difficult to
move it around and whatnot.
Right?
So in this particular case, Meta is coming out and saying we want to be
as developer friendly as possible.
We'll give you a central place with all the playgrounds, the
fine tuning capabilities, also evaluations, and so on, so forth.
So as you're fine tuning the model, you can test it out.
All of that will be done centrally.
They will host the API for Llama as well.
You can obviously still get it everywhere else that you get your, uh, your LLMS
from, but now they're developing a whole set of stack, so they're moving
beyond just providing the model to be providing the whole ecosystem.
They have done enough work in the space with Llama Stack and a few other things
in the past, but this was their coming out party saying that we are gonna
be as developer friendly as possible.
Come work with us, we'll help you fine tune it.
Once you're done with that model, you can take it anywhere.
Obviously there are a lot of things around privacy where they will not train
the model, uh, on the data that you're providing them and so on and so forth.
But the inference speed is, is amazing with their partnerships
with service and GR and others.
So overall, they wanna be the hub where people come and experiment
with Llama models versus Llama models being one of the 200 models available
on Microsoft or AWS or Google.
For sure.
And Chris, maybe to bring you into this conversation, I was having a debate
with a friend about this announcement and we're kind of talking about like
whether or not this is almost like a position of strength for Meta or almost
like a position of weakness for Meta.
There's one point of view which is, hey, we release these open
source models and everybody will build all the tooling around it.
Essentially that's like kind of what we do is we do the model and then like
everybody else builds the ecosystem.
So that's kind of like the, the, the bear case.
And uh, my friend was like, well the bull case is actually that.
Like they recognize that like they're actually investing more in this space now.
And it's like really they recognize that there's such a big opportunity that
they have to actively build this stack.
I'm curious if you kind of have any feelings about that or how
you kind of size up these moves.
I think it's a really interesting move.
I mean, as you kind of say, I think it's a great move.
I think having a sort of standardized stack where you can bring your
models, you can fine tune them, and, and I think fine tuning is gonna
become a bigger thing in the future.
So, you know, because you're gonna want your own personalized model, you're gonna
want something with domain knowledge and therefore bringing that into a
consistent place, I think is a good thing.
And then if you think about where Meta wants to go in the future, they
want AI to power all your avatars assistance, et cetera, uh, on their
platforms and, and have agents on there.
Um.
Then I think making it easier to have a playground for developers and individuals
to, to tune models based on Llamas stack, I think is a sensible thing.
I, I do think though, that when I really look at this though, um, all the APIs
are OpenAI, compatible APIs, and nearly every single service provider is moving
towards OpenAI compatible APIs anyway, so I, there is still a part of me that
goes, well, can I do that somewhere else?
And, and, and sure with the fine tuning part specifically, that is hard, right?
Because, you know, getting your models out of some of those
existing stacks and taking 'em elsewhere is a more difficult thing.
So I think, I think that is a differential play
in my mind.
Totally.
Yeah.
It's getting more complicated, kind of seeing them navigate this.
Christian, another part of the announcement that I wanted you
to comment on was that they also announced all of these kind of
security and protection models.
Um, so Llama Guard, four Llama firewall, Llama, Prompt Guard 2.
It kind of feels like a little bit of like, almost like the protection space
around AI starting to get like a lot more complicated than it used to be.
Where the old thing was like, oh, well we just have a model that tells you
if like the outputs are toxic now.
It feels like they've got like at every layer of the stack, there's like a model
you can use for security and safety.
Um, curious about how you kind of like read these trends.
Like where is this going?
Is it just gonna become like a more and more complicated, you
know, ecosystem of safety models?
Um, yeah, just curious about your hot take on that.
Yeah.
Um, so yeah, as you said, I mean, they, uh, have this new, uh, Llama Guard 4,
uh, it's uh, 12 billion parameter model.
Um, it's multimodal, so it has the vision and the text in there.
Um.
Uh, the, uh, the prompt card they made really tiny, I think, uh, 22,
uh, million parameters and stuff.
So, yeah, I mean, they're making progress.
Certainly.
Um, uh, the headlines are good.
Um, uh, uh, we haven't had a chance to, to evaluate and see,
uh, what the performance is yet.
And, um, yeah, actually just, uh, a week and a half ago, um, uh, maybe
two weeks ago, there was a new, uh.
A benchmark that came out, uh, called Guard Bench.
Uh, so this Guard Bench, um, actually goes and tests, um, a lot of, uh, of
different Guardrail models and and stuff.
Um, uh, just a a side note, uh, the Granite Guardian model that I've
talked about in the past is, uh, at the top of that leaderboard,
but, um, uh, we should see, I mean, how is, uh, how is the big deal?
Yeah, exactly.
The how's Llama Guard for, uh, uh, doing there because, uh, if they've really made
good progress, that's, uh, that's awesome.
Um.
And, uh, I mean, the fact that the, uh, the prompt card is so tiny, I
think, uh, that, because that's gonna make a, make a huge difference because
it's like 22 million parameters.
It's like a blink of the eye.
I mean, it's like, uh, uh, you can do it so fast.
So, I mean, I think the overall space is, uh, just.
Becoming where people are realizing the seriousness of safety and security.
So, uh, just having everything there.
I mean, multiple layers of security.
I mean, that's, uh, just good practice.
Uh, so having it, uh, on the inputs, on the outputs, um, the overall firewall.
I mean, all of that is, uh, is good stuff.
And then we'll see, uh, I mean, how it goes, uh, how it progresses and, um, uh.
I mean, no, uh, no concern for me.
I think this is where, where the field needs to go.
Um, Shobhit, before we move on to our next topic, any other, uh, kind of
announcements that you'd highlight?
I know there's a bunch announced.
Those are the kind of two that stood out to me, but I know there's,
like, I saw the whole blog post.
There's like a lot going on.
Yeah. Yeah.
Um, the other couple things were one was around their Meta AI app.
They have consolidated all of their intelligence into one app, and that
could be a ChatGPT competitor or Gemini and so on and forth, but they want one
app that people can go and do, do some cool things and, and, and talk to it.
And they have the potential to make this super hyper-personalized because they
have billions of, of, uh, interactions happening across all of their.
WhatsApp and Instagram and um, and Facebook.
You could potentially have a avatar that is really personalized to
your particular needs and wants and things that you care about, right?
It is a delicate balance between privacy and hyper-personalization.
They'll have to do that balance re uh, delicately.
But they, they have a huge bet on creating the one app where you go to
for all of your, uh, all of your AI.
There are a few other things that may have been, uh, like brushed off in
the, in the details, but they have done, they've had about 1.2 billion
downloads of Llama models and most, and a lot of those, like majority
of those are derivatives of Llama.
On Hugging Face and other places, right?
So clearly the momentum around open source with the developer community
is amazing and Llama has had a huge impact on where we are today
with open models versus others.
But there were a few things where I was, was still on my wishlist that, that we
couldn't quite, that they didn't get to.
Uh, there are two other models that they had announced.
They're not coming quite yet.
One is their small little Llama model.
That'll be about an 8 billion parameter model.
8 billion was the, was the most popular size of the Llama model from
the last, uh, previous generation.
We have not announced, seen that yet, but that would be a game changer for
our enterprises, especially if you have good methods of distilling it down.
And then on the other end of the spectrum is the behemoth model.
They still need to figure out what they do with it.
It's not something that's practical at, at this size to be run by enterprises, but
we need to figure out what's the right way of displaying it down or can I use that to
train or other models and so on and forth.
There are other things around, uh, multi-agent orchestration that I was
expecting Llama to, to release as well.
Uh, like things like MCP support and agent to agent, uh, protocols
or anything around agent ops as part of the whole Llama stack.
I'm waiting for them to announce more things in that space as well.
But overall, really positive.
Uh, it's good to see that we are celebrating open source, getting
closer and closer to, uh, to the frontier models as well.
So great LlamaCon for all of us.
We had a good partnership with them.
IBM and Box have done some amazing work with Llama was
was announced on stage as well.
So overall, very positive for all
of us. The community has enjoyed it.
That's great.
Yeah. And a lot more to come.
I'm sure what you talked about is like gonna be coming out like in the
next, very soon I think probably.
This is great.
And I, speaking of open, I think I'll move us onto to our next topic.
Uh, we wanted to do a kind of short segment because there's been a
lot of kind of interesting things bubbling up, particularly in open
source, um, in the Chinese market.
And I did want to spend a little bit of time, uh, talking about that.
Um, one that we have that has actually come out is that Alibaba has launched
Qwen3 um, which is, uh, a kind of whole class of models that they've kind of
put out, which is the latest generation of their kind of Qwen3 generation.
Of models.
Um, and I guess Chris, I think I wanted to kind of start with a little bit of
like a kind of like technical explainer.
So in the blog post they talk a little bit about how these models are, what
they call hybrid models, which combine quote thinking and non-thinking modes.
Um, and I think again, in true, uh, AI form, we picked all sorts of
terminology that's like very confusing.
It's like, what is a hallucination?
Anyways.
Um, and so I guess I wanted to kind of just initially start with like
what is a thinking and non-thinking mode when it comes to AI and
like why is it kind of important for what they're doing here?
Yeah, so when we hear thinking, I will, I think of it as the kind of reasoning
models, like the o1's, o3's, o4's, right?
So, uh, in those particular cases, if you think of what a model is, it is a
kind of next token prediction model.
So, you know, it is gonna be token, token, token, token.
So whenever it's answering a question and that, and that works great.
Um, but you can imagine some of these.
Problems are a lot harder to solve.
And therefore, if you equate the thinking time to the number of tokens that you
generate, then the more tokens that you generate, the more likely you're
gonna get some sort of good answer.
Right.
And, and so when you were saying thinking mode, in that sense, it's like, like a
human being rather than blurting out the first thing that comes into your mind.
Spend a little bit of time deliberating the, you know, whatever the answer
is gonna be before you open your mouth and, and, uh, announce your
feelings to the world, right?
Um, and keep those thoughts inside.
Keep them inside.
So, so regular human beings don't know about it.
So that, that is kind of what the idea of thinking is there.
Now there is some class of questions where.
No matter how long you think about it, thinking is not gonna help.
Right?
So things like, you know, what is the capital of England, right?
So if you don't know the answer, sitting and thinking about it
really isn't gonna help you, right?
So, but doing something like a math problem or a logical or reasoning problem,
if there are six cats and one falls out the window, how many cats do you have
left and how many lives is it's got?
Then it needs to think about that a little bit and then.
You know, it'll come to the answer and therefore you'll generate those tokens.
So the idea of being able to sort of have this hybrid mode.
In reality, you, for some cases, you want thinking switched off right
quick questions, you know, general q and a type knowledge answers.
But if you're doing logic and reasoning, you want the ability to
switch that on and have the model take a little bit of time to think about
that and come back with the answer.
So, um, I still think this is a, this is a problem today that is
gonna go away in the future, right?
Just like human beings, you know, we have learned when to blurt out an answer
and when not to blurt out an answer.
You don't say to human being, I mean,
speak for yourself, Chris.
I,
well, actually, maybe not right, but maybe I haven't learned, but.
I, I think, I think in, in, in time.
Then I think that's gonna relax there.
And we're not gonna have to switch that on or off, but I, I do like this idea
of the future of like a thinking budget.
You know, you've got five minutes to think about it, three minutes to think about it.
So I think this practice is gonna evolve, but I think is,
is very much a positive of
the, uh, the hybrid mode.
And Chris, I think one thing that's been raised before, but it might be
kind of fun to kind of tackle it more directly with kind of this segment and
these releases that are coming out.
Um, you know, some people have commented like, I think, um, uh, Kate might have
mentioned it on a previous episode, but
we're really kind of, sort of starting to see like the, the,
the return of mixture of experts.
Like it feels like that is like now very much back on the table.
It's like what everybody's doing.
So like what was kind of uncool again is like really back and forth.
Um, and so wanna talk I guess a little bit about like why that's the case now
that we're kind of seeing it in Qwen3 and is rumored for the DeepSeek-R2
launch, which is kind of gonna also potentially be coming out maybe even by
the time this episode releases is rumors that happened potentially this week?
Yeah, I mean, uh.
Whoever came up with the name of this podcast, uh, was, uh, quite prescient.
I mean, uh, "Mixture of Experts." um, uh, the term has been
around for, for a long time.
It meant something different, uh, when I was in grad school, um, so with
these gating mechanisms and stuff.
But, um, uh, I mean, the point of it is, uh, really, uh...
I mean, just like Chris was saying, like humans, uh, don't blurt stuff out.
Humans also don't use their entire brain when they're thinking, right?
I mean, the, I don't know what the stat is.
Like we only use 10% of our brain at, at a time, right?
Um, so, uh, same idea.
I mean, you don't need to, you use everything.
You don't need to activate everything.
Um, uh, because, uh, really there's, uh, a portion that's, um,
really the, the important part when you're, uh, thinking about something,
computing something, what, whatever have you in inferring something.
And so, uh, I think it's just taking advantage of that.
Uh, I mean, you can use less power, less computation, less, uh, of everything.
Um, if you're only activating the, the relevant parts.
And then if you can know, um, uh, which parts to, to activate, uh,
then, uh, then, then that's gonna end up being a, a good thing.
And then you can have, uh, kind of, uh, different, uh, sort of
specializations, different sort of, uh,
things that are, are, are better at, uh, at particular, uh, aspects.
So, uh, about of cats falling outta trees, uh, mixture of expert or an expert
and then, uh, uh, I mean what, whatever, I mean, all sorts
of different experts on there.
So I think, uh, that's where, where things are headed.
Sure.
But I think the final bit that I was hoping to get your take on is,
you know what I love about the sort of DeepSeek story is how much it
is kind of like messing with all of our intuitions about how competition
and AI is like supposed to go down.
So the first one of course was like, oh, okay, in the US it was Meta doing open
source versus these closed source guys.
And so the introduction, introduction of DeepSeek is like, oh, well
now even Meta has competition.
Yeah.
Um, and I think the other really interesting element
that's been rumored around R2.
Is that they are doing the training not on the video, which I think is
really intriguing and also kind of completely scrambles the idea that
like, oh, everybody's gonna just build on, you know, Jensen's chips and that's
gonna just be the way the AI works.
Yeah.
I think what I read in the blog post was that the rumors that R2 was trained on
a server cluster of Huawei's Ascends 910B
yes, chip, which would mark a really big transition in how some of this hap happens
kind of at the, at the cutting edge.
Do you wanna talk a little bit about that?
I thought it was very interesting.
Yeah, so I think over time China is brilliant.
Uh, like the people, people are just absolutely stunning
in these research labs, right?
So they're very high concentration of talent.
So they're trying to figure out ways around, uh, supply chain, like
their dependence on the US supply chain for intelligence, right?
So their own.
They have a lot of intelligence to go build their own chips.
There's a competition around not just the chip, but the whole, uh, the whole
set of things that go before and after the ecosystem around the chips as well.
And China, I think has a good shot at this.
They should be able to go and look at the Huawei, uh, a series of chips.
And this is a really good use case on printing them.
If you look at what Google did with their tensor processing units, the tpu right.
They have a unfair advantage that they have both all the
AI and the chip manufacturing.
So when they release their models like Gemini, which are running at
like crazy, uh, um, volume, uh, every day, they need to make sure
that they're super hyper optimized.
Like the Gemini Flash model, for example, running on TPUs.
Is a really good price point at scale with billions of these every day, right?
So you'll start to see a lot of these, uh, companies start to leverage the
architecture underlying optimize that, and China understands that they will not
always be able to get access to the, the, the technology from the rest of the world.
So they will start to create their own supply chain top to bottom.
There's a lot of investment coming from each of the countries
in their own sovereign AI and trying to make sure that they can.
Uh, they can be, uh, masters of their own destiny in the, in the AI space, right?
I'm very excited about this whole David versus Goliath kind of a war, right?
If you're looking at the size of the model that Qwen3 came out with, you
have, um, a good mixture of experts model where the number of active parameters are
very, very low and they're out competing some of the best in class models from
open AI and Google and Llamas, right?
So you have this, this, this crazy
compute, um, intelligence per dollar, that's just completely plummeting and that
unlocks an insane amount of, uh, other use cases that we would deploy this at, right?
If you look at the way we, within IBM are looking at, say, our Z series and stuff
too, we want to bring AI closer and closer to where the transactions are happening.
Billions of these in with almost negligible latency.
So I think the whole size of the model being smaller and
outcompeting is a great thing.
Taking this open source and these are Apache 2.0 licenses.
Creating derivatives out of it and owning that intellectual property,
carrying it with you, deploying it where you need to be at the edge
on the servers, on the clouds.
I think that's the future direction that they were taking.
We should be very, very, uh, proud of how far the AI community has come,
on the open source models and the progress that we are making this space.
But I think going back to what, what Kush was mentioning around all the
Guardrails that are needed, when we see models, uh, like Qwen3 come out, we do
not see a lot more transparency on the data that went in or any Guardrails and
stuff that have put, or any, like the equivalent of the Llama Guards or the
granite Guardians being released from the Chinese labs at, at this point quite yet.
Right.
Qwen3 models.
They are text only at this point.
They're not quite multimodals.
That kind of reduces the space of what use cases we can deploy them at.
So a few, few things that they, they do need to catch upon, but there's
just so much happening in this space.
This competition is really, really positive for all of us.
I think there's a flip to that as well, which is I, I think.
Although they're not being particularly open on data, I think labs like
DeepSeek, et cetera, maybe not so much kind of Qwen there, but, uh, Alibaba.
But they're being very open with their code bases, right?
So they open source, their distributed file system, et cetera.
So I think one of the things that I really appreciate in this space with
the competition is that the innovation is moving out into the open source
community, and I think, because these labs are being constrained, it is
forcing them to think in a different way.
So I, I actually really hope that some crazy kid in a garage somewhere is just
gonna turn up one day and go, I've trained a 50 billion trillion parameter model
and, and I've done it with a cheeseburger and, you know, and, and I just took
this chip at my microwave and it was fine.
Do you know what I mean?
And we'll be like, whoa.
And then what do we do with all of those chips at that point?
And I, I, but I actually, I think there's a really serious problem there is like
when the next innovation comes, right?
And it will come at some point where we realize we don't need all of these GPUs.
Well, what are we going to do with all of these GPUs and these massive data centers?
And, and again, if.
If NVIDIA wants to donate some to me, I will happily take them and
I will
find something to do with them.
You hear that Jensen?
If you're listening,
uh,
We always talk about China and US, right?
Um, I'm not sure if you guys know this.
There was a new model that came out, uh, in the last few
days for voice, speech to text.
There's a company, a small lab that are two Korean undergrads.
Who put this model together called D-I-A, uh, Dia.
Dia does speech to text with a very high accuracy.
It understands accents, background eyes, like uh, it
does a really, really good job.
And it's outcompeting the bigger labs, like 11 Labs or um, even OpenAI,
voices and stuff like that.
Super small.
Apache 2.0 open sourced it.
So you're seeing innovation from all over the world.
This is not just a US versus.
Uh, China, and at times you hear Mistrial from France, right?
But this is a global moment.
Right now.
Everybody is investing heavily in India.
You have a lot of labs that are now getting a lot of, uh,
investment to go build these models and own your own destiny.
So just the fact that the whole community, the global community is all in on
open source and powering through it.
That's, that's the way we should be.
And, and
Shobhit, just to add to that Sir Demis Hassabis, I mean, please
note the word "Sir." I don't hear no American accent on him, buddy.
That's actually a great pivot into our next segment.
Uh, one thing I did really want to cover was this, uh, pretty interesting
letter that came outta J.P. Morgan, Patrick Opet, who is their Chief
Information Security Officer.
Pens sort of an open letter to the industry that was kind of a call to
action to work on SaaS security, which is a, a big problem and a known problem,
but I thought was pretty interesting is that he focused specifically on, um,
AI and its contribution to this issue.
So I'll just quote what he said.
He said critically.
The explosive growth of new value bearing services and data management, automation,
artificial intelligence, and AI agents amplifies and rapidly distributes
these risks, bringing them directly to the forefront of every organization.
And so kind of, there's a SaaS security issue and then AI's gonna
basically like pour gasoline on it.
And I guess, Chris, we both throw it to you as like, this seems pretty dire.
Are we, are we in trouble?
Uh, I don't know if there's anything new that, uh, is being said in this letter.
Really.
I mean, yes, agents of course.
Um, but, uh, uh, we've had agents really, I mean, for, for decades.
And in some ways, if you think about it, I mean the, uh, the autonomy,
the, um, uh, the, the action taking is, is not particularly new.
I mean, with AI agents these days, it's the interaction through natural language.
Um.
More data and, and this sort of stuff.
But, um, I mean, there's been things, I mean, Shobhit mentioned
the, the Z processors and stuff.
I mean, transactions happen, things happen very quickly.
Um, there's been, um, I mean, software as a service for more than a decade.
Um, and, uh, like really, I mean, the point of the letter is yes, I
mean, focus on security, of course.
I mean, uh, like who's not gonna say yes.
Um, maybe there's a culture sort of issue that's, uh, being pointed
out that, uh, uh, maybe we need to think more, um, about these, uh,
important consequential industries.
Regulated industries and stuff at the forefront.
Maybe, uh, like you said, Tim, I mean, the mixture of experts.
We don't cover the, that in industry so much.
Maybe we should, um, and other sort of industries like this.
But, uh, yeah, I mean, like for people who do work in those industries, um, this
is not a, a, like anything new really.
So this
is a good reminder for everybody, uh, that like you have to think about the
governance as we move from experiments to going into production at scale.
And I'll give you my take on a more of a, from an enterprise perspective, right?
When I'm working with my clients and we are putting things into production, I.
Across all of these different SaaS vendors, everybody else is
in this rush to force agents into their SaaS, uh, platforms, right?
The, like even industry standards like MCP, it took them a while.
It like worked different versions to get to supporting
authentication properly, right?
This is so much in software engineering that has been done
to secure things the right way.
And we are almost like throwing that away and starting from scratch when you're
getting to these agents and stuff, right?
We have collectively decided as a humanity that English is the
way to talk to these agents and it just does not scale quite yet.
Right?
If I'm trying to to call another agent and give it a task, I need more structure.
'cause I need to do error catching.
I need to be able to pass authentication for this particular task I'm giving
you read access to this particular data set and whatnot, right?
So we need to evolve beyond the cute demos that we have all done
on stage last year and this year.
As you get more and more, uh, uh, more serious about rolling these
out at scale, the governance aspects of it, you have 10 different, uh,
SaaS vendors for the marketing team that you're working with, right?
The CMO is not spending that much time understanding what's happening
with all of that data is everything.
If, if I have to go put a small policy and a policy could be as simple as:
I don't want generative AI to create any content that
refers to these 10 competitors.
If I give it that small a tos, such a basic thing to do, it's insane how
much effort it'll take for all these enterprises to go make that happen
in every 10, 20, 30 different SaaS vendors that they're working with.
So we are struggling as, uh, when we are delivering these with enterprises.
That's why all the work that Kush and team are doing around governance,
around security, guardrails, things of that nature are so important.
And this has to work across regardless of which AI model that you're
bringing into the organization.
So I think J.P. Morgan Chase is doing a really good job at giving
a reality check to leadership.
And what J.P. Morgan does is often imitated and copied and people
in, uh, get inspired by the work that they've done in the space.
We need more people to talk about governance and the reason why we need
smaller models that you can monitor.
You have the right guardrails, even speech models as you start
to move from just text-based LLMS to now speech to speech native.
I'm trying to roll something out at one of our regulated industry
clients and it's very tricky.
The speech models tracing, looking at auditability, the agent ops
that's needed for speech models and stuff is insanely difficult.
There's a lot that this community is gonna do.
I would say if 2025 we say is the year of the agents, I would argue that 2025
is the, is the year of governance.
We got to get this right if we have a shot at going at production, at scale.
Or, or we could build more agents to solve the problem.
Not, you know, I, I get your point.
We should go governance and I, and I, and I, I believe that's a serious thing.
And we should build walls and put everything behind walls and then
nobody can access in anything.
Or, or we could go, Hmm, let's build more agents.
And then the, the good agents can fight the bad agents.
Yeah.
And then we're gonna be fine because we're gonna have a little good
agent versus bad agent more so I think any problem that is super hard
today, we don't need to solve that
with things like governance, we can solve that with more AI.
That is, that is my solution to this.
So Cisco, we recently had the big security conference this week.
Cisco released a foundation model for security.
IBM has done a lot in this space around security related models, right?
So if you're looking at cybersecurity risks, you're looking at
hallucinations, things of that nature.
So I think there'll be enough AI improvements that we are doing.
You need ai, good AI to fight bad AI a hundred percent with you on that.
Um, we do need to talk about the discipline that enterprises need to have.
To ensure that those good AI agents are deployed as default baked and security
by design, not as an afterthought
bolted on.
Right?
That's the point that I think J.P. Morgan Chase is arguing
that get excited about this.
Huge, huge benefit to us, but you have to make sure that there's a
secure by design for the very big.
But you can't cripple innovation at the same time.
No, I get it.
Right.
There's certain areas where you have to say.
You know what, um, I, this is a very serious thing and I need
it to not hallucinate, blah, blah, blah, blah, blah, blah.
But then at the same time, you need to make breakthroughs and
you need to discover new things.
And we have a hype cycle to maintain, right?
So at the same time, we can't kind of hold back on that.
So I'm, I, I get it.
And I think for certain regulated industries, I understand that
and that makes sense, but.
Um, but at the same time, sometimes hallucinations are a good thing, right?
Because it gives you a bit of creativity.
So I'm, you know, we just, we just need to, we just need to be
appropriate for the right scenario.
Okay.
Chris' point is actually a good one.
Um, so, uh, when you have a mix of, uh, some like, uh, things that are controlling
others, it doesn't always have to be just in a closed sort of system, like with a
single governor, um, sort of thing, right?
I mean.
Our immune system, like it controls diseases and stuff, right?
I mean, there's bad things happening, there's good guys fighting against it.
Um, it happens in nature all the time in different ecosystems.
So, um, if you take the, the big system level view of things, uh,
control is not always just like one little knob and, and and stuff.
So I think it's actually a, a mix of things and, um.
Yeah, I just wanted to end with a shout out to the robust intelligence folks.
So that's the, the team that put together the, the Cisco model
that, uh, that Shobhit mentioned.
So, uh, really good, really good work for, from, from them.
Great.
Well, that's resolved.
The, resolved, the Shobhit, Chris, uh, debate conclusively.
I thought my, I thought my cousin would be on my side.
I'm on both sides.
You're
on both sides.
More security, the
better.
Yeah, exactly.
Chris likes both of you equally.
It's fine.
So I think to close our episode, as I mentioned at the top of the
show, we're this is the first anniversary episode of MoE.
Very fast year.
Uh, we were able to bring together the original cast from episode one.
And so a little bit like the kind of kickoff question we did, I thought
it'd be fun to end with just like a final segment just talking about like,
what we did on that first episode.
'cause it's very fun to take a look back and be like, oh yeah, like,
what, whatever happened to that?
Or like, oh, that turned out to be a really big thing.
So it's kind of just like a fun exercise.
Um, producer Hans here will be playing some clips.
You'll actually be able to hear yourself from a year ago, which
may either be fun or cringeworthy.
We're about to find out.
Um, but I think the first topic that we covered on episode
one was the Rabbit R1 device.
Yeah.
Which, if you recall, was a small, cute little hardware device with AI embedded,
and it was a conversation about AI hardware and where it was gonna go.
Hans, do you wanna roll that tape so we can have the respective
takes of everybody on the show about what they said about that?
But it, it's like trying to sell a pager to somebody today.
It's like, here's this thing that's got the things you need.
You can get messages and, you know, but nobody has a pager, right?
Because it was replaced by the phone.
And, and so I do think there will be AI on the hardware devices.
I, I just don't get that one.
Just being an optimistic of where the tech is going.
I'm more on the wash of, uh, I see the promise of what this.
And Apple takes a while to come into this industry, right?
Same thing goes with the Vision Pro glasses, right?
I, again, I was a big fan of them when I bought them early on and
three days in I did return there.
To me, what is this is leading to is actually like a fourth paradigm of
how we interact with computing, right?
I mean, there was punch cards, there was command line, then there was GUIs, and
this is now, I mean like we're in this fourth sort of era, the language, natural
language interactions and so forth.
I think, I mean, yeah, I mean maybe there's no killer app yet, but the
killer app maybe is the fact that we have this new way of interacting and
that's what these devices are gonna, uh, start us, uh, on the road down.
Nice.
Uh, that's awesome.
Well show, but I'll start with you 'cause I think you actually bought
a Rabbit R1, um, where is it?
Where is it now?
It's in the garage.
In a box.
Okay.
I gone and sell it on eBay, man.
Oh, really?
Can't even, there's no secondary market for the R1.
Oh man.
I'm hoping this will be, uh, one of those things that goes
for a million bucks later.
But yes, it's, uh, it's, it's in a garage in a box.
I, I couldn't even find it for, to bring this, for this episode.
But, uh, I think overall, I still stand by what I said.
I think the, the market needs to evolve and we are not there yet.
We've not seen a single device that is beyond, like even Ray-Bans.
I obviously have the Ray-Ban glasses as well, but it's, it's okay, but
not at the point where you can really use it as a, as real device.
I think the last time we saw an.
Accessory that could augment, uh, your, uh, your iPhones and stuff was the watch.
Right?
Watches.
They looked at a niche, they went off of that and they're
augmented extending your phone.
They don't work without the phone.
Right.
It just works really, really well as a, as a partner.
So I think we'll get to that point with, with devices, but.
I've not seen another thing to throw
my money at yet.
All right.
Real quick, Chris, do you wanna take a victory lap on this one?
Because I think you won this point.
Yeah, no, and I'm, I'm feeling good about that one.
That, that, that thing is a pager.
I said it at the time and I'll say it again, so I'm feeling good.
All right.
Sounds good.
Second thing, uh, that we covered on episode one was the rise of a mysterious,
uh, chatbot on Chatbot Arena called GPT-2 Chatbot, which if you recall, there's
wild speculation about what it was, and.
Um, I guess Hans, do you wanna play the clip about like your,
your, your all's take at the time?
Is it GPT-5?
I don't know.
I think they've hyped GPT-5 so much that if that is
at this point, it has to be AGI or it's like not even gonna impress us.
Exactly.
So maybe it's GPT-4.5 but I, I don't think that I, I, I read a theory online.
I can't say who's said it, but I actually like it.
I, somebody said that, uh, take the GPT to, uh, LLM, which they've open source.
You can download that in Hugging Face.
And they reckon that they may have trained GPT two on the, uh, latest,
uh, data that trains the GPT-4.
And I think that's an interesting theory, right?
You know, GPT-2 with GPT-4 data.
So maybe it's something like that.
Um, I don't know.
Um, but I don't think it's GPT-5.
It, it probably is GPT-4.5, and as you say, you've, you've gotta put
it in some sort of arena to, to see how well it's actually performing.
Chris, that was a pretty good guess.
I mean, 'cause we're now living in a GPT-4.5 World, right?
Yeah.
I think, I think I got, I can't even remember what that model was.
Was that like the when, when was this?
Was that the 4.o, or was it slightly after?
Yeah, so that was the
next model that GPT released.
And I think one of the, somebody had spilled the beans saying that, Hey,
yes, that was, that we, it scored really high during the GPT-4o released.
So our guess is that that was, uh, the testing they were doing on LLM arena.
Yeah. I can't even remember.
That's how far back it was, but I think, I think my guess was pretty good, right?
It wasn't quite four or five, but it was basically the next version of the model.
Yeah, I feel good.
Yeah, I still think the GPT-2 theory was a great one though, so, you know, so,
so somebody's gotta do that actually.
That's a good idea.
All right.
And so for the final one that I want to play, we, we talked about
agents, which has since become basically like an ongoing MoE
in joke, I suppose.
Um, and, uh, I think with Shobhit and Chris, on the show, both
of you are probably our most prolific users of the word agents.
I feel it.
It, uh, if we just did a word count of all the things you've said on
MoE, probably you two would be at the top of that leaderboard.
So Kush, I'm gonna ask you to maybe make a guess, which is,
which one of these guys used
agent first on MoE, um, like who is number one on, on breaking that seal, I guess.
Yeah, this, this time I'll go with the family member.
Woo.
The was plan.
All right, well roll the tape, Hans.
The talk by Andrew on, uh, how agentic flows
are going to be the way we get to the AGI.
Yes!
Nailed it, congratulations.
So show didn't blame for starting that
I heard agentic.
Yeah.
I didn't the word, I didn't hear the word agents at all.
I, I call, uh, this is, uh, a fix.
Yeah, yeah.
Disqualified.
So we'll do some investigation on who actually used the word
agent first, but I guess Shobhit
you can, uh, be, uh, you can rest easy knowing that you were really
a, a pathblazer there for us.
But I still, but you should take that
a second to acknowledge how far we have come in the last year.
At that point, uh, Andrew had just introduced how a 3.5, uh, can, can
GPT-3.5 with tools can actually outcompete GPT-4 and stuff, right?
So just imagine how far we have come in terms of the cost, the kind of
tools, the ecosystem around agents.
I'm just very proud of where the community is today with the, with our, what's
happening in the multi-agent space.
Yeah, for sure.
And more to come soon.
I mean, I think we're gonna have to do this next year as well
where we look back on what we were talking about for, uh, in 2025.
Ooh, we should have done, uh, what should be the next word?
That'll catch it.
That'll go viral, right?
You may have used it on this episode.
Open letter.
You know,
Congratulations to the production crew for the one year anniversary.
I wanna give a big shout out to the producer Hans, Alex, Michael and Selma,
you guys have poured your soul into this.
Thank you so much for bringing the Mixture of Experts to our audiences.
Thank you so much!
Happy birthday.
Happy birthday.
Well, that's all the time that we have for today.
Uh, Kush, Shobhit, Chris, an amazing panel.
Glad to have you on again!
Um, and thanks to joining us all you listeners, if you enjoyed what you heard
you can get us on Apple Podcasts, Spotify, and podcast platforms
everywhere, and we will see you all next week on Mixture of Experts.