Learning Library

← Back to Library

LlamaCon Unveils Developer‑Friendly Llama API

39m • Unknown Channel • ai-ml • news • intermediate • Watch on YouTube ↗

Key Points

The panel reflects on AI hype that didn’t pan out, noting that technologies like Kolmogorov‑Arnold Networks and certain “pin” innovations have proven less impactful than expected.
Experts highlight a sharp decline in “intelligence per dollar,” indicating that the cost efficiency of AI has worsened despite broader hype.
A call‑to‑action from J.P. Morgan and a surge of activity in China’s AI market signal new strategic pressures and opportunities for AI governance and investment.
Meta’s first LlamaCon introduced the Llama API, a unified developer platform that combines closed‑source power with open‑source flexibility, providing centralized fine‑tuning, evaluation, and hosting to simplify enterprise use of Llama models.

Sections

Full Transcript

# LlamaCon Unveils Developer‑Friendly Llama API **Source:** [https://www.youtube.com/watch?v=yeutlpU13YM](https://www.youtube.com/watch?v=yeutlpU13YM) **Duration:** 00:39:38 ## Summary - The panel reflects on AI hype that didn’t pan out, noting that technologies like Kolmogorov‑Arnold Networks and certain “pin” innovations have proven less impactful than expected. - Experts highlight a sharp decline in “intelligence per dollar,” indicating that the cost efficiency of AI has worsened despite broader hype. - A call‑to‑action from J.P. Morgan and a surge of activity in China’s AI market signal new strategic pressures and opportunities for AI governance and investment. - Meta’s first LlamaCon introduced the Llama API, a unified developer platform that combines closed‑source power with open‑source flexibility, providing centralized fine‑tuning, evaluation, and hosting to simplify enterprise use of Llama models. ## Sections - [00:00:00](https://www.youtube.com/watch?v=yeutlpU13YM&t=0s) **MoE One-Year Anniversary Recap** - The podcast celebrates its first year by reuniting original guests to dissect overhyped AI trends, discuss new developments such as LlamaCon and Chinese AI market activity, and reflect on the evolving AI landscape. - [00:03:03](https://www.youtube.com/watch?v=yeutlpU13YM&t=183s) **Meta's Open-Source Llama Hub Debate** - The speakers discuss whether Meta's release of open‑source Llama models and a centralized hub indicates strategic strength by cultivating an ecosystem or a defensive maneuver, highlighting the value of a standardized stack and the growing importance of fine‑tuning personalized models. - [00:06:06](https://www.youtube.com/watch?v=yeutlpU13YM&t=366s) **Tiny Prompt Card Advances Guardrails** - Speaker notes a new 22‑million‑parameter prompt‑card model, references the recent GuardBench benchmark where Granite Guardian tops the leaderboard, and emphasizes how such lightweight models could boost AI safety through layered guardrails. - [00:09:12](https://www.youtube.com/watch?v=yeutlpU13YM&t=552s) **Upcoming Llama Model Landscape** - The speaker outlines two forthcoming Llama models—a compact 8‑billion‑parameter version and a massive, yet‑to‑be‑practical behemoth—while noting challenges in model distillation, multi‑agent orchestration, and expressing optimism about open‑source collaborations with partners like IBM and Box. - [00:12:16](https://www.youtube.com/watch?v=yeutlpU13YM&t=736s) **Deliberate vs Instant Answer Mode** - The speaker argues that AI should toggle a “thinking” phase—suppressing internal deliberation for simple factual look‑ups while activating it for logical or math problems—to provide appropriate, timely responses. - [00:15:24](https://www.youtube.com/watch?v=yeutlpU13YM&t=924s) **Selective Brain Activation, AI Specialization** - The speaker likens the myth of using only part of the brain to AI systems that activate only relevant components, arguing that modular expert mixtures can reduce computational load and outperform monolithic models, thereby reshaping common assumptions about AI competition. - [00:18:40](https://www.youtube.com/watch?v=yeutlpU13YM&t=1120s) **Sovereign AI and Edge Deployment** - The speaker highlights the drive for nations to build independent AI supply chains, the rise of efficient open‑source mixture‑of‑experts models like Qwen‑3 that rival major providers, and the push to run these models at the edge on IBM Z hardware under Apache 2.0 licenses. - [00:21:42](https://www.youtube.com/watch?v=yeutlpU13YM&t=1302s) **Future GPU Surplus & Global AI** - The speaker worries about excess GPUs from waning demand, proposes repurposing them, and highlights worldwide AI breakthroughs such as Korea’s open‑source speech‑to‑text model, emphasizing a shift beyond US‑China competition. - [00:24:48](https://www.youtube.com/watch?v=yeutlpU13YM&t=1488s) **Governance and Security in SaaS Deployments** - The speaker stresses the importance of robust security governance when scaling SaaS solutions, especially for regulated industries, noting the rush to embed agents, evolving standards, and the shift from experimental to production environments. - [00:27:56](https://www.youtube.com/watch?v=yeutlpU13YM&t=1676s) **Governance vs Agent Proliferation** - The speaker contends that scaling AI—particularly auditable, smaller speech models—requires rigorous governance in 2025, warning that relying solely on creating more agents to counteract risks is an insufficient solution. - [00:31:01](https://www.youtube.com/watch?v=yeutlpU13YM&t=1861s) **Anniversary Episode: Reflections and Shout‑outs** - The hosts celebrate MoE’s first‑year milestone, recap the debut show, commend the Robust Intelligence team’s Cisco model, resolve a debate, and preview nostalgic clips from the past season. - [00:34:04](https://www.youtube.com/watch?v=yeutlpU13YM&t=2044s) **Futuristic Wearable Market Speculation** - The hosts debate the untapped value of an R1 device tucked away in a garage, compare it to Ray‑Ban glasses and smartwatches as phone‑augmenting accessories, argue the market isn’t ready for true wearables, and briefly celebrate a point about a pager before noting a mysterious chatbot. - [00:37:06](https://www.youtube.com/watch?v=yeutlpU13YM&t=2226s) **First “Agents” Mention Competition** - Participants joke about a leaderboard to determine who first used the term “agents” on MoE, linking the word’s debut to early discussions of agentic flows and tool‑augmented GPT models. ## Full Transcript

0:00I wanna go back one year, it's May, 2024 again. 0:03What's the biggest thing in AI that turns out to be not that big of a deal? 0:07Kush Varshney is an IBM fellow, uh, on AI governance. 0:10Kush, welcome back to the show. 0:11Uh, what do you think? 0:12Uh, Kolmogorov-Arnold Networks. 0:14Got it. 0:14That's a good one. 0:15Shobhit Varshney, Head of Data and AI for the Americas. 0:18Shobhit. 0:18The cost of AI, I think the intelligence per dollar has plummeted significantly. 0:23Absolutely. 0:24And last but not least is Chris Hay, Distinguished Engineer and 0:26CTO of Customer Transformation chris, what do you think those 0:31stupid pin things we got all excited about last year. 0:34All that and more on today's Mixture of Experts. 0:43I'm Tim Hwang and welcome to Mixture of Experts. 0:44Each week, MoE brings together the smartest and I think the most good 0:48looking crew in all of podcasting to discuss and debate the biggest 0:51news in artificial intelligence. 0:53And this is a big episode. 0:54Today we're officially celebrating our one year anniversary of MoE. 0:59We brought together the original crew from MoE episode one to join us. All-star Crew. 1:03We're gonna do a look back, uh, cover a call to action from J.P. Morgan, a new 1:09wave of action in the Chinese AI market. 1:12But first, I really wanted to cover all the latest from LlamaCon. 1:16So I believe this was the first event, uh, officially the first LlamaCon 1:20that Meta has run, focusing on its work in the open source space and 1:23around the Llama class of models. 1:25Um, I think a lot of announcements to cover here, but I think Shobhit the 1:28first one that I was really intrigued to get your take on was they announced 1:31this thing called Llama API, and it's a developer platform that quote will 1:35bring together the best of closed source with open source flexibility. 1:39Um, and so for our listeners who might be less familiar with 1:41this, like what have they done and why is it kind of a big deal? 1:44I always, in my opinion, I think it's kind of a big deal. 1:46Yeah. 1:46So today, uh, in the current state, if uh, an enterprise needs to go play around 1:50with Llama models, you go to one of your hyperscaler partners and say you're 1:54gonna use their version of the studio. 1:56Their way of fine tuning it and whatever the hyperscalers are, are producing. 2:01And then once you're done with that model, it's difficult to 2:03move it around and whatnot. 2:04Right? 2:05So in this particular case, Meta is coming out and saying we want to be 2:08as developer friendly as possible. 2:11We'll give you a central place with all the playgrounds, the 2:14fine tuning capabilities, also evaluations, and so on, so forth. 2:17So as you're fine tuning the model, you can test it out. 2:20All of that will be done centrally. 2:22They will host the API for Llama as well. 2:25You can obviously still get it everywhere else that you get your, uh, your LLMS 2:29from, but now they're developing a whole set of stack, so they're moving 2:33beyond just providing the model to be providing the whole ecosystem. 2:37They have done enough work in the space with Llama Stack and a few other things 2:41in the past, but this was their coming out party saying that we are gonna 2:44be as developer friendly as possible. 2:47Come work with us, we'll help you fine tune it. 2:48Once you're done with that model, you can take it anywhere. 2:51Obviously there are a lot of things around privacy where they will not train 2:54the model, uh, on the data that you're providing them and so on and so forth. 2:57But the inference speed is, is amazing with their partnerships 3:01with service and GR and others. 3:03So overall, they wanna be the hub where people come and experiment 3:06with Llama models versus Llama models being one of the 200 models available 3:12on Microsoft or AWS or Google. 3:14For sure. 3:15And Chris, maybe to bring you into this conversation, I was having a debate 3:17with a friend about this announcement and we're kind of talking about like 3:19whether or not this is almost like a position of strength for Meta or almost 3:23like a position of weakness for Meta. 3:25There's one point of view which is, hey, we release these open 3:27source models and everybody will build all the tooling around it. 3:30Essentially that's like kind of what we do is we do the model and then like 3:34everybody else builds the ecosystem. 3:36So that's kind of like the, the, the bear case. 3:38And uh, my friend was like, well the bull case is actually that. 3:40Like they recognize that like they're actually investing more in this space now. 3:44And it's like really they recognize that there's such a big opportunity that 3:47they have to actively build this stack. 3:49I'm curious if you kind of have any feelings about that or how 3:51you kind of size up these moves. 3:53I think it's a really interesting move. 3:55I mean, as you kind of say, I think it's a great move. 3:58I think having a sort of standardized stack where you can bring your 4:02models, you can fine tune them, and, and I think fine tuning is gonna 4:06become a bigger thing in the future. 4:08So, you know, because you're gonna want your own personalized model, you're gonna 4:11want something with domain knowledge and therefore bringing that into a 4:14consistent place, I think is a good thing. 4:16And then if you think about where Meta wants to go in the future, they 4:20want AI to power all your avatars assistance, et cetera, uh, on their 4:23platforms and, and have agents on there. 4:27Um. 4:28Then I think making it easier to have a playground for developers and individuals 4:33to, to tune models based on Llamas stack, I think is a sensible thing. 4:37I, I do think though, that when I really look at this though, um, all the APIs 4:44are OpenAI, compatible APIs, and nearly every single service provider is moving 4:50towards OpenAI compatible APIs anyway, so I, there is still a part of me that 4:56goes, well, can I do that somewhere else? 4:58And, and, and sure with the fine tuning part specifically, that is hard, right? 5:03Because, you know, getting your models out of some of those 5:06existing stacks and taking 'em elsewhere is a more difficult thing. 5:09So I think, I think that is a differential play 5:11in my mind. 5:12Totally. 5:12Yeah. 5:13It's getting more complicated, kind of seeing them navigate this. 5:16Christian, another part of the announcement that I wanted you 5:18to comment on was that they also announced all of these kind of 5:20security and protection models. 5:22Um, so Llama Guard, four Llama firewall, Llama, Prompt Guard 2. 5:27It kind of feels like a little bit of like, almost like the protection space 5:31around AI starting to get like a lot more complicated than it used to be. 5:34Where the old thing was like, oh, well we just have a model that tells you 5:36if like the outputs are toxic now. 5:38It feels like they've got like at every layer of the stack, there's like a model 5:41you can use for security and safety. 5:43Um, curious about how you kind of like read these trends. 5:45Like where is this going? 5:46Is it just gonna become like a more and more complicated, you 5:49know, ecosystem of safety models? 5:51Um, yeah, just curious about your hot take on that. 5:53Yeah. 5:54Um, so yeah, as you said, I mean, they, uh, have this new, uh, Llama Guard 4, 5:59uh, it's uh, 12 billion parameter model. 6:02Um, it's multimodal, so it has the vision and the text in there. 6:05Um. 6:06Uh, the, uh, the prompt card they made really tiny, I think, uh, 22, 6:10uh, million parameters and stuff. 6:12So, yeah, I mean, they're making progress. 6:14Certainly. 6:14Um, uh, the headlines are good. 6:16Um, uh, uh, we haven't had a chance to, to evaluate and see, 6:20uh, what the performance is yet. 6:22And, um, yeah, actually just, uh, a week and a half ago, um, uh, maybe 6:26two weeks ago, there was a new, uh. 6:28A benchmark that came out, uh, called Guard Bench. 6:30Uh, so this Guard Bench, um, actually goes and tests, um, a lot of, uh, of 6:36different Guardrail models and and stuff. 6:38Um, uh, just a a side note, uh, the Granite Guardian model that I've 6:42talked about in the past is, uh, at the top of that leaderboard, 6:45but, um, uh, we should see, I mean, how is, uh, how is the big deal? 6:49Yeah, exactly. 6:49The how's Llama Guard for, uh, uh, doing there because, uh, if they've really made 6:55good progress, that's, uh, that's awesome. 6:56Um. 6:57And, uh, I mean, the fact that the, uh, the prompt card is so tiny, I 7:01think, uh, that, because that's gonna make a, make a huge difference because 7:05it's like 22 million parameters. 7:07It's like a blink of the eye. 7:08I mean, it's like, uh, uh, you can do it so fast. 7:11So, I mean, I think the overall space is, uh, just. 7:14Becoming where people are realizing the seriousness of safety and security. 7:18So, uh, just having everything there. 7:21I mean, multiple layers of security. 7:22I mean, that's, uh, just good practice. 7:25Uh, so having it, uh, on the inputs, on the outputs, um, the overall firewall. 7:29I mean, all of that is, uh, is good stuff. 7:32And then we'll see, uh, I mean, how it goes, uh, how it progresses and, um, uh. 7:38I mean, no, uh, no concern for me. 7:40I think this is where, where the field needs to go. 7:42Um, Shobhit, before we move on to our next topic, any other, uh, kind of 7:45announcements that you'd highlight? 7:46I know there's a bunch announced. 7:47Those are the kind of two that stood out to me, but I know there's, 7:49like, I saw the whole blog post. 7:50There's like a lot going on. 7:51Yeah. Yeah. 7:52Um, the other couple things were one was around their Meta AI app. 7:56They have consolidated all of their intelligence into one app, and that 8:00could be a ChatGPT competitor or Gemini and so on and forth, but they want one 8:04app that people can go and do, do some cool things and, and, and talk to it. 8:08And they have the potential to make this super hyper-personalized because they 8:13have billions of, of, uh, interactions happening across all of their. 8:17WhatsApp and Instagram and um, and Facebook. 8:20You could potentially have a avatar that is really personalized to 8:23your particular needs and wants and things that you care about, right? 8:27It is a delicate balance between privacy and hyper-personalization. 8:31They'll have to do that balance re uh, delicately. 8:34But they, they have a huge bet on creating the one app where you go to 8:39for all of your, uh, all of your AI. 8:41There are a few other things that may have been, uh, like brushed off in 8:44the, in the details, but they have done, they've had about 1.2 billion 8:49downloads of Llama models and most, and a lot of those, like majority 8:53of those are derivatives of Llama. 8:55On Hugging Face and other places, right? 8:56So clearly the momentum around open source with the developer community 9:00is amazing and Llama has had a huge impact on where we are today 9:04with open models versus others. 9:07But there were a few things where I was, was still on my wishlist that, that we 9:10couldn't quite, that they didn't get to. 9:12Uh, there are two other models that they had announced. 9:14They're not coming quite yet. 9:16One is their small little Llama model. 9:19That'll be about an 8 billion parameter model. 9:218 billion was the, was the most popular size of the Llama model from 9:25the last, uh, previous generation. 9:27We have not announced, seen that yet, but that would be a game changer for 9:30our enterprises, especially if you have good methods of distilling it down. 9:33And then on the other end of the spectrum is the behemoth model. 9:36They still need to figure out what they do with it. 9:37It's not something that's practical at, at this size to be run by enterprises, but 9:42we need to figure out what's the right way of displaying it down or can I use that to 9:46train or other models and so on and forth. 9:49There are other things around, uh, multi-agent orchestration that I was 9:52expecting Llama to, to release as well. 9:55Uh, like things like MCP support and agent to agent, uh, protocols 10:00or anything around agent ops as part of the whole Llama stack. 10:04I'm waiting for them to announce more things in that space as well. 10:07But overall, really positive. 10:08Uh, it's good to see that we are celebrating open source, getting 10:11closer and closer to, uh, to the frontier models as well. 10:14So great LlamaCon for all of us. 10:17We had a good partnership with them. 10:18IBM and Box have done some amazing work with Llama was 10:22was announced on stage as well. 10:23So overall, very positive for all 10:25of us. The community has enjoyed it. 10:26That's great. 10:27Yeah. And a lot more to come. 10:27I'm sure what you talked about is like gonna be coming out like in the 10:30next, very soon I think probably. 10:37This is great. 10:38And I, speaking of open, I think I'll move us onto to our next topic. 10:41Uh, we wanted to do a kind of short segment because there's been a 10:43lot of kind of interesting things bubbling up, particularly in open 10:46source, um, in the Chinese market. 10:48And I did want to spend a little bit of time, uh, talking about that. 10:51Um, one that we have that has actually come out is that Alibaba has launched 10:56Qwen3 um, which is, uh, a kind of whole class of models that they've kind of 11:00put out, which is the latest generation of their kind of Qwen3 generation. 11:04Of models. 11:04Um, and I guess Chris, I think I wanted to kind of start with a little bit of 11:07like a kind of like technical explainer. 11:09So in the blog post they talk a little bit about how these models are, what 11:12they call hybrid models, which combine quote thinking and non-thinking modes. 11:17Um, and I think again, in true, uh, AI form, we picked all sorts of 11:21terminology that's like very confusing. 11:23It's like, what is a hallucination? 11:24Anyways. 11:25Um, and so I guess I wanted to kind of just initially start with like 11:28what is a thinking and non-thinking mode when it comes to AI and 11:31like why is it kind of important for what they're doing here? 11:34Yeah, so when we hear thinking, I will, I think of it as the kind of reasoning 11:38models, like the o1's, o3's, o4's, right? 11:41So, uh, in those particular cases, if you think of what a model is, it is a 11:46kind of next token prediction model. 11:47So, you know, it is gonna be token, token, token, token. 11:50So whenever it's answering a question and that, and that works great. 11:53Um, but you can imagine some of these. 11:55Problems are a lot harder to solve. 11:57And therefore, if you equate the thinking time to the number of tokens that you 12:01generate, then the more tokens that you generate, the more likely you're 12:06gonna get some sort of good answer. 12:08Right. 12:08And, and so when you were saying thinking mode, in that sense, it's like, like a 12:12human being rather than blurting out the first thing that comes into your mind. 12:16Spend a little bit of time deliberating the, you know, whatever the answer 12:21is gonna be before you open your mouth and, and, uh, announce your 12:25feelings to the world, right? 12:26Um, and keep those thoughts inside. 12:28Keep them inside. 12:29So, so regular human beings don't know about it. 12:32So that, that is kind of what the idea of thinking is there. 12:35Now there is some class of questions where. 12:38No matter how long you think about it, thinking is not gonna help. 12:43Right? 12:43So things like, you know, what is the capital of England, right? 12:48So if you don't know the answer, sitting and thinking about it 12:51really isn't gonna help you, right? 12:53So, but doing something like a math problem or a logical or reasoning problem, 12:58if there are six cats and one falls out the window, how many cats do you have 13:02left and how many lives is it's got? 13:05Then it needs to think about that a little bit and then. 13:08You know, it'll come to the answer and therefore you'll generate those tokens. 13:11So the idea of being able to sort of have this hybrid mode. 13:16In reality, you, for some cases, you want thinking switched off right 13:21quick questions, you know, general q and a type knowledge answers. 13:24But if you're doing logic and reasoning, you want the ability to 13:27switch that on and have the model take a little bit of time to think about 13:30that and come back with the answer. 13:31So, um, I still think this is a, this is a problem today that is 13:38gonna go away in the future, right? 13:40Just like human beings, you know, we have learned when to blurt out an answer 13:44and when not to blurt out an answer. 13:46You don't say to human being, I mean, 13:47speak for yourself, Chris. 13:48I, 13:51well, actually, maybe not right, but maybe I haven't learned, but. 13:56I, I think, I think in, in, in time. 13:59Then I think that's gonna relax there. 14:01And we're not gonna have to switch that on or off, but I, I do like this idea 14:04of the future of like a thinking budget. 14:07You know, you've got five minutes to think about it, three minutes to think about it. 14:09So I think this practice is gonna evolve, but I think is, 14:12is very much a positive of 14:14the, uh, the hybrid mode. 14:16And Chris, I think one thing that's been raised before, but it might be 14:19kind of fun to kind of tackle it more directly with kind of this segment and 14:22these releases that are coming out. 14:24Um, you know, some people have commented like, I think, um, uh, Kate might have 14:29mentioned it on a previous episode, but 14:31we're really kind of, sort of starting to see like the, the, 14:34the return of mixture of experts. 14:36Like it feels like that is like now very much back on the table. 14:39It's like what everybody's doing. 14:41So like what was kind of uncool again is like really back and forth. 14:44Um, and so wanna talk I guess a little bit about like why that's the case now 14:48that we're kind of seeing it in Qwen3 and is rumored for the DeepSeek-R2 14:52launch, which is kind of gonna also potentially be coming out maybe even by 14:56the time this episode releases is rumors that happened potentially this week? 14:58Yeah, I mean, uh. 15:00Whoever came up with the name of this podcast, uh, was, uh, quite prescient. 15:04I mean, uh, "Mixture of Experts." um, uh, the term has been 15:08around for, for a long time. 15:09It meant something different, uh, when I was in grad school, um, so with 15:13these gating mechanisms and stuff. 15:14But, um, uh, I mean, the point of it is, uh, really, uh... 15:20I mean, just like Chris was saying, like humans, uh, don't blurt stuff out. 15:24Humans also don't use their entire brain when they're thinking, right? 15:27I mean, the, I don't know what the stat is. 15:29Like we only use 10% of our brain at, at a time, right? 15:32Um, so, uh, same idea. 15:34I mean, you don't need to, you use everything. 15:36You don't need to activate everything. 15:38Um, uh, because, uh, really there's, uh, a portion that's, um, 15:43really the, the important part when you're, uh, thinking about something, 15:46computing something, what, whatever have you in inferring something. 15:49And so, uh, I think it's just taking advantage of that. 15:53Uh, I mean, you can use less power, less computation, less, uh, of everything. 15:57Um, if you're only activating the, the relevant parts. 16:00And then if you can know, um, uh, which parts to, to activate, uh, 16:03then, uh, then, then that's gonna end up being a, a good thing. 16:06And then you can have, uh, kind of, uh, different, uh, sort of 16:10specializations, different sort of, uh, 16:12things that are, are, are better at, uh, at particular, uh, aspects. 16:16So, uh, about of cats falling outta trees, uh, mixture of expert or an expert 16:22and then, uh, uh, I mean what, whatever, I mean, all sorts 16:25of different experts on there. 16:26So I think, uh, that's where, where things are headed. 16:29Sure. 16:29But I think the final bit that I was hoping to get your take on is, 16:32you know what I love about the sort of DeepSeek story is how much it 16:37is kind of like messing with all of our intuitions about how competition 16:40and AI is like supposed to go down. 16:42So the first one of course was like, oh, okay, in the US it was Meta doing open 16:46source versus these closed source guys. 16:48And so the introduction, introduction of DeepSeek is like, oh, well 16:51now even Meta has competition. 16:53Yeah. 16:53Um, and I think the other really interesting element 16:55that's been rumored around R2. 16:58Is that they are doing the training not on the video, which I think is 17:02really intriguing and also kind of completely scrambles the idea that 17:05like, oh, everybody's gonna just build on, you know, Jensen's chips and that's 17:08gonna just be the way the AI works. 17:10Yeah. 17:10I think what I read in the blog post was that the rumors that R2 was trained on 17:13a server cluster of Huawei's Ascends 910B 17:17yes, chip, which would mark a really big transition in how some of this hap happens 17:22kind of at the, at the cutting edge. 17:23Do you wanna talk a little bit about that? 17:24I thought it was very interesting. 17:26Yeah, so I think over time China is brilliant. 17:29Uh, like the people, people are just absolutely stunning 17:32in these research labs, right? 17:33So they're very high concentration of talent. 17:35So they're trying to figure out ways around, uh, supply chain, like 17:38their dependence on the US supply chain for intelligence, right? 17:41So their own. 17:42They have a lot of intelligence to go build their own chips. 17:45There's a competition around not just the chip, but the whole, uh, the whole 17:48set of things that go before and after the ecosystem around the chips as well. 17:52And China, I think has a good shot at this. 17:55They should be able to go and look at the Huawei, uh, a series of chips. 17:59And this is a really good use case on printing them. 18:02If you look at what Google did with their tensor processing units, the tpu right. 18:06They have a unfair advantage that they have both all the 18:09AI and the chip manufacturing. 18:11So when they release their models like Gemini, which are running at 18:14like crazy, uh, um, volume, uh, every day, they need to make sure 18:18that they're super hyper optimized. 18:20Like the Gemini Flash model, for example, running on TPUs. 18:23Is a really good price point at scale with billions of these every day, right? 18:28So you'll start to see a lot of these, uh, companies start to leverage the 18:32architecture underlying optimize that, and China understands that they will not 18:36always be able to get access to the, the, the technology from the rest of the world. 18:40So they will start to create their own supply chain top to bottom. 18:43There's a lot of investment coming from each of the countries 18:46in their own sovereign AI and trying to make sure that they can. 18:49Uh, they can be, uh, masters of their own destiny in the, in the AI space, right? 18:53I'm very excited about this whole David versus Goliath kind of a war, right? 18:59If you're looking at the size of the model that Qwen3 came out with, you 19:03have, um, a good mixture of experts model where the number of active parameters are 19:07very, very low and they're out competing some of the best in class models from 19:12open AI and Google and Llamas, right? 19:15So you have this, this, this crazy 19:18compute, um, intelligence per dollar, that's just completely plummeting and that 19:24unlocks an insane amount of, uh, other use cases that we would deploy this at, right? 19:28If you look at the way we, within IBM are looking at, say, our Z series and stuff 19:32too, we want to bring AI closer and closer to where the transactions are happening. 19:37Billions of these in with almost negligible latency. 19:40So I think the whole size of the model being smaller and 19:43outcompeting is a great thing. 19:45Taking this open source and these are Apache 2.0 licenses. 19:49Creating derivatives out of it and owning that intellectual property, 19:52carrying it with you, deploying it where you need to be at the edge 19:55on the servers, on the clouds. 19:57I think that's the future direction that they were taking. 19:59We should be very, very, uh, proud of how far the AI community has come, 20:04on the open source models and the progress that we are making this space. 20:07But I think going back to what, what Kush was mentioning around all the 20:11Guardrails that are needed, when we see models, uh, like Qwen3 come out, we do 20:15not see a lot more transparency on the data that went in or any Guardrails and 20:19stuff that have put, or any, like the equivalent of the Llama Guards or the 20:24granite Guardians being released from the Chinese labs at, at this point quite yet. 20:27Right. 20:28Qwen3 models. 20:30They are text only at this point. 20:32They're not quite multimodals. 20:34That kind of reduces the space of what use cases we can deploy them at. 20:37So a few, few things that they, they do need to catch upon, but there's 20:40just so much happening in this space. 20:42This competition is really, really positive for all of us. 20:44I think there's a flip to that as well, which is I, I think. 20:49Although they're not being particularly open on data, I think labs like 20:54DeepSeek, et cetera, maybe not so much kind of Qwen there, but, uh, Alibaba. 20:58But they're being very open with their code bases, right? 21:01So they open source, their distributed file system, et cetera. 21:04So I think one of the things that I really appreciate in this space with 21:09the competition is that the innovation is moving out into the open source 21:14community, and I think, because these labs are being constrained, it is 21:19forcing them to think in a different way. 21:21So I, I actually really hope that some crazy kid in a garage somewhere is just 21:26gonna turn up one day and go, I've trained a 50 billion trillion parameter model 21:32and, and I've done it with a cheeseburger and, you know, and, and I just took 21:37this chip at my microwave and it was fine. 21:40Do you know what I mean? 21:40And we'll be like, whoa. 21:42And then what do we do with all of those chips at that point? 21:45And I, I, but I actually, I think there's a really serious problem there is like 21:49when the next innovation comes, right? 21:52And it will come at some point where we realize we don't need all of these GPUs. 21:57Well, what are we going to do with all of these GPUs and these massive data centers? 22:01And, and again, if. 22:03If NVIDIA wants to donate some to me, I will happily take them and 22:07I will 22:07find something to do with them. 22:09You hear that Jensen? 22:09If you're listening, 22:10uh, 22:12We always talk about China and US, right? 22:15Um, I'm not sure if you guys know this. 22:16There was a new model that came out, uh, in the last few 22:18days for voice, speech to text. 22:22There's a company, a small lab that are two Korean undergrads. 22:26Who put this model together called D-I-A, uh, Dia. 22:30Dia does speech to text with a very high accuracy. 22:33It understands accents, background eyes, like uh, it 22:36does a really, really good job. 22:38And it's outcompeting the bigger labs, like 11 Labs or um, even OpenAI, 22:43voices and stuff like that. 22:44Super small. 22:45Apache 2.0 open sourced it. 22:47So you're seeing innovation from all over the world. 22:49This is not just a US versus. 22:51Uh, China, and at times you hear Mistrial from France, right? 22:54But this is a global moment. 22:56Right now. 22:56Everybody is investing heavily in India. 22:59You have a lot of labs that are now getting a lot of, uh, 23:02investment to go build these models and own your own destiny. 23:05So just the fact that the whole community, the global community is all in on 23:09open source and powering through it. 23:11That's, that's the way we should be. 23:13And, and 23:13Shobhit, just to add to that Sir Demis Hassabis, I mean, please 23:16note the word "Sir." I don't hear no American accent on him, buddy. 23:27That's actually a great pivot into our next segment. 23:29Uh, one thing I did really want to cover was this, uh, pretty interesting 23:32letter that came outta J.P. Morgan, Patrick Opet, who is their Chief 23:35Information Security Officer. 23:37Pens sort of an open letter to the industry that was kind of a call to 23:40action to work on SaaS security, which is a, a big problem and a known problem, 23:45but I thought was pretty interesting is that he focused specifically on, um, 23:49AI and its contribution to this issue. 23:51So I'll just quote what he said. 23:52He said critically. 23:54The explosive growth of new value bearing services and data management, automation, 23:58artificial intelligence, and AI agents amplifies and rapidly distributes 24:02these risks, bringing them directly to the forefront of every organization. 24:06And so kind of, there's a SaaS security issue and then AI's gonna 24:09basically like pour gasoline on it. 24:10And I guess, Chris, we both throw it to you as like, this seems pretty dire. 24:13Are we, are we in trouble? 24:16Uh, I don't know if there's anything new that, uh, is being said in this letter. 24:20Really. 24:21I mean, yes, agents of course. 24:23Um, but, uh, uh, we've had agents really, I mean, for, for decades. 24:28And in some ways, if you think about it, I mean the, uh, the autonomy, 24:31the, um, uh, the, the action taking is, is not particularly new. 24:36I mean, with AI agents these days, it's the interaction through natural language. 24:40Um. 24:41More data and, and this sort of stuff. 24:42But, um, I mean, there's been things, I mean, Shobhit mentioned 24:46the, the Z processors and stuff. 24:48I mean, transactions happen, things happen very quickly. 24:51Um, there's been, um, I mean, software as a service for more than a decade. 24:56Um, and, uh, like really, I mean, the point of the letter is yes, I 25:02mean, focus on security, of course. 25:03I mean, uh, like who's not gonna say yes. 25:07Um, maybe there's a culture sort of issue that's, uh, being pointed 25:11out that, uh, uh, maybe we need to think more, um, about these, uh, 25:16important consequential industries. 25:18Regulated industries and stuff at the forefront. 25:20Maybe, uh, like you said, Tim, I mean, the mixture of experts. 25:23We don't cover the, that in industry so much. 25:25Maybe we should, um, and other sort of industries like this. 25:28But, uh, yeah, I mean, like for people who do work in those industries, um, this 25:34is not a, a, like anything new really. 25:37So this 25:38is a good reminder for everybody, uh, that like you have to think about the 25:41governance as we move from experiments to going into production at scale. 25:46And I'll give you my take on a more of a, from an enterprise perspective, right? 25:50When I'm working with my clients and we are putting things into production, I. 25:53Across all of these different SaaS vendors, everybody else is 25:56in this rush to force agents into their SaaS, uh, platforms, right? 26:01The, like even industry standards like MCP, it took them a while. 26:05It like worked different versions to get to supporting 26:08authentication properly, right? 26:10This is so much in software engineering that has been done 26:14to secure things the right way. 26:16And we are almost like throwing that away and starting from scratch when you're 26:19getting to these agents and stuff, right? 26:21We have collectively decided as a humanity that English is the 26:25way to talk to these agents and it just does not scale quite yet. 26:29Right? 26:29If I'm trying to to call another agent and give it a task, I need more structure. 26:34'cause I need to do error catching. 26:36I need to be able to pass authentication for this particular task I'm giving 26:39you read access to this particular data set and whatnot, right? 26:43So we need to evolve beyond the cute demos that we have all done 26:46on stage last year and this year. 26:50As you get more and more, uh, uh, more serious about rolling these 26:54out at scale, the governance aspects of it, you have 10 different, uh, 26:59SaaS vendors for the marketing team that you're working with, right? 27:02The CMO is not spending that much time understanding what's happening 27:05with all of that data is everything. 27:07If, if I have to go put a small policy and a policy could be as simple as: 27:12I don't want generative AI to create any content that 27:16refers to these 10 competitors. 27:19If I give it that small a tos, such a basic thing to do, it's insane how 27:23much effort it'll take for all these enterprises to go make that happen 27:26in every 10, 20, 30 different SaaS vendors that they're working with. 27:30So we are struggling as, uh, when we are delivering these with enterprises. 27:33That's why all the work that Kush and team are doing around governance, 27:36around security, guardrails, things of that nature are so important. 27:39And this has to work across regardless of which AI model that you're 27:43bringing into the organization. 27:44So I think J.P. Morgan Chase is doing a really good job at giving 27:47a reality check to leadership. 27:49And what J.P. Morgan does is often imitated and copied and people 27:53in, uh, get inspired by the work that they've done in the space. 27:57We need more people to talk about governance and the reason why we need 28:00smaller models that you can monitor. 28:02You have the right guardrails, even speech models as you start 28:06to move from just text-based LLMS to now speech to speech native. 28:10I'm trying to roll something out at one of our regulated industry 28:14clients and it's very tricky. 28:15The speech models tracing, looking at auditability, the agent ops 28:21that's needed for speech models and stuff is insanely difficult. 28:24There's a lot that this community is gonna do. 28:27I would say if 2025 we say is the year of the agents, I would argue that 2025 28:32is the, is the year of governance. 28:34We got to get this right if we have a shot at going at production, at scale. 28:38Or, or we could build more agents to solve the problem. 28:44Not, you know, I, I get your point. 28:46We should go governance and I, and I, and I, I believe that's a serious thing. 28:49And we should build walls and put everything behind walls and then 28:52nobody can access in anything. 28:54Or, or we could go, Hmm, let's build more agents. 28:59And then the, the good agents can fight the bad agents. 29:03Yeah. 29:04And then we're gonna be fine because we're gonna have a little good 29:06agent versus bad agent more so I think any problem that is super hard 29:10today, we don't need to solve that 29:12with things like governance, we can solve that with more AI. 29:14That is, that is my solution to this. 29:16So Cisco, we recently had the big security conference this week. 29:20Cisco released a foundation model for security. 29:23IBM has done a lot in this space around security related models, right? 29:27So if you're looking at cybersecurity risks, you're looking at 29:29hallucinations, things of that nature. 29:31So I think there'll be enough AI improvements that we are doing. 29:34You need ai, good AI to fight bad AI a hundred percent with you on that. 29:37Um, we do need to talk about the discipline that enterprises need to have. 29:42To ensure that those good AI agents are deployed as default baked and security 29:46by design, not as an afterthought 29:48bolted on. 29:49Right? 29:49That's the point that I think J.P. Morgan Chase is arguing 29:52that get excited about this. 29:53Huge, huge benefit to us, but you have to make sure that there's a 29:57secure by design for the very big. 29:58But you can't cripple innovation at the same time. 30:01No, I get it. 30:02Right. 30:02There's certain areas where you have to say. 30:04You know what, um, I, this is a very serious thing and I need 30:08it to not hallucinate, blah, blah, blah, blah, blah, blah. 30:10But then at the same time, you need to make breakthroughs and 30:13you need to discover new things. 30:15And we have a hype cycle to maintain, right? 30:17So at the same time, we can't kind of hold back on that. 30:21So I'm, I, I get it. 30:22And I think for certain regulated industries, I understand that 30:25and that makes sense, but. 30:27Um, but at the same time, sometimes hallucinations are a good thing, right? 30:32Because it gives you a bit of creativity. 30:34So I'm, you know, we just, we just need to, we just need to be 30:37appropriate for the right scenario. 30:38Okay. 30:39Chris' point is actually a good one. 30:41Um, so, uh, when you have a mix of, uh, some like, uh, things that are controlling 30:47others, it doesn't always have to be just in a closed sort of system, like with a 30:51single governor, um, sort of thing, right? 30:54I mean. 30:55Our immune system, like it controls diseases and stuff, right? 30:58I mean, there's bad things happening, there's good guys fighting against it. 31:01Um, it happens in nature all the time in different ecosystems. 31:05So, um, if you take the, the big system level view of things, uh, 31:08control is not always just like one little knob and, and and stuff. 31:13So I think it's actually a, a mix of things and, um. 31:17Yeah, I just wanted to end with a shout out to the robust intelligence folks. 31:20So that's the, the team that put together the, the Cisco model 31:23that, uh, that Shobhit mentioned. 31:24So, uh, really good, really good work for, from, from them. 31:27Great. 31:27Well, that's resolved. 31:28The, resolved, the Shobhit, Chris, uh, debate conclusively. 31:32I thought my, I thought my cousin would be on my side. 31:37I'm on both sides. 31:38You're 31:38on both sides. 31:39More security, the 31:40better. 31:40Yeah, exactly. 31:42Chris likes both of you equally. 31:43It's fine. 31:50So I think to close our episode, as I mentioned at the top of the 31:53show, we're this is the first anniversary episode of MoE. 31:58Very fast year. 31:59Uh, we were able to bring together the original cast from episode one. 32:03And so a little bit like the kind of kickoff question we did, I thought 32:06it'd be fun to end with just like a final segment just talking about like, 32:10what we did on that first episode. 32:11'cause it's very fun to take a look back and be like, oh yeah, like, 32:14what, whatever happened to that? 32:15Or like, oh, that turned out to be a really big thing. 32:17So it's kind of just like a fun exercise. 32:19Um, producer Hans here will be playing some clips. 32:22You'll actually be able to hear yourself from a year ago, which 32:25may either be fun or cringeworthy. 32:26We're about to find out. 32:28Um, but I think the first topic that we covered on episode 32:30one was the Rabbit R1 device. 32:32Yeah. 32:33Which, if you recall, was a small, cute little hardware device with AI embedded, 32:37and it was a conversation about AI hardware and where it was gonna go. 32:40Hans, do you wanna roll that tape so we can have the respective 32:43takes of everybody on the show about what they said about that? 32:45But it, it's like trying to sell a pager to somebody today. 32:49It's like, here's this thing that's got the things you need. 32:52You can get messages and, you know, but nobody has a pager, right? 32:55Because it was replaced by the phone. 32:57And, and so I do think there will be AI on the hardware devices. 33:02I, I just don't get that one. 33:04Just being an optimistic of where the tech is going. 33:06I'm more on the wash of, uh, I see the promise of what this. 33:15And Apple takes a while to come into this industry, right? 33:18Same thing goes with the Vision Pro glasses, right? 33:21I, again, I was a big fan of them when I bought them early on and 33:26three days in I did return there. 33:28To me, what is this is leading to is actually like a fourth paradigm of 33:32how we interact with computing, right? 33:33I mean, there was punch cards, there was command line, then there was GUIs, and 33:38this is now, I mean like we're in this fourth sort of era, the language, natural 33:42language interactions and so forth. 33:45I think, I mean, yeah, I mean maybe there's no killer app yet, but the 33:49killer app maybe is the fact that we have this new way of interacting and 33:53that's what these devices are gonna, uh, start us, uh, on the road down. 33:57Nice. 33:58Uh, that's awesome. 33:59Well show, but I'll start with you 'cause I think you actually bought 34:01a Rabbit R1, um, where is it? 34:03Where is it now? 34:04It's in the garage. 34:05In a box. 34:07Okay. 34:07I gone and sell it on eBay, man. 34:09Oh, really? 34:10Can't even, there's no secondary market for the R1. 34:13Oh man. 34:13I'm hoping this will be, uh, one of those things that goes 34:15for a million bucks later. 34:18But yes, it's, uh, it's, it's in a garage in a box. 34:21I, I couldn't even find it for, to bring this, for this episode. 34:24But, uh, I think overall, I still stand by what I said. 34:27I think the, the market needs to evolve and we are not there yet. 34:30We've not seen a single device that is beyond, like even Ray-Bans. 34:34I obviously have the Ray-Ban glasses as well, but it's, it's okay, but 34:37not at the point where you can really use it as a, as real device. 34:41I think the last time we saw an. 34:43Accessory that could augment, uh, your, uh, your iPhones and stuff was the watch. 34:49Right? 34:49Watches. 34:50They looked at a niche, they went off of that and they're 34:53augmented extending your phone. 34:55They don't work without the phone. 34:57Right. 34:57It just works really, really well as a, as a partner. 34:59So I think we'll get to that point with, with devices, but. 35:02I've not seen another thing to throw 35:03my money at yet. 35:05All right. 35:05Real quick, Chris, do you wanna take a victory lap on this one? 35:08Because I think you won this point. 35:10Yeah, no, and I'm, I'm feeling good about that one. 35:13That, that, that thing is a pager. 35:14I said it at the time and I'll say it again, so I'm feeling good. 35:18All right. 35:18Sounds good. 35:19Second thing, uh, that we covered on episode one was the rise of a mysterious, 35:24uh, chatbot on Chatbot Arena called GPT-2 Chatbot, which if you recall, there's 35:30wild speculation about what it was, and. 35:32Um, I guess Hans, do you wanna play the clip about like your, 35:34your, your all's take at the time? 35:36Is it GPT-5? 35:37I don't know. 35:38I think they've hyped GPT-5 so much that if that is 35:42at this point, it has to be AGI or it's like not even gonna impress us. 35:46Exactly. 35:46So maybe it's GPT-4.5 but I, I don't think that I, I, I read a theory online. 35:51I can't say who's said it, but I actually like it. 35:53I, somebody said that, uh, take the GPT to, uh, LLM, which they've open source. 35:59You can download that in Hugging Face. 36:01And they reckon that they may have trained GPT two on the, uh, latest, 36:07uh, data that trains the GPT-4. 36:09And I think that's an interesting theory, right? 36:11You know, GPT-2 with GPT-4 data. 36:13So maybe it's something like that. 36:15Um, I don't know. 36:16Um, but I don't think it's GPT-5. 36:17It, it probably is GPT-4.5, and as you say, you've, you've gotta put 36:22it in some sort of arena to, to see how well it's actually performing. 36:26Chris, that was a pretty good guess. 36:27I mean, 'cause we're now living in a GPT-4.5 World, right? 36:30Yeah. 36:30I think, I think I got, I can't even remember what that model was. 36:33Was that like the when, when was this? 36:36Was that the 4.o, or was it slightly after? 36:37Yeah, so that was the 36:38next model that GPT released. 36:39And I think one of the, somebody had spilled the beans saying that, Hey, 36:43yes, that was, that we, it scored really high during the GPT-4o released. 36:47So our guess is that that was, uh, the testing they were doing on LLM arena. 36:52Yeah. I can't even remember. 36:53That's how far back it was, but I think, I think my guess was pretty good, right? 36:56It wasn't quite four or five, but it was basically the next version of the model. 36:59Yeah, I feel good. 37:00Yeah, I still think the GPT-2 theory was a great one though, so, you know, so, 37:04so somebody's gotta do that actually. 37:07That's a good idea. 37:09All right. 37:09And so for the final one that I want to play, we, we talked about 37:12agents, which has since become basically like an ongoing MoE 37:16in joke, I suppose. 37:18Um, and, uh, I think with Shobhit and Chris, on the show, both 37:22of you are probably our most prolific users of the word agents. 37:25I feel it. 37:26It, uh, if we just did a word count of all the things you've said on 37:29MoE, probably you two would be at the top of that leaderboard. 37:32So Kush, I'm gonna ask you to maybe make a guess, which is, 37:36which one of these guys used 37:38agent first on MoE, um, like who is number one on, on breaking that seal, I guess. 37:45Yeah, this, this time I'll go with the family member. 37:48Woo. 37:49The was plan. 37:50All right, well roll the tape, Hans. 37:53The talk by Andrew on, uh, how agentic flows 37:57are going to be the way we get to the AGI. 37:59Yes! 37:59Nailed it, congratulations. 38:00So show didn't blame for starting that 38:02I heard agentic. 38:03Yeah. 38:04I didn't the word, I didn't hear the word agents at all. 38:06I, I call, uh, this is, uh, a fix. 38:10Yeah, yeah. 38:10Disqualified. 38:11So we'll do some investigation on who actually used the word 38:14agent first, but I guess Shobhit 38:15you can, uh, be, uh, you can rest easy knowing that you were really 38:19a, a pathblazer there for us. 38:21But I still, but you should take that 38:23a second to acknowledge how far we have come in the last year. 38:26At that point, uh, Andrew had just introduced how a 3.5, uh, can, can 38:32GPT-3.5 with tools can actually outcompete GPT-4 and stuff, right? 38:36So just imagine how far we have come in terms of the cost, the kind of 38:39tools, the ecosystem around agents. 38:42I'm just very proud of where the community is today with the, with our, what's 38:45happening in the multi-agent space. 38:47Yeah, for sure. 38:48And more to come soon. 38:48I mean, I think we're gonna have to do this next year as well 38:50where we look back on what we were talking about for, uh, in 2025. 38:53Ooh, we should have done, uh, what should be the next word? 38:56That'll catch it. 38:57That'll go viral, right? 38:59You may have used it on this episode. 39:01Open letter. 39:01You know, 39:03Congratulations to the production crew for the one year anniversary. 39:06I wanna give a big shout out to the producer Hans, Alex, Michael and Selma, 39:11you guys have poured your soul into this. 39:13Thank you so much for bringing the Mixture of Experts to our audiences. 39:16Thank you so much! 39:17Happy birthday. 39:19Happy birthday. 39:21Well, that's all the time that we have for today. 39:23Uh, Kush, Shobhit, Chris, an amazing panel. 39:26Glad to have you on again! 39:28Um, and thanks to joining us all you listeners, if you enjoyed what you heard 39:32you can get us on Apple Podcasts, Spotify, and podcast platforms 39:35everywhere, and we will see you all next week on Mixture of Experts.