
Year-End AI Model Launches

Key Points

  • Mistral 3 is a straightforward dense‑attention transformer without exotic attention tricks, yet it delivers strong performance, showing that scaling plain‑vanilla models can still be effective.
  • At Amazon’s Re:Invent conference the company launched three autonomous AI agents capable of handling coding, security, and operations tasks for extended periods without human intervention.
  • IBM reported its AI coding assistant “Bob” boosted developer productivity by 45%, and Salesforce data revealed AI‑driven agents generated roughly $14.2 billion in Black Friday sales.
  • This week saw a rapid succession of major model releases—Claude Opus 4.5, Mistral 3, and DeepSeek 3.2—highlighting intensified competition among AI labs as the year winds down.
  • IBM staff nickname the end‑of‑year slowdown “funsember,” using the quieter period to experiment with and evaluate new AI technologies.


# Year-End AI Model Launches

**Source:** [https://www.youtube.com/watch?v=_lZgapJzFho](https://www.youtube.com/watch?v=_lZgapJzFho)
**Duration:** 00:35:39

## Sections

- [00:00:00](https://www.youtube.com/watch?v=_lZgapJzFho&t=0s) **Mistral Model Review on MoE** - The panel examines Mistral's plain‑vanilla transformer design, then previews the week's AI headlines—including Amazon's new autonomous agents and IBM's coding assistant—on the Mixture of Experts show.
- [00:03:11](https://www.youtube.com/watch?v=_lZgapJzFho&t=191s) **Exploring Emerging Efficient AI Models** - The speaker highlights the surge of new AI models and observes how labs such as DeepSeek and Mistral are advancing efficiency with novel attention mechanisms, despite the challenge of fine‑tuning amid the abundant options.
- [00:06:40](https://www.youtube.com/watch?v=_lZgapJzFho&t=400s) **Open‑Source Labs vs Commoditized AI** - The speakers debate whether open‑source AI projects can stay distinct amid inevitable model commoditization, noting that openness itself and strategic focus—rather than massive compute—allow them to achieve state‑of‑the‑art performance.
- [00:10:20](https://www.youtube.com/watch?v=_lZgapJzFho&t=620s) **Model Ensemble Optimization Strategy** - The speakers discuss using routers to dynamically select among multiple AI models—like Opus, Claude, and Mistral—to optimize performance across varied enterprise and vision tasks.
- [00:14:02](https://www.youtube.com/watch?v=_lZgapJzFho&t=842s) **Using AI as Collaborative Code Partner** - The speaker describes treating AI models like a responsive, corrective sounding board for technical tasks, valuing their ability to refine suggestions and collaborate rather than engage in casual conversation.
- [00:18:31](https://www.youtube.com/watch?v=_lZgapJzFho&t=1111s) **Scaling Laws and Compute Access** - The participants debate whether AI scaling laws remain applicable for inference, arguing that only entities with massive compute resources can leverage them while others encounter diminishing returns.
- [00:23:21](https://www.youtube.com/watch?v=_lZgapJzFho&t=1401s) **Beyond Scaling: Quality‑Driven AI Progress** - The speaker argues that recent AI advances arise mainly from training and algorithmic refinements—a “quality improvement law”—rather than larger model sizes, emphasizing how costly, months‑long iteration cycles make pure scaling impractical.
- [00:26:31](https://www.youtube.com/watch?v=_lZgapJzFho&t=1591s) **Amazon Blocks ChatGPT Shopping Agent** - Amazon announced it will prevent ChatGPT’s shopping‑research feature from accessing its product listings and pricing, limiting the AI’s ability to browse and recommend items on the platform.
- [00:29:41](https://www.youtube.com/watch?v=_lZgapJzFho&t=1781s) **Monetization Threatens Open AI Agents** - The speakers warn that emerging paywalls, such as a proposed Cloudflare “toll booth,” could restrict free web access and jeopardize the practical development of AI agents by shifting the focus from open experimentation to revenue-driven incentives.
- [00:33:16](https://www.youtube.com/watch?v=_lZgapJzFho&t=1996s) **Personal AI Agents Replace Apps** - The speaker outlines how conventional apps are being supplanted by personal AI agents that delegate tasks to approved third‑party agents, highlighting the platform‑versus‑AI competition and the emerging shift of SEO toward AI‑driven assistance.

## Full Transcript
[0:01] On the Mistral side, you know, I was actually kind of surprised, but I had mixed feelings about it. It's kind of a plain-vanilla transformer, right? There are a few little tweaks in there, but there are no fancy attention mechanisms, no attempt at linear attention scaling. It's just a big old dense-attention model, and it's really good. All that and more on today's Mixture of Experts.

[0:30] I'm Tim Hwang, and welcome to Mixture of Experts. Each week, MoE brings together a panel of the smartest minds in technology to distill down what's important in the latest news in artificial intelligence. Joining us today are three incredible panelists: Aaron Baughman, IBM Fellow and Master Inventor; Abraham Daniels, senior technical product manager for Granite; and Gabe Goodhart, chief architect, AI open innovation. Welcome back to the show, all three of you. Lots to talk about today. We've got a welter of new model releases, we'll talk a little bit about the future of the scaling laws, and there's a dust-up happening between Amazon and ChatGPT. But first, we've got Aili with the news.

[1:11] Hey everyone, I'm Aili McConnon, a tech news writer for IBM Think. Here are this week's AI headlines. At Amazon's annual re:Invent conference, the tech giant launched three new agents that can handle coding, security, and operations independently for hours or days at a time. IBM has shared early results that its AI coding assistant, named Bob, has helped IBM developers improve their productivity by 45%. The numbers are in: globally, AI and agents influenced $14.2 billion in sales on Black Friday, according to software firm Salesforce. AI for dinner.
[1:50] Posha is a new private robot chef that can prepare complex multi-step dishes. Hello, smart kitchen. For more, subscribe to the Think newsletter linked in the show notes. And now back to the episode.

[2:07] I was looking at the calendar, and there really aren't many days left before the end of the year, but the AI news hits keep coming. Literally in the last few weeks we have had not one, not two, but three fairly major model launches by competing labs in the space. Claude Opus 4.5 is out, Mistral 3 is out, and DeepSeek 3.2 is out. So Gabe, I want to kick it over to you first. Obviously with Mistral 3 and DeepSeek 3.2, open is a big theme of this round of launches, so I want you to talk a little bit about what you're seeing with Mistral 3 and DeepSeek 3.2, and what listeners should take away from it.

>> [2:47] Yeah, awesome. So, as you said, it's the end of the year, and everyone's cramming to get in that last bit of news. Well, I don't know about you guys, but I've always referred to this as "funsember" here at IBM, because it's when things start to slow down and you actually get some time to play with stuff. It's actually a great time to launch an experimental new model for people to play with over the holidays and a little bit of downtime. So I think it's kind of fun and clever: people are going to have lots of new toys on their laptops or servers, depending on which size of model you pick. When I was playing with these, I think the fact that they're all hitting thick and fast right now speaks to the fact that we are really hitting this wealth of riches in the model space.
[3:32] They're all such great models, same with the ones we talked about last time we were on, talking Gemini. There are so many good models out there, which is awesome, and it makes it a little harder to put a fine point on any of these. But what I definitely noticed, and thought was interesting, is that these respective labs, at least on the open side, are really leaning into their strengths. With the DeepSeek release, they have yet another novel attention mechanism that aims to dig further into "I've got a giant model, but I can still run it really efficiently." That was one of the major breakthroughs with R1 and the V3 series, and you're seeing them push yet another one here with their sparse attention mechanism in 3.2. So it's really cool to see that they're continuing to iterate in that space. On the Mistral side, I was actually kind of surprised, and I had mixed feelings about it: it's kind of a plain-vanilla transformer, right? There are a few little tweaks in there, but there are no fancy attention mechanisms, no attempt at linear attention scaling. It's just a big old dense-attention model, and it's really good. I think their innovation was that every one of their models, up and down the line, has vision capabilities, which is pretty cool to see, not as an extra but just as bread and butter. These are multimodal models out of the box.
They work great on text, and the vision is just more on top. A lot of times, especially at the smaller end of the scale, you get models where you have to make a trade-off between quality in multimodality and pure quality in text, and it seems like they've figured out a pretty clever way to boost both. So, I did a little dabbling with all of these yesterday. I tried to experiment with a pretty hard coding problem I'm trying to tackle right now around Metal kernel optimization for llama.cpp, which frankly I don't know much about. And I threw it at...

>> That's very specific.

>> Yeah, it is very specific. I threw it at all three of the big-size models, and each one of them gave me some really interesting tips on optimizations. So again, I don't have any clear way of saying, "Ah, this one blew me away, and this one really fell flat." They all did great. They're all great models at the top size. What I did think was really cool is that I was able to pull Ministral 3B onto my dev box and just crank away on multimodality workflows through Open WebUI, and that was really fun, too. So I love seeing both ends of the spectrum here, especially from the Mistral release. I kind of wish I could get my hands more tightly around the DeepSeek release so I could play with it more, but I'm a "love to tinker with it myself" kind of person, and at the top end of the scale, they're all still delivering great quality.

>> [6:30] Yeah, for sure. And Abraham, maybe I'll kick it over to you. I want to build on that comment that Gabe had, which is: there are so many models, and they're all great.
And as a result, it's almost hard to make distinctions between them, because it's all really good, basically. There's a famous memo from the early days of this competition, and by that I mean, I don't know, a few years ago, that leaked out of Google and said: look, at the end of the day, there is no moat to these models, because everything's going to be commoditized. Everything's going to be great, and all free and available in a certain sense. Do you have a sense of how open-source projects and labs need to stay differentiated in this market? Or is it just inevitable that everything will be great and everybody will be very similar in the models they launch?

>> [7:15] It's a good question. I think open-source labs differentiate from closed source by, you know, being open source, to be honest. I think that really is the moat there. In terms of performance and capabilities, I think with the first DeepSeek R1 you really saw that you don't necessarily need to be one of these highly funded, hundreds-of-thousands-of-GPUs organizations to release something that's state-of-the-art. What I saw with DeepSeek and with Claude is leaning into what you do really well, as Gabe mentioned: Claude Opus really doubling down on software engineering as their primary target, and ensuring they maintain a chokehold on that particular use case, or those particular capabilities. What I thought was cool for DeepSeek was their reasoning-first approach for agents and for tool calling.
So I think you're starting to see a lot of model releases really focus on being hyper-performant when it comes to tool calling or any sort of agentic workflow, as labs start to see that that's the next frontier in terms of how LLMs are going to be used, in place of standalone use. What I really liked about this last run of model releases is Mistral going back to open source, not with their bespoke research licenses, but really going back to the Apache 2.0 roots of the early open-model efforts. In terms of differentiating between open source and closed source, I think right now they're kind of neck and neck in most cases. And there's a comment to be made about what is "good enough" in terms of what's out there; it really depends on what your business case is and what you're actually trying to solve.

>> Abe, I think you put an interesting point on that one, actually, with the DeepSeek reasoning and tool calling: I really think each lab is going to choose their specialty, right? DeepSeek is clearly trying to be the reasoning lab, so they're going to try to make their reasoning the best, no matter what else you're using it for. Anthropic is clearly trying to be the best developer-model shop, and they're really going to lean in heavily on that one. So I think what you'll probably end up seeing is that individual labs start to differentiate by... I don't even want to call it task, because task is hard to classify in an LLM space. You could call tool calling a task, but it's going to be much more around domain.
What domain is the model really optimized to work well in? Of course, you're still going to have the frontier labs that are trying to be the best at all domains, and you can do that by pushing the parameter count up, and through training-data improvements; we'll get to that in the next segment, obviously. But I think, especially at the smaller model sizes and in the open-source shops, you're going to see the ones that succeed choosing their specialties and trying to fight for a niche market in a specific domain.

>> [10:19] Yeah, I think that differentiation is going to be really interesting to see, because it's almost this game of musical chairs: how many chairs are there for you to specialize in this space? Aaron, I want to bring you in. Obviously with Gabe and Abe on the line, they're very open-biased, but do you want to talk a little bit? Have you played with Opus 4.5? Curious about your impressions of Claude and what Anthropic is doing with the new model.

>> [10:43] Yeah, just to piggyback off your comment about the musical chairs: the good news is that the music never stops here, because we have lots of very performant models coming in, right? What I personally like to do is ensemble different models together, because it's almost like an optimization problem, where you have lots of different models and you need to optimize against an objective function you're trying to achieve. But baked into that objective function is also: what models do you select?
You can have a router that in turn looks at the ideal use cases for each of these models. Mistral 3, for example, seemed to be really good at these enterprise RAG systems, and at these different chatbots or copilots, some of it for vision; whereas if you look at DeepSeek 3.2, it's very great at math and different types of codegen. But I also noticed that Opus 4.5 seemed to extend this notion of a digital worker, where you could maybe even replace or augment humans: you'll actually have a virtual engineer that will read the entire 200-page spec of a language, which we generally wouldn't do, right? So that's quite helpful. And then, as you begin to create a router that pushes out your prompts and pulls in the data each of the models has access to, you then have to consider the different topologies, as Gabe was mentioning: whether it's going to be Mistral's open-weight mixture of experts with the different types of attention transformers they have, DeepSeek's sparse attention, or Claude, with its agentic enhancements and memory state layers. But what I think is going to happen, as these models come in, is that we're going to have more hybrid architectures. I think the day of just having a transformer is going to be no longer; you're going to see state-space models being put in and mixed together, which is going to make it very exciting, right?
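The router-plus-objective idea Aaron sketches above can be illustrated with a toy dispatcher. Everything below is a hypothetical sketch, not any panelist's actual setup: the model names, the strength profiles, and the keyword-based task detector are all invented placeholders (a production router would use a learned classifier or an LLM judge rather than keyword matching).

```python
# Toy model router: send each prompt to the model whose declared
# strengths best overlap the tasks detected in the prompt.

# Hypothetical strength profiles, loosely echoing the discussion above.
MODEL_STRENGTHS = {
    "mistral-3": {"rag", "vision", "chat"},
    "deepseek-3.2": {"math", "codegen", "reasoning"},
    "claude-opus-4.5": {"software-engineering", "long-context", "agents"},
}

# Naive keyword-based task detection (purely illustrative).
TASK_KEYWORDS = {
    "math": ["prove", "integral", "equation"],
    "codegen": ["implement", "function", "bug"],
    "rag": ["document", "retrieve", "knowledge base"],
    "vision": ["image", "photo", "diagram"],
}

def detect_tasks(prompt: str) -> set:
    """Return the task labels whose keywords appear in the prompt."""
    lowered = prompt.lower()
    return {task for task, words in TASK_KEYWORDS.items()
            if any(word in lowered for word in words)}

def route(prompt: str, default: str = "claude-opus-4.5") -> str:
    """Pick the model with the largest strength/task overlap, else a default."""
    tasks = detect_tasks(prompt)
    score, name = max((len(strengths & tasks), name)
                      for name, strengths in MODEL_STRENGTHS.items())
    return name if score > 0 else default

print(route("Implement a function that fixes this bug"))         # deepseek-3.2
print(route("Summarize this document from the knowledge base"))  # mistral-3
```

The objective function here is just overlap count; in practice it could also fold in latency, cost per token, or per-domain eval scores, which is where the optimization framing above comes in.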
Just think of the different emergent behaviors that are going to happen, especially when these models begin to interact socially and play together.

>> [13:07] Maybe a final question here, and then I'll move us on to our next topic. On the topic of Opus 4.5, which is the one of these three models I've played with the most: it feels like they're really getting quite good at personality. My friend had a really funny incident where he was having a conversation with Opus 4.5: they were talking about one topic, they moved on to another topic, and then, many turns later, Opus says, "Oh, you know, I've been thinking about that thing we were talking about earlier, and I think you're right on that issue." It was this very weird moment where the model does a callback in the very natural way you might in a conversation with someone. I'm curious whether any of you have a vibe-and-flavor check across these models, either with 4.5 or otherwise. It does feel like 4.5 nailed something with the voice, but I'm curious what you think, if any of you have played with it.

>> [14:00] It's really interesting to hear you say that, because frankly that's exactly the opposite of how I use models, right? I don't think I've ever gone four or five turns like that. I use them as a functional sounding board, the way I might with a colleague, where I used to stand up and walk over to their desk and say, "Hey, let me vomit words at you."
And then you'll ping a few back at me, and all of a sudden the right answer will emerge.

>> Be like, "Okay, thank you. Goodbye."

>> "Thank you. Goodbye." Exactly. Wipe your memory, start over again. So my vibe check is much more about the responsiveness and the collaborativeness in those functional experiences. Models that, when I point out something they got wrong, are open to refining their suggestion or taking the correction: that is my vibe check right now, because I use them almost exclusively for technical topics, whether it's checking my own code or helping me understand a programming language or an accelerator paradigm I'm not familiar with. I do have deep expertise in some of the aspects of what I'm doing, so a lot of times I'll catch something they did wrong, but they'll have some insight that I don't, and the ability to collaborate back and forth is what really resonates as quality for me. I'm not so much in the chatty mode, usually. But yeah, I think everyone probably has their own vibe check, depending on how they like to use models.

>> [15:24] Yeah, totally. That goes to the chairs question, right? These use cases are really so varied that that's going to be the specialization. Very interesting.

[15:35] I'm going to move us on to our next topic, which is related to some of the stuff we're talking about here. One of the folks we're connected to flagged an interesting blog post from a VC from Theory Ventures called Tomasz Tunguz.
He pulls together a couple of different threads, but the core of his blog post was arguing that Gemini 3 demonstrates that maybe the scaling laws are still pretty good: that essentially, with a ton of compute, using the methods we know, we can still see some major capability improvements. I know last week we talked a little bit about Gemini 3, but it's interesting to put it in that context. We talked about what you can use Gemini 3 for; this is maybe more about how it's informative about how the meta-competition around these models is evolving. Abraham, maybe I'll kick it over to you. Do you buy the thesis? Are the scaling laws maybe still better off than we thought, and we just have to throw more compute at it?

>> [16:34] Yeah, I'm sure Nvidia would love that. So I think Google's a little bit different in this case, specifically because they've got full integration of hardware and software, as it uses its TPUs, right?

>> They have, like, the most computers that ever computered.

>> Exactly. So it's much different from a shop using GPUs that might not have the same integration across the whole stack. I think TPUs give Google a bit of an unparalleled advantage when it comes to being able to squeeze as much as possible out of the processing units.
So I think if we were to see this type of behavior, if you will, with a different model, for instance if the new Claude Opus 4.5 or DeepSeek were able to showcase some of these contradictions to the scaling laws, that would give us a different, or I guess another, proof point. And in reading some of the material on Gemini, they did some other things, or at least alluded to some other concepts or updates in their strategy for how they build their model, that could lean into it. I read a comment about context engineering in place of prompt engineering, where the thought was that maybe, behind the model generation, it grabs a bunch of large, relevant context in the background, so it can run a bit more of a thought experiment before it generates the results. So, to close the button on Gemini: I'd love to see another proof point that's a little more focused on GPU use as opposed to TPU. And in terms of the other part of that article, which noted that, subsequent to Google's comment, Nvidia's growth is still strong (their last Q3 release showed they still have a massive output of GPUs): most GPU sales aren't for pre-training, they're for inference. So I don't necessarily think that's the right argument.

>> [18:33] Yeah, that one felt like a little bit of a weak argument to me: just buying compute doesn't mean the scaling law still applies.

>> That's exactly it.

>> Yeah, for sure. William Gibson has this famous quote, which is: the future is here, it's just not widely distributed.
And I guess this makes me think the scaling laws are here, they're just not widely distributed, in the sense that, and Aaron, I'm curious about your comment on this, we may live in a future where, sure, maybe the scaling laws still exist, but the amount of infrastructure you need to pull them off is available to, who, maybe only Google? The scale is just so huge that for everyone else there will effectively not be a scaling law. Do you think that's the case? That because we're sitting in this period of diminishing returns, you can only keep it going if you're at the 99.999th percentile of access to compute, and for everybody else, we're in a world where the scaling laws kind of don't exist anymore?

>> [19:30] Yeah. I think the scaling curve is going to be this little stepwise function, so you have all these little s-curves that happen as we get new technology breakthroughs, right? It's multi-dimensional, and I think there are going to be new dimensions as we progress through time. As new topologies come along or new algorithms are designed, I think that's going to help the smaller players be more competitive. You potentially don't need these huge data centers with large amounts of GPUs, because they might be able to improve performance without needing to change any of that.
But I also wanted to make the point that one of my litmus tests around the scaling law is to look at us humans, right? We have this biological scaling law: our brains haven't changed much over the course of centuries, but our tools and our technology have. So the more data and knowledge we get, we still have that same topology; we're just able to specialize and to have different types of training, which is less consuming. I know it's not a perfect analogy, because AI maybe gets better with the more GPUs you add, but humans get worse with the more coffee you add, right? But even so, it kind of helps.

>> [20:52] Yeah, totally. I think the biological metaphor is good, too, because after all this evolution, we don't have infinitely large brains, right? So there's almost a view that says: yeah, we actually found an equilibrium; you only need so much intelligence to get through most of the problems you're going to confront. I think it's a really interesting metaphor.

>> Right. And if you look at Gemini 3, it kept roughly the same number of parameters, roughly one trillion, with respect to Gemini 1.5. So it's almost like the human brain, the same sort of topology. I'm sure they've changed different types of activation functions and so on, but size-wise it's somewhat static, right?
So it is a bit interesting, but keep in mind that stepwise function that's going to happen, much like what's happened with compute, with quantum. So, circling back to your original question, I do think the small players will still have a big say.

>> [21:53] Gabe, I think there's a final question here, which is that the obvious comparison for these scaling laws is Moore's law, right? That used to be this industry-organizing law that said processing power is just going to keep increasing. And I think the right observation about Moore's law is that it's not some mystical law of nature; it's that the whole industry said, "We've got to keep Moore's law going," and it was only kept going because every few months we were able to get another innovation to keep it going again. So there's a way of looking at the scaling laws which says it's not necessarily some inevitable law of nature, like gravity; it's more of a Schelling point that causes the industry to focus on certain types of things. Curious what you make of that. Is that the right interpretation of what we're seeing here?

>> Yeah, I think it is very interesting. It's kind of progress for the sake of progress, but there's probably some genuine utility coming out of that progress; it's definitely the competition feeding the progress, more so than the actual need for the progress. But Moore's law is an interesting one, because Moore's law is actually pretty explicitly measuring floating-point operations per second, right?
It's a very specific, one-dimensional metric, right? And there are all sorts of clever ways to improve that, not just by making one faster chip: you can do a bunch of things in parallel, use some clever metric, whatever. The thing about this hypothetical scaling law in AI is that even framing it as a scaling law seems to me like the wrong point. Aaron, you pointed out that Gemini didn't change the number of parameters, so we're not scaling size. They don't really tell us whether they scaled the data inputs, but they probably scaled to more data. Do you consider algorithmic improvements scaling? Maybe. My guess, based on that tiny tweet, is that most of the improvements were actually in how they trained, not just in going bigger, right? So I think the framing of this as a scaling law is a bit of a misnomer. I think it's a quality-improvement law. And in some ways it's kind of a no-brainer that we are nowhere close to the wall on that, because when you have an iteration cycle that costs millions of dollars and takes months, it's going to be really hard to actually move that ship, right? As a developer, I want something that takes fractions of a second, so I can say, "Oh, that didn't work, try something else. That didn't work, try something else." As a model developer, you have to press go, wait for months, and burn millions of dollars in the process. So those experiments are expensive. Iterating in the algorithmic space is really hard for these models.
So in some ways it's not at all surprising that there's probably still a lot of low-hanging fruit to be grabbed by doing better things in that training space: curating your data better, figuring out your actual training loops, figuring out your mixture of synthetic data, all of the above, right? There are so many tricks you can play to actually steer the training process of these models. And if we were to somehow do some back-of-the-napkin math on how many theoretically possible ways there are to tweak and tune this hyperparameter space for training, I bet we've explored a tiny fraction of the hyperparameter space for training large language models, purely based on the speed at which these experiments can operate. So I think the one interesting thing in this scaling-law discussion is that as hardware actually gets faster at doing this, the ability to experiment and try new algorithms gets greater. And that may actually be the real point of scaling: we can start exploring that space of hyperparameters in the training space faster and therefore get to better-quality outputs more quickly.
>> Yeah, that's a cool interpretation. It's basically saying that it's not actually about the hardware in a certain sense.
>> Yeah, the hardware probably has a role in speeding up that iteration cycle, but it's not just more hardware, more hardware, more hardware.
>> Yeah, that's right.
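Gabe's back-of-the-napkin point can be made concrete with a toy calculation. The knobs and choice counts below are entirely hypothetical (no lab has published its real search space); they just show how quickly combinatorics outruns an iteration cycle measured in months:

```python
# Toy back-of-the-napkin estimate of a pretraining hyperparameter
# grid versus how many full-scale runs a lab could actually afford.
# All numbers are illustrative assumptions, not any real lab's setup.

# Hypothetical number of candidate choices per training knob.
knobs = {
    "learning_rate": 8,            # candidate peak learning rates
    "lr_schedule": 4,              # cosine, linear, constant, etc.
    "batch_size": 6,
    "data_mixture": 20,            # ways to weight web/code/books data
    "synthetic_data_fraction": 10,
    "tokenizer": 3,
    "optimizer": 4,
    "curriculum": 5,               # data-ordering strategies
}

# Total distinct configurations in this grid.
total_configs = 1
for choices in knobs.values():
    total_configs *= choices

# If one full-scale run takes roughly two months, a lab gets
# about six complete runs per year.
runs_per_year = 6
years_to_exhaust = total_configs / runs_per_year

print(f"grid size: {total_configs:,} configurations")
print(f"at {runs_per_year} full runs/year: {years_to_exhaust:,.0f} years to try them all")
```

Even with these modest, made-up counts, the grid runs to millions of configurations, which is why the argument is that faster hardware matters mainly by shortening the experimentation loop, not by simply adding scale.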
It's really a scaling-experimentation law, right? It's not just "add more compute"; it's "add more compute so a bunch of folks can experiment," which causes algorithmic improvements, which is really where the results are coming from. That's really interesting. [music]
All right, I'm going to move us on to our last topic of the day, a business story, and a very interesting one. Throughout the entire year we've been talking about agents. ChatGPT launched one of the most obvious agentic features, something called shopping research. The idea is that you use ChatGPT to do your holiday shopping: ChatGPT goes out to the world and finds products that match the kinds of queries you're looking for. And when you think shopping on the internet, you of course think Amazon, the place from which you buy a lot of things. The news came out, I think just earlier this week, that Amazon will be blocking the ChatGPT shopping research agent from looking at product details, customer data, and deals on Amazon. A super interesting development, right, because instantly that makes a feature like shopping research maybe a lot less effective, insofar as Amazon really is the infrastructure for all sorts of shopping online. Aaron, I'll kick it over to you for maybe the obvious question, but I think it's worth making explicit: why is Amazon blocking ChatGPT?
>> Yeah, just to start out, it's as if Amazon told the ChatGPT shopping bot to go window shopping, but the door's locked.
It can't go in, and it can't look at the data and understand the product listings or prices or any of that. And it seems as though Amazon might have done this for a couple of reasons. One could be to protect their e-commerce data, so that third-party tools can't just directly access it and they keep control of their shopping funnel, right? That door is locked so nobody else can go in, and it protects their business model: ads, commissions, first-party traffic, and so on. But a subtlety, or maybe not so subtle, is that Amazon is also working on their own AI-driven shopping services, right? They have Alexa Plus coming, and they have Rufus. So with those two elements, I think they're trying to keep their own ecosystem. But what this ultimately means is that we have these turf wars starting: open shopping AI versus closed retail empires. And what this could mean, and it would be pretty neat if it happened, is that smaller retailers could band together to collectively compete against Amazon. There are thousands and thousands of mom-and-pop shops that could now compete against the big elephant. So we'll just have to see.
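Mechanically, much of this kind of agent blocking happens at the robots.txt or firewall layer. Here is a minimal sketch, using Python's standard `urllib.robotparser`, of how an agent-specific disallow rule plays out. The rules are illustrative, not Amazon's actual robots.txt, though "GPTBot" and "ChatGPT-User" are crawler user-agent strings OpenAI documents publicly:

```python
# Sketch of robots.txt-level agent blocking with Python's stdlib.
# The policy below is illustrative, not any real site's file.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# AI agents are refused while ordinary browsers remain allowed.
for agent in ["ChatGPT-User", "GPTBot", "Mozilla/5.0"]:
    allowed = parser.can_fetch(agent, "https://example.com/product/123")
    print(f"{agent:15s} allowed: {allowed}")
```

Note that robots.txt is only advisory; enforcing it against non-compliant bots requires server-side user-agent or IP filtering, which is closer to what Cloudflare's default blocking (discussed below in the episode) does.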
But I am curious: if Amazon is now being pushed through competition to double down on Rufus and to really invest more in Alexa Plus, what is that going to mean for us as we begin to shop for the holidays? I, for one, hope I can save some money with these tools and find the best products. But yeah, we'll have to wait and see. It's exciting news overall, I think.
>> Yeah, the parallel story I was thinking about is that a few months back, Cloudflare said they were going to start blocking AI agents: by default, you can't get through. And I think what they said was, "Look, we're standing up for all these websites, and eventually we're going to create a little bit of a toll booth. If you want to access the data on a website, you'll have to come through Cloudflare, and we're going to make sure that they pay." I guess, Gabe, there's been a lot of talk about whether agents are technically possible. It feels like there's a big question about whether they're even possible as a matter of business incentive. The whole idea relies on an internet where you can just access information and different platforms, but it feels like the walls are going up everywhere. That could really stifle the whole dream of agents even being a practicality.
>> Yeah, I think you're definitely right that we're hitting a point in this AI timeline where all of a sudden making money is going to really matter. And that kind of stinks as a technologist, right?
It's been really fun to just have this ride where all these big shops are putting out great technology and thus far haven't really been tying it to revenue goals specifically. You know, one of the things I turn to ChatGPT for is questions that I don't want to later show up in my Google news feed or my Google ads, right? I love that in many cases these AI technologies have not yet been linked into the money machine, and it's just been a matter of time before that happens. Clearly, this is one very big step in that direction. And the thing I find interesting is to speculate a little further and think about this almost like where the browser wars hit, and the inevitable antitrust lawsuits that came around browser defaults and browser walls, right? I suspect that at some point agents will become the new browser, and I don't think I'm alone in that speculation. I think a lot of people will go to the internet through their agent, at which point the idea of having a tight, vertically integrated ecosystem that precludes some agents from accessing some content is probably going to be challenged in court, frankly, because it's going to have monopolistic tendencies. We've seen this with browsers; we've seen this with search engines. I suspect we'll see it with agents eventually. But right now agents are still in that middle ground: they're just coming out of the "oh, this is awesome technology and we're figuring out what to do with it" phase, and they're just entering the "hey, we can make a ton of money with these things, so we'd better protect our moat" phase.
And I suspect we'll come to the commoditized and/or legally regulated phase in a little while.
>> Yeah. Abraham, I think I'll give you the last word on today's episode. What do we do about this dynamic Gabe is talking about? It feels like the whole point of the agent is that there's one agent you can use to do everything you want across the internet. Where this is going seems to be: okay, you've got the ChatGPT agent, but if you want to buy something on Amazon, you've got to use the Amazon agent, which kind of destroys the whole original value prop for agents in a certain sense. It certainly makes it a lot more annoying to manage. Is there a way to get back to a world of free-flowing agents that operate on your behalf generally? Or are we, by dint of the business incentives here, being pulled toward a world where it's just app world again: everything that used to have an app now has an agent that replaces it, but it's effectively the same world?
>> Yeah, that's an interesting question. I mean, off the top...
>> I'm just giving you a small one to wrap up the episode. [laughter]
>> You know what? Thinking out loud, maybe it's less of a third-party agent that controls your entire flow into the internet, and more of a personal agent that calls particular third-party agents as they are approved or required.
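The "personal agent that calls only approved third-party agents" idea can be sketched as an allow-listed dispatcher. Everything below, the class, the agent names, and the dispatch logic, is a hypothetical illustration of the pattern, not any shipping product:

```python
# Sketch of a personal agent that delegates only to third-party
# agents the user has explicitly approved. All names are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class PersonalAgent:
    user: str
    # Allow-list mapping a task category to an approved agent callable.
    approved: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def approve(self, task: str, agent: Callable[[str], str]) -> None:
        """Record the user's approval of a third-party agent for a task."""
        self.approved[task] = agent

    def delegate(self, task: str, request: str) -> str:
        """Call the approved agent for this task, or refuse outright."""
        if task not in self.approved:
            return f"refused: no approved agent for '{task}'"
        return self.approved[task](request)

# A stub standing in for a retailer's own shopping agent.
def amazon_shopping_agent(query: str) -> str:
    return f"[amazon-agent] results for: {query}"

me = PersonalAgent(user="Abraham Daniels")
me.approve("shopping", amazon_shopping_agent)

print(me.delegate("shopping", "noise-cancelling headphones"))
# -> [amazon-agent] results for: noise-cancelling headphones
print(me.delegate("banking", "check balance"))
# -> refused: no approved agent for 'banking'
```

The design choice here mirrors the discussion: the user-facing agent never crawls a platform directly; it routes requests to each platform's own agent, which keeps the retailer's walls intact while still giving the user a single point of control.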
So from your experience as an end user: I have an agent, and there's a multi-agent system that's for Abraham Daniels, and it calls only the required agents, or the agents that are tailored for the specific uses I want to carry out. Other than that, I know that's a very nebulous, big hairy problem of a question that I generally don't have the full answer to. But I do think this Amazon article really showcases platform competition versus AI competition. And what I thought was kind of neat about it was the opportunity to transition SEO to AI assistants, to Gabe's point about trying to find dollars and cents after all the billions of dollars that have been invested in AI. I think, and I said this on a previous Mixture of Experts, that OpenAI is now very focused on how to eventually turn all these agents and capabilities they've built into dollars and cents. So when I saw this, I really saw it as a way to take some SEO capabilities away from the Googles of the world and see if you can start to utilize them as part of your shopping experience with agents.
>> Yeah, it's a good note to end on. Obviously this will not be the last time we talk about this issue, but we are out of time for today. So, Aaron, Gabe, Abe, awesome to have you on the show, and I hope to have you back soon. And thanks to all you listeners: if you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere.
And we'll see you next week on Mixture of Experts. [music]