Google AI Overviews, Bridge Model, Scaling
Key Points
- Brian Casey steps in for Tim Wong as host and introduces the episode’s three main topics: market reaction to Google’s AI Overviews, a “Golden Gate Bridge” model for interpretability, and current scaling‑law discussions in light of recent Nvidia and Microsoft news.
- Two weeks after Google launched AI Overviews nationwide, social media has spotlighted numerous bizarre and unsettling answers—such as absurd dietary recommendations and dangerous toy suggestions—highlighting both public fascination and the early growing pains of AI assistants.
- The show examines the “Golden Gate Bridge” model, a self‑referential system that metaphorically builds a bridge between plausible and truly useful interpretability tools, raising questions about safety and practical deployment.
- With Nvidia’s earnings report, Microsoft’s “whale computer” reveal, and ongoing debates about a looming shortage of pre‑training data, the panel revisits scaling laws and explores whether new approaches can sustain AI growth.
- Guests Kate Soul (Program Director, Generative AI Research), Chris Hay (Distinguished Engineer, CTO, Customer Transformation), and Skylar Speakman (Senior Research Scientist) join the discussion, offering insights from research, product, and engineering perspectives.
Source: [https://www.youtube.com/watch?v=VMmIdX9Zmuw](https://www.youtube.com/watch?v=VMmIdX9Zmuw)
Duration: 00:44:26
Sections
- [00:00:00](https://www.youtube.com/watch?v=VMmIdX9Zmuw&t=0s) **AI Overviews, Bridge Model, Scaling Laws** — Host Brian Casey fills in for Tim Wong to chat with three guests about market reactions to Google's AI Overviews, a self-transforming model that turned into the Golden Gate Bridge and its interpretability implications, and recent scaling-law trends highlighted by Nvidia's earnings and Microsoft's "whale" announcement.
Full Transcript
hello and welcome to mixture of experts
I am not your host Tim Wong uh we have
let Tim regrettably go on vacation this
week so I'm going to be doing my very
worst impersonation of him so thank you
all for bearing with us this week but I
am — I am Brian Casey — and thrilled to be joined by three other distinguished guests this week who are
going to help us cover the week's news across product announcements and new research. This week we've got three exciting topics on deck for us. First,
we're going to start by following up on
a previous segment we actually had two
weeks ago so two weeks ago we talked
about uh the introduction of Google's AI
overviews those things have now been out
in the wild for two weeks and the market
reaction to them has also been at times
wild and so we'll discuss a little bit
how the market is responding to what, for some folks, is probably their first experience with gen AI. Second, we're
going to be talking about a model that
turned itself into a bridge the Golden
Gate Bridge, specifically — so, Golden Gate Claude — and the implications just
around interpretability safety and how
hopefully we at some point can find a
different sort of bridge between
plausibly useful and actually useful
when it comes to uh some of this work
around interpretability uh and then
finally every week feels like it's a
good week to talk about scaling laws uh
but with Nvidia earnings with Microsoft
introducing what has now become on the
internet known as the whale computer um
and some even just of the recent
discussion on the web about running out
of data for pre-training now is as good
a time as any to talk about the topic
and maybe to take a slightly different approach on it than we have in the past. So today, as usual, we are joined by
a distinguished group of researchers
product leaders and Engineers uh I am
joined by Kate Soul, program director of generative AI research, so
welcome to the podcast
Kate. Thanks, Brian. Chris Hay, distinguished engineer, CTO customer transformation — welcome back, Chris. What's up. And a newbie on the show, Skylar
Speakman senior research scientist so
welcome to the show, Skylar. My first time here — I'm looking forward to it. So thanks, y'all, for being here. We will
start with AI overviews so so as I
mentioned two weeks ago Google said that
they were going to roll these out across
the United States and they did in fact
do that and very predictably the first
thing the internet did was latch on to
every single example that was funny or
troubling around the various hallucinations that were happening, and of
course those things have been going
viral across social media I wrote down
some of my favorite examples that I saw, which included Google recommending
that the correct number of rocks to eat
is a small number of rocks um that a
pair of headphones weighs
$350 that certain toys are great for
small kids when actually they're
potentially fatal uh and then finally
one that I think is yet another example of some of the problems: when asked which race is the strongest, Google
said that white men of Nordic and
Eastern European descent uh were in fact
the strongest I had not heard that one
that was uh yes so all of those things
so I do want to start by maybe adding a little bit of context to this, which is: Gemini is a very capable model, actually, and the thing
we're not seeing on the Internet is all
the things that are actually going fine
and well right people are cherry-picking
to some extent examples that are
particularly comical or troubling um and
one of the things that I'm sort of
reminded of is that Twitter is not real
life um but it does feel like a
different level of visibility for this
content than just when it was hidden
behind you know a chat bot that you had
to consciously uh sign up for and even
if LLMs are hallucinating — let's just say 1% of the time; it's more than that, but let's say it was only 1% of the time — knowing how much search volume is on Google, that's still a staggering volume of hallucinations happening every day. And so, Chris, maybe I want to just start by turning it over to you to get your sort of initial reaction to
it and maybe just comment on you know
what do you think is the right way to
think about this problem is this like a
nines of reliability problem do people
need to start treating machines more
like they treat humans with like a
degree of not trust necessarily but like
a trust but verify um or do you think
the Market's just cherry-picking
examples here and like it's actually
going mostly fine and it will just
continue to get better over time so I
think it's a really interesting question
because we've all been doing retrieval
augmented generation for a while right
um but this is really retrieval
augmented generation on a global scale
and the big issue that you have here is that, when you're doing the AI overviews, it really can't tell the difference
between what is truth and what is
satirical or made up or is a fun article
and the internet is full of that so if
we take the rock example that you had
there Brian that actually came from a
satirical article in The Onion, but
Google couldn't differentiate between
that and I think that opens up a whole
thing as you were saying there so one of
the things to be thinking about there
it's one thing for the onion to have a
satirical article and you click on that
you know it's a satirical article but
when Google takes that and then produces
an overview and puts it at the top and
says this is the answer to your question
then is it Google speaking at that point
or is it really just providing a summary
of what you found and that's where I
think there is a real fundamental
difference on what's going on here so
this ability to really be able to
distinguish what the truth is and what
isn't the truth and what is really just
a fun article I think that's the
challenge that they've got ahead of them
now if we look at something like
Perplexity, they seem to have solved that
problem so I have no doubt that Google
will solve that problem in time but I
think this comes down to uh being able
to distinguish the difference of the
results I'm glad you brought up the the
rag analysis because I wanted to just
jump in there I think there is a
difference between referencing incorrect
information and a hallucination where
the model is generating it and I'm not
quite yet sure for Google's AI overview
how much of it are incorrect references
from a rag system and how much of it is
really truly novel incorrect but novel
generated text and I don't know if we
know the inner workings of of that quite
yet uh but there is a difference between
those two types of mistakes made in
these AI overviews. Yeah, I was going to say — right, when you do RAG anyway, depending on the creativity, you know, you're going to have a little bit of creativity in your settings, so it's really how much are they going
to crank that up or crank that down over
time it's actually interesting you
mentioned
that because there were examples
actually the example of like the
children's toy that was actually
potentially a safety hazard and fatal if swallowed — the funny thing is, like,
there was a thread that went somewhat viral about that, and then the first post in the comment section was actually somebody referencing the number one result on Google, which had almost that content verbatim in there. But what
was interesting is when it was Google
showing the result versus it just being
a link on the internet the reaction to
it was totally different when it was
Google was this like massive crazy
problem when it was just the fact that
this was the first results on the
internet, people were like, oh well, it's just content, and that happens all the
time and people have to know um to not
trust that stuff and so people do seem
like they're approaching this with like
different expectations than they would
treat normal content. I think people are assuming — everyone is kind of cued to assume, if they're reading this statement that appears almost like it's a fact and it's just, you know, saying
this is what the facts are that there's
been some sort of due diligence and like
reasoning that's gone on to evaluate and
to look through and you know that's not
quite how these systems work at least
not yet so you know I think there's a
degree of skepticism that's going to be
needed for the near term when when
looking at these types of results and
working through them you know making
sure that just because as Skylar you
pointed out right just because you know
it's on the internet and it's being uh
shared doesn't mean it's a hallucination
it just means this is an example of
what's on the internet one question I
wanted to follow up on specifically on
that it touches on I think some of the
stuff that we were even talking about
maybe on the show last week which is
just around ux and so one of the
interesting things is that the place in
the page that an AI overview is taking
up is a space that was traditionally
occupied by a thing called the featured
snippet um if you live in the search
world and where Google was sourcing that
data historically was just one of the
top two or three most authoritative and
widely cited results on the web and that
would be taken verbatim and placed in the snippet. Google's now
putting their AI overviews in the exact
same place on the page where that
content used to be and you know it
struck me that maybe one of the
challenges there is that people are not
necessarily treating the content as
being sourced totally differently from one another — it's in the same place on the same page, so they think it's the same. And one of the
things that started to make me think
about is you know when we think about
you know and Kate maybe you could take
this one we almost have these three
different types of things which is like
human generated content llm generated
content and then traditional answers
from like a calculator or like that you
can like almost trust 100% And do you
think that we actually need to do more
in terms of distinguishing the user
experience between those things like
rather than merging it all together and
like deeply embedding llms and AI into
everything we do like making it very
clear to users you know where they're
seeing you know features and content
that are sourced differently than they
have been historically absolutely and I
think it goes beyond just even like
consumer use cases it's super important
for just regular consumers doing Google
searches but especially when you look at
Enterprise applications and other things
you know the theme of like being able to
cite your sources and being able to
decompose a bit what is going on inside
of the Black Box I think is increasingly
going to be critical for any sort of
real adoption being able to move Beyond
like okay this is a fun toy to to this
is something that I can actually use in
the the day-to-day so I I really hope
that we uh start to make some progress
there on some of these more consumer
friendly uh chatbots because in the
Enterprise setting you know that's
becoming increasingly the norm like in
rag patterns you want to return here's
the source where I you know um got my
answer from and that's becoming
increasingly important one of the things
that opens up in my mind Kate and it' be
interesting in your perspective there is
that that's kind of fine from a web
interface where you're getting your
result you get your overview and then
you've got all the links and here's
where I reference but as we talked about
in a previous episode where we're moving
into multimodality, and you're going to be chatting with what is arguably a human voice at that point —
right you're probably not going to want
somebody going back and say this is the
answer to the question and by the way I
got this answer from here here here and
you can visit it on XYZ blah blah blah
because you're going to switch off at
that point. So I wonder what the best user experience is for voice, for that sort of helpful chatbot, while also being fair and transparent that it's AI
generated I honestly question if chat
regardless if it's with voice or text is
the right domain here like the right
mechanism and mode for this type of
analysis and one of the things I'm
really excited by the AI overviews is it
seems like one of the first use cases
that is really taking on that's consumer
focused where it's not a chatbot right
where we're using generative Ai and
we're able to start to drive um
information distillation and Gathering
lots of different sources and providing
results you know without having to like
have a multi-turn conversation like
asking are you sure about this answer
where did you find it like can you give
me more sources like that's a very
unintuitive flow but I think we've been
so trained on chat to equal generative
AI up until now that that's just how we
all assume it has to work so I would
actually say I don't think you know
voice and other things are where this
hopefully is going I think there's a lot
of opportunity to Think Through what do
new types of non- chat-based
applications look like and how can we
embed those decision-making criteria and
sources and other things that are needed
to really drive value along the way, without it being this multi-turn interrogation of an agent. What
do we think Google is collecting on the
usage patterns of these you know way
back in the day they would have search
and they would obviously collect
clickthrough right what are you clicking
on
uh any guesses as to what sort of
metrics Google's collecting as people
interact with these AI overviews? That's not in my space at all — I'm just wondering, I'm guessing, someone in there is watching how we are interacting with the AI overviews presented to us. Ironically, this is the one question I'm qualified to answer.
And so, you know — at least when Google first introduced AI Overviews, they had been in beta for a while, and they said they were bringing them to prime time — two of the things that they talked about were telling, and they were really messaging to publishers, because publishers
have been hysterical about the impact of
this and like what's been really
interesting is that the impact on
organic traffic to Publishers has been
like almost negligible um so everyone
thought it was like the end of the
internet and then like almost nothing
happened in terms of traffic um but two
of the things that Google said was one
that the content that was surfaced
through AI overviews was actually
getting more clickthrough and more traffic than the stuff that was present in just the normal SERP, and the idea there was that those links were presented with more context. I think
Sundar did another interview not long
after that where he was talking more
about like generative uis and you could
just see I think more about like when
how you turn a query um a user query and
you generate a UI that places like links
and information in context better than
just a flat list, which is sort of what they do — they would say they do not do that today, but there's still some of that. And so that was
one thing and then the other thing that
they talked about I'm sure they measure
more things but the other thing that
they measured um is do the people who
are exposed to AI overviews start using
search more um like is this something
that increases their usage of this
product over time, because the other audience that is terrified of this is obviously, like, shareholders. And
people want to know it's like are you
gonna kill search and in the process of
doing that are where's all the ad
Revenue going to go and so one of the
other things that they're very clear about
is like oh no people who get exposed to
this actually use this product more over
time and so I think they're reminding
some of their other stakeholders a
little bit there, but those are at least some of the ones that they've publicly discussed. Last week, Anthropic released a novel version of its Claude 3 Sonnet model, and this model did not believe
that it was a helpful AI assistant
instead it believed it was the Golden
Gate Bridge uh which is a fun thing to
have happened um but really that was a
demo of research that anthropic has been
doing for a long time and really the
industry has been pursuing for a long
time which is in the space of
interpretability um and within the space
of interpretability anthropic has been
doing a lot of research around
mechanistic uh interpretability um but
part of the problem in this space is
that I think Kate to the comment you
made earlier is that these models are a
black box today: you put a pile of all the data on the internet into linear algebra, and out spits something that somehow appears to know a lot about the world, but nobody knows how that's actually happening — not really. And so interpretability is a
space that's trying to answer some of
those questions and what was interesting
and why Golden Gate Claude was important
was that anthropic
demonstrated that they could identify
the features within the model that
activated when um you know either text
or a picture of the Golden Gate Bridge
um was was presented so they knew um
kind of the combination of like neurons
and circuits that would say like this
this thing represents the Golden Gate
Bridge and perhaps even more importantly
that by dialing that feature up or down
uh they could influence the behavior of
the model to the point where, if you dialed it up high enough, the model thought it was the Golden Gate Bridge. And this was, if you read the paper, not the only example either, and I'll share
one other one uh which is that they had
another feature that would fire when it
was looking at code and would detect a security vulnerability in the code, and they had an example
too where if you dialed up that feature
it would actually introduce a buffer
overflow vulnerability into the code um
as well so when you think about the
ability to dial features up and down
within a model fairly surgically um
pretty important in terms of the
steerability of the model U potentially
and certainly I think you can understand
a little bit why folks in the AI safety
community in particular have been focused on this interpretability space. So I personally
find the space super fascinating and
Skylar, I just want to turn it over to
you to maybe kick us off a little bit to
just maybe even talk about like your
general reactions to to the paper maybe
and like the demo as a starting point
and just like what you found interesting
like how important you think it is and
just you know maybe talk a little bit
about how you know I know what you
thought of it. Yes, great — I'm happy to talk about this space without droning on too long. I have to describe what I do to my kids, you know, a 10-year-old and 7-year-olds, and they know
that I work with AI uh and their
understanding is text goes in and text
comes out that's that's their kind of
view of these large language models and
where I try to tell them where I and our
team work on is actually in between what
happens to the text when it goes in how
does it get manipulated and then it gets
spit back out and I think this has been
uh coming out as an area called
representation engineering and I would
call this paper the Golden Gate example
a great example of representation
engineering they're not manipulating
prompts they're not coming up with a new
metric of how well their models
performing they are messing with the
representation of the model and I think
that's just a really cool I would say
emerging or perhaps even
underrepresented area of research when
you compare it to prompt engineering for
example: how can we, you know, probe — or, sorry, how can we prompt the model in just the right way to make it be
convinced it's a Golden Gate Bridge that
would be a very different approach to
what they had done um with this uh
Golden Gate example um it's a fun
example they took it down I think it was
only available for people to use for
about 24 hours yep 24 hours and so it's
it's already been with us and you know
taken away too soon. But for me, what I would like to get across to the larger audience is: they did not
just create a new large language model
by training it only on Golden Gate
bridge data they did not insert a little
prompt that says every time you answer a
question pretend you're the Golden Gate
Bridge they really did identify the
inner workings of these models and then
crank it up as Brian had described and I
think what I'm excited about that
is in this representation engineering
space it doesn't
take the latest greatest Technologies to
find these cool insights things like
principal component analysis, things like a sparse autoencoder — these things are, you know, decades old, or a ten-year-old analysis, but applied to the inner workings of these large language models it's now this new rich space of representation engineering. So I
like the paper both for how it presented
its work — Chris Olah, one of the authors,
is a visualization genius and and in
their in their publication they've got
some really really cool visualizations
of what they found out um so I think
that's probably my first takeaway I'd
like to spread to an a broader audience
that large language models are not just
text in and text out there's a lot of
Rich uh science to be done in that
representation space and the Golden Gate
Bridge paper is a great example of
that that's great can you maybe talk a
little bit about um the safety Community
I think in particular is very interested
in the topic of interpretability um and
I think has feels some level of urgency
uh around it given how capable and how
quickly capable some of the models are
are becoming but maybe just can you talk
a little bit about why why it's so
important to the safety community and
then maybe also talk about like other
applications and area and domain areas
where this space of you know
interpretability um you know promises to
you know it could be on the capability
side of it but just other places where
we think interpretability will make a
difference right I think a real clear
example I was reading in a blog after Golden Gate Claude had been brought down: some people noted that when the Golden Gate feature was highly activated — when Claude 3 was turned into Golden Gate Claude — it would respond to tasks that it previously would not. So: "please, can you write a scam email." Normal Claude would respond "sorry, I can't do that"; Golden Gate Claude would proceed, and it would generate this scam email — nothing to do with the Golden Gate analogy —
but it was an example of when you mess
with these other features like that
there are other sort of perhaps
previously thought built in guard rails
that are no longer as strong and so I
think that's going to be another really
interesting area of work of you may have
well-intentioned
people manipulating these features we
don't know what other guard rails that
previously worked will not work after
you've manipulated a feature because who
would have thought that amplifying the
Golden Gate idea the bridge would make
the large language model Claud more
likely to comply to a uh to an illicit
task. So I think that was just, I don't know, an example that I had read about there. I think the safety community might not care about a large language model identifying as the Golden Gate Bridge, but they will definitely be interested in the jailbreaking behavior — what happens when people start manipulating it. Skylar,
I I got a question for you based off of
that like what implications does that
have then for open sourcing models and
releasing models and weights you know a
lot of times model providers do a lot of
safety reinforcement learning and other
protections on top of the models that
are before they're released to help
manage some of those behaviors like
could you see some of that now being at
risk and and
eroding the willingness to open source? Is that what you mean by being at risk — the willingness for companies to open up? Yeah, take it as you will: the willingness for companies to open source, and the risk that is introduced from releasing model weights that can now be, shall we say, exploited in ways that weren't originally anticipated by the model designer and builder. Really good question.
Actually, Anthropic themselves have this much larger blog you can read where they defend why they have not open-sourced these types of models. In that regard, I think I
imagine people around uh the AI
Community right now probably over the
weekend are are busy running their own
version of Golden Gate they're going to
find their own features they're going to
start manipulating those um so I think
we'll probably see some of those results
showing up, hopefully on arXiv, or maybe on blog posts, within this week
On that, Skylar — so I did a YouTube video about three or four months ago where I took the Gemma model and the Mistral model. It's not at the feature level that they worked at, but what I did is I lopped off the input embeddings layer — I left the model with only the input embeddings layer, nothing else — and then I ran a cosine similarity search against the various tokens within the input
embedding layer and then just looked at
did a visualization looked at what
embeddings were close to each other and
when I did that it was incredible and
you can go check out that YouTube video
but uh it was incredible so you would
see that, just in the input embeddings layer, nowhere else, misspellings of words were super close to each other. So if I had London with a capital L, and london with a small l, or London with a space after, they would all cluster together. But not just that — cities themselves would cluster together: you would see London, you would see Moscow, you would see Paris, and in fact you would see almost a distance similarity in the visualization, which
was fascinating you saw the same thing
with celebrities they would cluster
together; computer programming terms, right — so, you know, the various loops: a for loop, a while loop, etc., so "for" and "while" would all come together. Now, the reason that I ran that against the Mistral model and the Gemma model is that the Gemma model has a vocabulary of something like 256,000 tokens, whereas the Mistral model has 32,000 tokens — so there's a lot of splitting of tokens in Mistral, but in the Gemma model there's not a lot of splitting, which means you get much closer similarity. So
when I did that I was absolutely blown
away, and like the Anthropic team, I wanted to go to the next layer, because I had the same theory that if I jumped down to the next layers you would start to see these features activate — because I could see it already, just in the embeddings layer.
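To make that concrete, here is a minimal, hypothetical sketch of the kind of nearest-neighbour probe Chris describes. It is not his actual code: the tiny hand-set "embedding matrix" and the `nearest` helper are illustrative stand-ins for a real input-embedding matrix (which, in PyTorch/transformers, you could pull with `model.get_input_embeddings()` over the full 32k- or 256k-token vocabulary).

```python
import numpy as np

# Hypothetical miniature stand-in for a model's input-embedding matrix.
# Vectors are hand-set so the clustering is easy to see; a real matrix
# would be vocab_size x d_model, taken from the model itself.
vocab = {
    "London": np.array([1.0, 0.0, 0.10]),
    "london": np.array([1.0, 0.0, 0.12]),   # casing variant of "London"
    "Moscow": np.array([0.9, 0.1, 0.30]),   # another city
    "for":    np.array([0.0, 1.0, 0.10]),   # loop keyword
    "while":  np.array([0.05, 1.0, 0.15]),  # loop keyword
}
tokens = list(vocab)
E = np.stack([vocab[t] for t in tokens])

# Cosine similarity between every pair of token embeddings:
# normalise each row, then one matrix product gives the full matrix.
En = E / np.linalg.norm(E, axis=1, keepdims=True)
sim = En @ En.T

def nearest(token, k=2):
    """The k tokens most cosine-similar to `token` (excluding itself)."""
    i = tokens.index(token)
    order = np.argsort(-sim[i])
    return [tokens[j] for j in order if j != i][:k]

print(nearest("London"))   # casing variants and other cities cluster together
print(nearest("for"))      # loop keywords cluster together
```

With a real embedding matrix you would then project to 2-D (e.g. PCA) to get the kind of visualization described here.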
And one of the theories — which I'm glad to say I think has been proven right — is this: you may have noticed that as new models come out, everybody is increasing their tokenizer vocabulary, everybody's increasing their input embeddings layer, and the reason, I believe, is that it's easier for the models to generalize as you go up the layers if you get things pretty close in the input embeddings layer. And I think, therefore, when I looked at the Anthropic paper — bringing it back
there I could visualize when it talked
about cities when it talked about
locations when it talked about computer
programming terms I was like I could see
that just in the input embeddings layer
only on my visualization so I can
absolutely see how that would then
translate into features as the models
get stacked up and it becomes richer and
richer with semantic meaning. Yes — I'm going to geek out here a little bit: the official papers of the Claude Golden Gate work are all plays on the word monosemanticity, which is basically a really big word getting at the idea: can we find a single part of these huge large language models that has one meaning? And they were able to
do that for the Golden Gate idea and
then the idea was now what happens if we
take that one part of this huge large
language model and Crank It Up tenfold
and then you get the Golden Gate version of the Claude large language model. But, Chris, your description of how these types of words or tokens are coming together like that — the tech behind Claude's Golden Gate, okay, "weaponized" is a bit dramatic, but it
really emphasized can we take this
richer embedding space and uh you know
create uh a million features from it and
then once they had those features that
you get the ones like the Golden Gate
and your security concerns and I think
there was one on tourist attractions Etc
uh but it's getting at this idea of can
we find a Monto semantic part of these
large language models um so yeah uh
again exciting space to be again and I
I'll come back I love it when the
research gets um gets into these inner
workings of large lay large language
models I think that's
[Music]
fascinating so
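The feature-amplification idea described here, finding a near-monosemantic feature direction and cranking it up roughly tenfold, can be sketched in a few lines. This is a toy illustration, not Anthropic's actual implementation: the `feature_direction` vector stands in for a sparse-autoencoder feature, and the steering function is an assumed simplification of activation steering.

```python
import numpy as np

# Toy sketch of "activation steering": amplify one learned feature direction
# inside a model's residual stream. The unit vector below is a stand-in for
# an SAE feature like the Golden Gate Bridge one (hypothetical values).

rng = np.random.default_rng(0)
d_model = 64                                # toy hidden size

# Pretend this unit vector is the decoder direction of one SAE feature.
feature_direction = rng.normal(size=d_model)
feature_direction /= np.linalg.norm(feature_direction)

def steer(hidden_state: np.ndarray, direction: np.ndarray, scale: float) -> np.ndarray:
    """Scale the feature's current activation by `scale`
    (the 'crank it up tenfold' idea), leaving everything orthogonal untouched."""
    activation = hidden_state @ direction   # how active the feature is right now
    return hidden_state + (scale - 1.0) * activation * direction

h = rng.normal(size=d_model)                # a toy residual-stream vector
h_steered = steer(h, feature_direction, scale=10.0)

print(h @ feature_direction, h_steered @ feature_direction)
```

In a real model the steering would be applied via a forward hook at one layer for every generated token, which is (roughly) why the steered Claude related everything back to the bridge.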
So, also, last week was another big week of announcements across the industry, but I actually just want to focus on Microsoft, who, as I mentioned, introduced what has become known on the internet as the whale computer, because they used this analogy of marine life to basically explain the orders-of-magnitude size of the infrastructure they're building to support AI workloads. And they used these three steps of shark, orca, and then a whale. And what's funny is, just this morning I was Googling how much does a shark weigh, and so sharks are roughly, I think, like 800 pounds, and then an orca is 8,000 pounds, and then a whale is like 80,000 pounds, and so it's just an order of magnitude each time. They were thinking about, okay, what's an interesting and fun way to visualize and communicate an order of magnitude, and maybe a little bit memeable in a way, and they certainly achieved that. But in some ways it's just classic scaling law, right? It goes back to the original 2020 paper that says, you know, if you're trying to improve the capability of these models, reduce the overall loss in them, you want to increase your compute, your data, your parameter count by roughly similar orders of magnitude from one generation of the model to the next, and that improves the overall sort of general capability of the thing. And you can look at Nvidia earnings; that's held pretty true up
to this point. But maybe where I wanted to jump in is, Kate, a comment you made, I think it was last week on the show, where you were saying something to the effect that Enterprises, for a lot of these use cases, may not need an artificial general intelligence; they actually may not need all the capability that exists right now. And so, you know, I think it'd be great if maybe you could talk a little bit about the scaling laws, but from a different perspective, because the scaling laws idea to me is really from the perspective of a model provider trying to build AGI, not an Enterprise trying to get ROI, essentially. Absolutely, yeah. Can you talk maybe a little bit about just some of what you see in terms of the cost and size tradeoffs, and, you know, does bigger mean better all the time? I mean, I think
what the scaling laws, as you say, do a good job at is for model providers, like people actually training these large models. And what was really one of the big breakthroughs is: look, you can't just increase your model size; the most efficient way to improve performance is to also increase the amount of data that's used as well. And just because you now know, let's call it, the most cost-effective way to train a model of the nth degree in size, does that mean it's economically incentivized to train that model? Will the actual benefits that you derive from that model justify the cost? That's an entirely different question that the scaling laws don't answer. So I think up to this point there's been enough excitement and clear use cases and value, where there's been a clear economic driver to support: okay, we need to train bigger and bigger models, and that's gotten us to where we
are today. But, you know, I do question some of the statements and claims out there about how we have to keep investing and building bigger and bigger models. I'm going to put the science of it aside, of exploring and determining what's next, but if we look at what's actually economically incentivized, I think we're going to start to see performance plateau. And if we look at what the real use cases are and the value drivers, I don't think we're going to need models that are 100 times bigger than what we have today to extract most of the value from generative AI, and a lot of the low-hanging fruit. So, you know, I think it's still a huge area of exploration. If you look at even the scaling laws themselves, they keep changing. It's still this concept of you need more data for bigger models, but I'm hopeful that we're going to start to see more work built in on what will be economically incentivized to build, as well as looking at other costs that aren't reflected in these scaling laws, costs like data. You mentioned, you know, concerns about pre-training data disappearing. So we know we need more data to train bigger models, and at some point we're going to run out of, quote, real data. And so that's a whole different frontier of looking at data costs, looking at what role synthetic data could play; all of that really needs to be explored. There are also costs on, like, climate, and the actual compute costs, and are those costs going to start to be better realized in the prices that are charged to model providers and people leveraging these models? And, you know, I think all of that will maybe start to change the narrative a little bit of where the future is going as we continue to learn more. Maybe one
follow-up to that is: I remember the reaction in the market when the Llama 3 models came out, the 8 billion parameter model in particular, which I believe was trained on 70 to 75 times as much data as you would use if you were just trying to do an optimally compute-efficient model, which obviously is not the approach that they took. They instead took an approach of trying to build something small and capable that you could run on your laptop, that was cheap for inference but still had a ton of capability. Do you see more of that happening? Definitely. So right
now, again, the main scaling laws that everyone's using are for model providers, not thinking about the model life cycle and the full usage, so this is another cost that isn't yet really reflected. Think about your fixed costs of how much it takes to create that model once, versus the marginal cost to use it every single time you run inference. You're incentivized to build smaller models if you're going to have a long model life cycle and you're going to hit that model millions and billions of times and run inference on it; you want to get that marginal cost as low as possible. And that's where the Llamas are going. That's where, if you look at the Phi model series as well, they're training on these incredibly data-dense ratios of the amount of data per parameter, where, like, Chinchilla I think calls for 20-to-1, 20 tokens per parameter, something like that.
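For concreteness, here is the back-of-envelope arithmetic behind those ratios. Treating Chinchilla's rule as a flat 20 tokens per parameter is a simplification, and the 15-trillion-token training figure for Llama 3 is Meta's publicly reported approximate number:

```python
# Rough tokens-per-parameter arithmetic, assuming Chinchilla's ~20:1
# compute-optimal rule of thumb and Llama 3 8B's reported ~15T training tokens.

params = 8e9                     # Llama 3 8B parameter count
chinchilla_ratio = 20            # ~compute-optimal tokens per parameter

optimal_tokens = params * chinchilla_ratio   # ~160 billion tokens
actual_tokens = 15e12                        # ~15 trillion tokens (reported)

print(f"compute-optimal: {optimal_tokens / 1e9:.0f}B tokens")
print(f"actual ratio:    {actual_tokens / params:.0f} tokens per parameter")
```

So an 8B model trained the "compute-optimal" way would see on the order of 160B tokens, while the reported training run is in the thousands of tokens per parameter, which is exactly the data-dense regime being described here.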
They're now in the hundreds and thousands of tokens per parameter. So I think we're still really understanding that tradeoff, and I think we'll continue to; that's where everyone is headed. Understanding that, maybe it hasn't been articulated fully in a scaling law, but trying to optimize that total life cycle: when this gets deployed we need to be able to run as small a model as possible for this to be cost-effective.
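The fixed-versus-marginal-cost point can be made concrete with a toy break-even calculation. All the dollar figures below are made-up placeholders, not real training or serving costs; the shape of the tradeoff is the point:

```python
# Toy total-cost-of-ownership model: using a big hosted model has no upfront
# cost but a high per-call price, while distilling/fine-tuning a small model
# costs money upfront but is cheap per call. All numbers are illustrative.

def lifetime_cost(train_cost: float, cost_per_call: float, calls: float) -> float:
    """Fixed (one-time) cost plus marginal inference cost over the model's life."""
    return train_cost + cost_per_call * calls

big   = dict(train_cost=0,         cost_per_call=0.01)    # pay-per-call large model
small = dict(train_cost=2_000_000, cost_per_call=0.0005)  # upfront-trained small model

for calls in (1e6, 1e8, 1e10):
    cost_big = lifetime_cost(calls=calls, **big)
    cost_small = lifetime_cost(calls=calls, **small)
    winner = "small" if cost_small < cost_big else "big"
    print(f"{calls:.0e} calls: big=${cost_big:,.0f} small=${cost_small:,.0f} -> {winner}")
```

With these placeholder numbers the break-even sits around 2e8 calls: below that, paying per call on the big model wins; once you're running inference billions of times, the small model's low marginal cost dominates, which is the incentive being described.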
And I think, Kate, to that point, one of the questions you need to ask in general is: how much reasoning do you need from your model? I like to use the cooking analogy. If I go to a Gordon Ramsay restaurant, I'm not expecting Gordon Ramsay to cook my meal for me, right, and I'm not expecting him to invent a brand new meal there and then. What I'm wanting is a recipe that he's invented at some point, and then there's going to be some sous-chef or something that's going to cook up that meal, and it's going to be served, and I'm going to have the Gordon Ramsay experience. And I think when you're looking at the larger models, with hundreds of billions of parameters, even 70-billion-parameter-type models, you're asking for the Gordon Ramsay there. You're asking: I want you to come up with the recipe, invent the recipe, cook the recipe, and serve me the meal all at the same time. But actually you should be using the bigger model to do the reasoning, right, figure out what the good answer is, and then passing the pattern on to the smaller model to go and do the sous-chef thing.
And I think that's really the big question for people when they're doing POCs and then scaling to production. They use the bigger models to begin with, because they're trying to figure out what the answer is, but then in production they need to, as Kate beautifully said, keep the cost low. So they then switch to the smaller model, because they want the decreased latency, they want the lower cost, but the pattern has been figured out and you just want that smaller model to rinse and repeat, which it's really good at. Absolutely, and I
think another area: so there's this concept of using bigger models to teach small models, and that also throws in some squirrelly math with the scaling laws, if you need a big model to get a good small model. But moving past that, there's also, I think, a real opportunity in model routing: figuring out what tasks you actually need the big model for. Like, when do you need Gordon Ramsay to tap in, versus when can you pass this off, and maybe you just need to go to McDonald's for a quick bite to eat, because this is something really easy, low value, not worth spending an insane amount to accomplish. And that's again where I think a lot of the what-will-be-economically-incentivized comes in: figuring out how much these tasks are actually worth to you. And if you can get reasonable performance with a 10-million-parameter model or a 3-billion-parameter model, no one's going to pay to send it to a multi-hundred-billion or trillion-parameter model
instead. Maybe one final question on this topic. It was funny, there was an interview where people were talking to Jensen, and they were asking his opinion on how much he thought this would hold, and they were poking on things that were really about the TAM of Nvidia, sort of long-term, and he paused, because he was like, I should not answer this question, because anything he says, the stock price is just going to go all over the place, essentially. But he started to talk about the opportunity being the entire, I think it was a trillion-dollar, data center market
what he was talking about and there's
been a lot of discussion about whether
like all workloads will become
accelerated workloads um going going
forward and just in for every
application for every company just the
the blend of stuff that they're doing on
traditional CPU versus more accelerated
workloads and how they hand off between
those two things and you know I'm just
curious maybe even Chris from from your
perspective and just a lot of client
conversations that and scenarios that
you're working with you know how people
are thinking about that like I know I
know a bunch of inference is still done
on CPUs today but I think for some of
the Laten really low latency examples
people are talking about like oh we need
to put more of this on gpus so uh I'm
just C I'm curious how from an
application perspective inside of an
Enterprise account how people are
thinking about uh just INF inference and
like application architectures and how
they're doing tradeoffs between kind of
CPU and GPU
computing. Yeah, I think it's a really interesting area. A lot of customers are actually thinking about this all the time, so it's an architectural consideration, just like any other NFR: am I going to go SaaS here, am I going to go on-premise, how do I plan my costs, what am I going to do, what's the safety on that? If I'm honest, most Enterprises are being pretty cautious, right? They want to do a classification task, they want to do a summarization; they don't want the model to make up some classification. They know what their list of 30 classifications is, go do that; they know what their examples of summarizations are, go do that. So they want to take that low-hanging fruit, and they're approaching it quite cautiously. I think where that probably changes in time, and again it's more of a discussion for a future episode, is when we move into agentic workflows, right? How do I then start to organize my information within my Enterprise so the AI will have access to the right knowledge bases? Which tools will it have access to? Which is a much wider architectural discussion. So a lot of clients are starting to think about how gen AI fits into their overall Enterprise architecture, and how you need to evolve your traditional architecture for the AI to be able to use it, but that's quite a slow path. Generally, I don't think things have moved on too much from classification, summarization, etc., and then of course code generation is a big productivity lever that everybody's kind of leaning into just now. One maybe final thought on the
scaling laws I wanted to bring up is: a lot of these scaling laws are also assuming that the class of technology remains the same. And we talked about, okay, these are scaling laws for model providers basically in search of AGI, but do we really believe this class of technology is what's going to unlock AGI? I think there's a lot of thought out there that probably not. If you look at how these technologies evolve, there's a curve, but that curve is driven by multiple different technologies coming in and introducing their own mini curves on top of it. And, you know, AGI, I mean, human intelligence requires far less energy for the amount of capability and decision-making. So if we're really talking about, okay, we're going to promote these scaling laws because, for model providers, maybe the business use cases aren't going to be incentivized, but if we can unlock AGI it will be, then I would maybe also argue that these scaling laws probably don't reflect how whatever technology we converge on for AGI might scale. So it's still a bit of
an unknown and a don't-know point. Kate, I mean, imagine a world where we did have AGI, or even ASI at that point, right? But then you took that super-intelligent being and you said: you don't have access to any documents, you don't have access to any tools in your organization, because it's all locked up on somebody else's hard disk or a Box folder or something. How effective would that AGI be in an organization? I don't think very effective. So I think you're reading between my lines, which is: is AGI really actually ever going to be incentivized, at least economically? There's a big question mark there, I think. But I think as soon as AGI is achieved, if it's achieved, it's going to be put in a box, and we're all going to go to the AI zoo, and we're going to go and look at the AI zoo and have a chat with it. That's what I believe AI's first task is. Well, what is the TAM of an AGI zoo is what we need to answer on next week's episode.
So I know we're basically at time here. Thank you all for joining us on this week of Mixture of Experts, and we will be back next week, same time, not the same people. You suffered through one episode of me; I'm out of here, Tim will return. But thank you all for both joining today and for listening. Kate, Chris, Skylar, thanks all for joining today. Thanks so much. Thanks, Brian, it's been a lot of fun, man. Thanks.