Google AI Overviews, Bridge Model, Scaling
Key Points
- Brian Casey steps in for Tim Wong as host and introduces the episode’s three main topics: market reaction to Google’s AI Overviews, a “Golden Gate Bridge” model for interpretability, and current scaling‑law discussions in light of recent Nvidia and Microsoft news.
- Two weeks after Google launched AI Overviews nationwide, social media has spotlighted numerous bizarre and unsettling answers—such as absurd dietary recommendations and dangerous toy suggestions—highlighting both public fascination and the early growing pains of AI assistants.
- The show examines the “Golden Gate Bridge” model, a self‑referential system that metaphorically builds a bridge between plausible and truly useful interpretability tools, raising questions about safety and practical deployment.
- With Nvidia’s earnings report, Microsoft’s “whale computer” reveal, and ongoing debates about a looming shortage of pre‑training data, the panel revisits scaling laws and explores whether new approaches can sustain AI growth.
- Guests Kate Soul (Program Director, Generative AI Research), Chris Hay (Distinguished Engineer, CTO, Customer Transformation), and Skylar Speakman (Senior Research Scientist) join the discussion, offering insights from research, product, and engineering perspectives.
Source: [https://www.youtube.com/watch?v=VMmIdX9Zmuw](https://www.youtube.com/watch?v=VMmIdX9Zmuw)
Duration: 00:44:26
Sections
- [00:00:00](https://www.youtube.com/watch?v=VMmIdX9Zmuw&t=0s) **AI Overviews, Bridge Model, Scaling Laws** — Host Brian Casey fills in for Tim Wong to chat with three guests about market reactions to Google's AI Overviews, a self-transforming model that turned into the Golden Gate Bridge and its interpretability implications, and recent scaling-law trends highlighted by Nvidia's earnings and Microsoft's "whale" announcement.
Full Transcript
hello and welcome to mixture of experts
I am not your host Tim Wong uh we have
let Tim regrettably go on vacation this
week so I'm going to be doing my very
worst impersonation of him so thank you
all for bearing with us this week but I
am — I am Brian Casey — and thrilled to be joined by three other distinguished guests this week who are
going to help us cover the week's news across product announcements and new research. This week we've got three exciting topics on deck for us. First,
we're going to start by following up on
a previous segment we actually had two
weeks ago so two weeks ago we talked
about uh the introduction of Google's AI
overviews those things have now been out
in the wild for two weeks and the market
reaction to them has also been at times
wild and so we'll discuss a little bit
how the market is responding to what, for some folks, is probably their first experience with gen AI. Second, we're
going to be talking about a model that
turned itself into a bridge the Golden
Gate Bridge, specifically — so, Golden Gate Claude — and the implications just
around interpretability safety and how
hopefully we at some point can find a
different sort of bridge between
plausibly useful and actually useful
when it comes to uh some of this work
around interpretability uh and then
finally every week feels like it's a
good week to talk about scaling laws uh
but with Nvidia earnings with Microsoft
introducing what has now become on the
internet known as the whale computer um
and some even just of the recent
discussion on the web about running out
of data for pre-training now is as good
a time as any to talk about the topic
and maybe to take a slightly different approach on it than we have in the past. So today, as usual, we are joined by
a distinguished group of researchers
product leaders and Engineers uh I am
joined by Kate Soul, program director of generative AI research, so
welcome to the podcast
Kate. Thanks, Brian. Chris Hay, distinguished engineer, CTO customer transformation — welcome back, Chris. What's up. And a newbie on the show, Skylar
Speakman senior research scientist so
welcome to the show, Skylar. My first time here — I'm looking forward to it. So thanks, y'all, for being here. We will
start with AI overviews so so as I
mentioned two weeks ago Google said that
they were going to roll these out across
the United States and they did in fact
do that and very predictably the first
thing the internet did was latch on to
every single example that was funny or
troubling around the various hallucinations that were happening, and of
course those things have been going
viral across social media I wrote down
some of my favorite examples that I saw, which included Google recommending
that the correct number of rocks to eat
is a small number of rocks um that a
pair of headphones weighs
$350 that certain toys are great for
small kids when actually they're
potentially fatal uh and then finally
one that I think is yet another example of some of the problems: when asked which race is the strongest, Google
said that white men of Nordic and
Eastern European descent uh were in fact
the strongest I had not heard that one
that was uh yes so all of those things
so I do want to start by maybe adding a little bit of context to this, which is: Gemini is a very capable model, actually, and the thing
we're not seeing on the Internet is all
the things that are actually going fine
and well right people are cherry-picking
to some extent examples that are
particularly comical or troubling um and
one of the things that I'm sort of
reminded of is that Twitter is not real
life um but it does feel like a
different level of visibility for this
content than just when it was hidden
behind you know a chat bot that you had
to consciously uh sign up for and even
if LLMs are hallucinating — let's just say 1% of the time; it's more than that, but let's say it was only 1% of the time — knowing how much search volume is on Google, that's still a staggering volume of hallucinations happening every day. And so, Chris, maybe I want to just start by turning it over to you to get your sort of initial reaction to
it and maybe just comment on you know
what do you think is the right way to
think about this problem is this like a
nines of reliability problem do people
need to start treating machines more
like they treat humans with like a
degree of not trust necessarily but like
a trust but verify um or do you think
the Market's just cherry-picking
examples here and like it's actually
going mostly fine and it will just
continue to get better over time so I
think it's a really interesting question
because we've all been doing retrieval
augmented generation for a while right
um but this is really retrieval
augmented generation on a global scale
and the big issue that you have here is that, when you're doing the AI overviews, it really can't tell the difference
between what is truth and what is
satirical or made up or is a fun article
and the internet is full of that so if
we take the rock example that you had
there Brian that actually came from a
satirical article in The Onion, but
Google couldn't differentiate between
that and I think that opens up a whole
thing as you were saying there so one of
the things to be thinking about there
it's one thing for the onion to have a
satirical article and you click on that
you know it's a satirical article but
when Google takes that and then produces
an overview and puts it at the top and
says this is the answer to your question
then is it Google speaking at that point
or is it really just providing a summary
of what you found and that's where I
think there is a real fundamental
difference on what's going on here so
this ability to really be able to
distinguish what the truth is and what
isn't the truth and what is really just
a fun article I think that's the
challenge that they've got ahead of them
now if we look at something like
Perplexity, they seem to have solved that
problem so I have no doubt that Google
will solve that problem in time but I
think this comes down to uh being able
to distinguish the difference of the
results I'm glad you brought up the the
rag analysis because I wanted to just
jump in there I think there is a
difference between referencing incorrect
information and a hallucination where
the model is generating it and I'm not
quite yet sure for Google's AI overview
how much of it are incorrect references
from a rag system and how much of it is
really truly novel incorrect but novel
generated text and I don't know if we
know the inner workings of of that quite
yet uh but there is a difference between
those two types of mistakes made in
these AI overviews. Yeah, I was going to say — right, when you do RAG anyway, depending on the creativity, you know, you're going to have a little bit of creativity in your settings, so it's really how much are they going
to crank that up or crank that down over
time it's actually interesting you
mentioned
that because there were examples
actually the example of like the
children's toy that was actually
potentially a safety hazard and fatal if swallowed — the funny thing is, like,
there was a thread that went somewhat viral about that, and then the first post in the comment section was actually somebody referencing the number one result on Google, which had almost that content verbatim in there. But what
was interesting is when it was Google
showing the result versus it just being
a link on the internet the reaction to
it was totally different when it was
Google was this like massive crazy
problem when it was just the fact that
this was the first results on the
internet, people were like, oh well, it's just content, and that happens all the
time and people have to know um to not
trust that stuff and so people do seem
like they're approaching this with like
different expectations than they would
treat normal content. I think people are assuming — everyone is kind of cued to assume, if they're reading this statement that appears almost like it's a fact and it's just, you know, saying
this is what the facts are that there's
been some sort of due diligence and like
reasoning that's gone on to evaluate and
to look through and you know that's not
quite how these systems work at least
not yet so you know I think there's a
degree of skepticism that's going to be
needed for the near term when when
looking at these types of results and
working through them you know making
sure that just because as Skylar you
pointed out right just because you know
it's on the internet and it's being uh
shared doesn't mean it's a hallucination
it just means this is an example of
what's on the internet one question I
wanted to follow up on specifically on
that it touches on I think some of the
stuff that we were even talking about
maybe on the show last week which is
just around ux and so one of the
interesting things is that the place in
the page that an AI overview is taking
up is a space that was traditionally
occupied by a thing called the featured
snippet um if you live in the search
world and where Google was sourcing that
data historically was just one of the
top two or three most authoritative and
widely cited results on the web and that
would be taken verbatim and placed in the snippet. Google's now
putting their AI overviews in the exact
same place on the page where that
content used to be and you know it
struck me that maybe one of the
challenges there is that people are not
necessarily treating the content as
being sourced totally differently from one another — it's in the same place on the same page, so they think it's the same. And one of the
things that started to make me think
about is you know when we think about
you know and Kate maybe you could take
this one we almost have these three
different types of things which is like
human generated content llm generated
content and then traditional answers
from like a calculator or like that you
can like almost trust 100% And do you
think that we actually need to do more
in terms of distinguishing the user
experience between those things like
rather than merging it all together and
like deeply embedding llms and AI into
everything we do like making it very
clear to users you know where they're
seeing you know features and content
that are sourced differently than they
have been historically absolutely and I
think it goes beyond just even like
consumer use cases it's super important
for just regular consumers doing Google
searches but especially when you look at
Enterprise applications and other things
you know the theme of like being able to
cite your sources and being able to
decompose a bit what is going on inside
of the Black Box I think is increasingly
going to be critical for any sort of
real adoption being able to move Beyond
like okay this is a fun toy to to this
is something that I can actually use in
the the day-to-day so I I really hope
that we uh start to make some progress
there on some of these more consumer
friendly uh chatbots because in the
Enterprise setting you know that's
becoming increasingly the norm like in
rag patterns you want to return here's
the source where I you know um got my
answer from and that's becoming
increasingly important one of the things
that opens up in my mind Kate and it' be
interesting in your perspective there is
that that's kind of fine from a web
interface where you're getting your
result you get your overview and then
you've got all the links and here's
where I reference but as we talked about
in a previous episode where we're moving
into multimodality, and you're going to be chatting with what is arguably a human voice at that point —
right you're probably not going to want
somebody going back and say this is the
answer to the question and by the way I
got this answer from here here here and
you can visit it on XYZ blah blah blah
because you're going to switch off at
that point. So I wonder what the best user experience is for voice, for that sort of helpful chatbot, while also being fair and transparent that it's AI
generated I honestly question if chat
regardless if it's with voice or text is
the right domain here like the right
mechanism and mode for this type of
analysis and one of the things I'm
really excited by the AI overviews is it
seems like one of the first use cases
that is really taking on that's consumer
focused where it's not a chatbot right
where we're using generative Ai and
we're able to start to drive um
information distillation and Gathering
lots of different sources and providing
results you know without having to like
have a multi-turn conversation like
asking are you sure about this answer
where did you find it like can you give
me more sources like that's a very
unintuitive flow but I think we've been
so trained on chat to equal generative
AI up until now that that's just how we
all assume it has to work so I would
actually say I don't think you know
voice and other things are where this
hopefully is going I think there's a lot
of opportunity to Think Through what do
new types of non- chat-based
applications look like and how can we
embed those decision-making criteria and
sources and other things that are needed
to really drive value along the way, without it being this multi-turn interrogation of an agent. What
do we think Google is collecting on the
usage patterns of these you know way
back in the day they would have search
and they would obviously collect
clickthrough right what are you clicking
on
uh any guesses as to what sort of
metrics Google's collecting as people
interact with these AI overviews? That's not in my space at all — I'm just wondering, I'm guessing, someone in there is watching how we are interacting with the AI overviews presented to us. Ironically, this is the one question I'm qualified to answer.
And so, you know — at least when Google first introduced AI Overviews, they had been in beta for a while, and they said they were bringing them to prime time — two of the things that they talked about were telling, and they were really messaging to publishers, because publishers
have been hysterical about the impact of
this and like what's been really
interesting is that the impact on
organic traffic to Publishers has been
like almost negligible um so everyone
thought it was like the end of the
internet and then like almost nothing
happened in terms of traffic um but two
of the things that Google said was one
that the content that was surfaced
through AI overviews was actually
getting more clickthrough and more traffic than the stuff that was present in just the normal SERP, and the idea there was that those links were presented with more context. I think
Sundar did another interview not long
after that where he was talking more
about like generative uis and you could
just see I think more about like when
how you turn a query um a user query and
you generate a UI that places like links
and information in context better than
just a flat list, which is sort of what they do — they would say they do not do that today, but there's still some of that. And so that was
one thing and then the other thing that
they talked about I'm sure they measure
more things but the other thing that
they measured um is do the people who
are exposed to AI overviews start using
search more um like is this something
that increases their usage of this
product over time, because the other audience that is terrified of this is obviously, like, shareholders. And
people want to know it's like are you
gonna kill search and in the process of
doing that are where's all the ad
Revenue going to go and so one of the
other things that they're very clear about
is like oh no people who get exposed to
this actually use this product more over
time and so I think they're reminding
some of their other stakeholders a
little bit there, but those are at least some of the ones that they've publicly discussed. Last week, Anthropic released a novel version of its Claude 3 Sonnet model, and this model did not believe
that it was a helpful AI assistant
instead it believed it was the Golden
Gate Bridge uh which is a fun thing to
have happened um but really that was a
demo of research that anthropic has been
doing for a long time and really the
industry has been pursuing for a long
time which is in the space of
interpretability um and within the space
of interpretability anthropic has been
doing a lot of research around
mechanistic uh interpretability um but
part of the problem in this space is
that I think Kate to the comment you
made earlier is that these models are a
black box today: you put a pile of all the data on the internet into linear algebra, and out spits something that somehow appears to know a lot about the world, but nobody knows how that's actually happening — not really. And so interpretability is a
space that's trying to answer some of
those questions and what was interesting
and why Golden Gate Claude was important
was that anthropic
demonstrated that they could identify
the features within the model that
activated when um you know either text
or a picture of the Golden Gate Bridge
um was was presented so they knew um
kind of the combination of like neurons
and circuits that would say like this
this thing represents the Golden Gate
Bridge and perhaps even more importantly
that by dialing that feature up or down
uh they could influence the behavior of
the model to the point where, if you dialed it up high enough, the model thought it was the Golden Gate Bridge. And this was, if you read the paper, not the only example either, and I'll share
one other one uh which is that they had
another feature that would fire when it
was looking at code and would detect a security vulnerability in the code, and they had an example
too where if you dialed up that feature
it would actually introduce a buffer
overflow vulnerability into the code um
as well so when you think about the
ability to dial features up and down
within a model fairly surgically um
pretty important in terms of the
steerability of the model U potentially
and certainly I think you can understand
a little bit why folks in the AI safety
community in particular have been focused on this interpretability space. So I personally
find the space super fascinating and
Skylar, I just want to turn it over to
you to maybe kick us off a little bit to
just maybe even talk about like your
general reactions to to the paper maybe
and like the demo as a starting point
and just like what you found interesting
like how important you think it is and
just you know maybe talk a little bit
about how you know I know what you
thought of it. Yes, great — I'm happy to talk about this space without droning on too long. I have to describe what I do to my kids, you know, a 10-year-old and 7-year-olds, and they know
that I work with AI uh and their
understanding is text goes in and text
comes out that's that's their kind of
view of these large language models and
where I try to tell them where I and our
team work on is actually in between what
happens to the text when it goes in how
does it get manipulated and then it gets
spit back out and I think this has been
uh coming out as an area called
representation engineering and I would
call this paper the Golden Gate example
a great example of representation
engineering they're not manipulating
prompts they're not coming up with a new
metric of how well their models
performing they are messing with the
representation of the model and I think
that's just a really cool I would say
emerging or perhaps even
underrepresented area of research when
you compare it to prompt engineering for
example: how can we, you know, probe — or, sorry, how can we prompt the model in just the right way to make it be
convinced it's a Golden Gate Bridge that
would be a very different approach to
what they had done um with this uh
Golden Gate example um it's a fun
example they took it down I think it was
only available for people to use for
about 24 hours yep 24 hours and so it's
it's already been with us and you know
taken away too soon. But for me, what I would like to get across to the larger audience is: they did not
just create a new large language model
by training it only on Golden Gate
bridge data they did not insert a little
prompt that says every time you answer a
question pretend you're the Golden Gate
Bridge they really did identify the
inner workings of these models and then
crank it up as Brian had described and I
think what I'm excited about that
is in this representation engineering
space it doesn't
take the latest greatest Technologies to
find these cool insights things like
principal component analysis, things like a sparse autoencoder — these things are, you know, decades old, or a ten-year-old analysis, but applied to the inner workings of these large language models it's now this new rich space of representation engineering. So I
like the paper both for how it presented
its work — Chris Olah, one of the authors,
is a visualization genius and and in
their in their publication they've got
some really really cool visualizations
of what they found out um so I think
that's probably my first takeaway I'd
like to spread to an a broader audience
that large language models are not just
text in and text out there's a lot of
Rich uh science to be done in that
representation space and the Golden Gate
Bridge paper is a great example of
that that's great can you maybe talk a
little bit about um the safety Community
I think in particular is very interested
in the topic of interpretability um and
I think has feels some level of urgency
uh around it given how capable and how
quickly capable some of the models are
are becoming but maybe just can you talk
a little bit about why why it's so
important to the safety community and
then maybe also talk about like other
applications and area and domain areas
where this space of you know
interpretability um you know promises to
you know it could be on the capability
side of it but just other places where
we think interpretability will make a
difference right I think a real clear
example I was reading in a blog after Golden Gate Claude had been brought down: some people noted that when the Golden Gate feature was highly activated — when Claude 3 was turned into Golden Gate Claude — it would respond to tasks that it previously would not. So: "please, can you write a scam email." Normal Claude would respond "sorry, I can't do that"; Golden Gate Claude would proceed, and it would generate this scam email — nothing to do with the Golden Gate analogy —
but it was an example of when you mess
with these other features like that
there are other sort of perhaps
previously thought built in guard rails
that are no longer as strong and so I
think that's going to be another really
interesting area of work of you may have
well-intentioned
people manipulating these features we
don't know what other guard rails that
previously worked will not work after
you've manipulated a feature because who
would have thought that amplifying the
Golden Gate idea the bridge would make
the large language model Claud more
likely to comply to a uh to an illicit
task. So I think that was just, I don't know, an example that I had read about there. I think the safety community might not care about a large language model identifying as the Golden Gate Bridge, but they will definitely be interested in the jailbreaking behavior — what happens when people start manipulating it. Skylar,
I I got a question for you based off of
that like what implications does that
have then for open sourcing models and
releasing models and weights you know a
lot of times model providers do a lot of
safety reinforcement learning and other
protections on top of the models that
are before they're released to help
manage some of those behaviors like
could you see some of that now being at
risk and and
eroding the willingness to open source? Is that what you mean by being at risk — the willingness for companies to open up? Yeah, take it as you will: the willingness for companies to open source, and the risk that is introduced from releasing model weights that can now be, shall we say, exploited in ways that weren't originally anticipated by the model designer and builder. Really good question.
Actually, Anthropic themselves have this much larger blog you can read where they defend why they have not open-sourced these types of models. In that regard, I think I
imagine people around uh the AI
Community right now probably over the
weekend are are busy running their own
version of Golden Gate they're going to
find their own features they're going to
start manipulating those um so I think
we'll probably see some of those results
showing up, hopefully on arXiv, or maybe on blog posts, within this week
On that, Skylar — so I did a YouTube video about three or four months ago where I took the Gemma model and the Mistral model. It's not at the feature level that they worked at, but what I did is I lopped off the input embeddings layer — I left the model with only the input embeddings layer, nothing else — and then I ran a cosine similarity search against the various tokens within the input
embedding layer and then just looked at
did a visualization looked at what
embeddings were close to each other and
when I did that it was incredible and
you can go check out that YouTube video
but uh it was incredible so you would
see that, just in the input embeddings layer, nowhere else, misspellings of words were super close to each other. So if I had London with a capital L, and london with a small l, or London with a space after, they would all cluster together. But not just that — cities themselves would cluster together: you would see London, you would see Moscow, you would see Paris, and in fact you would see almost a distance similarity in the visualization, which
was fascinating you saw the same thing
with celebrities they would cluster
together; computer programming terms, right — so, you know, the various loops: a for loop, a while loop, etc., so "for" and "while" would all come together. Now, the reason that I ran that against the Mistral model and the Gemma model is that the Gemma model has a vocabulary of something like 256,000 tokens, whereas the Mistral model has 32,000 tokens — so there's a lot of splitting of tokens in Mistral, but in the Gemma model there's not a lot of splitting, which means you get much closer similarity. So
when I did that I was absolutely blown
away, and like the Anthropic team, I wanted to go to the next layer, because I had the same theory that if I jumped down to the next layers you would start to see these features activate — because I could see it already, just in the embeddings layer.
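To make that concrete, here is a minimal, hypothetical sketch of the kind of nearest-neighbour probe Chris describes. It is not his actual code: the tiny hand-set "embedding matrix" and the `nearest` helper are illustrative stand-ins for a real input-embedding matrix (which, in PyTorch/transformers, you could pull with `model.get_input_embeddings()` over the full 32k- or 256k-token vocabulary).

```python
import numpy as np

# Hypothetical miniature stand-in for a model's input-embedding matrix.
# Vectors are hand-set so the clustering is easy to see; a real matrix
# would be vocab_size x d_model, taken from the model itself.
vocab = {
    "London": np.array([1.0, 0.0, 0.10]),
    "london": np.array([1.0, 0.0, 0.12]),   # casing variant of "London"
    "Moscow": np.array([0.9, 0.1, 0.30]),   # another city
    "for":    np.array([0.0, 1.0, 0.10]),   # loop keyword
    "while":  np.array([0.05, 1.0, 0.15]),  # loop keyword
}
tokens = list(vocab)
E = np.stack([vocab[t] for t in tokens])

# Cosine similarity between every pair of token embeddings:
# normalise each row, then one matrix product gives the full matrix.
En = E / np.linalg.norm(E, axis=1, keepdims=True)
sim = En @ En.T

def nearest(token, k=2):
    """The k tokens most cosine-similar to `token` (excluding itself)."""
    i = tokens.index(token)
    order = np.argsort(-sim[i])
    return [tokens[j] for j in order if j != i][:k]

print(nearest("London"))   # casing variants and other cities cluster together
print(nearest("for"))      # loop keywords cluster together
```

With a real embedding matrix you would then project to 2-D (e.g. PCA) to get the kind of visualization described here.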
And one of the theories — which I'm glad to say I think has been proven right — is this: you may have noticed that as new models come out, everybody is increasing their tokenizer vocabulary, everybody's increasing their input embeddings layer, and the reason, I believe, is that it's easier for the models to generalize as you go up the layers if you get things pretty close in the input embeddings layer. And I think, therefore, when I looked at the Anthropic paper — bringing it back
there I could visualize when it talked
about cities when it talked about
locations when it talked about computer
programming terms I was like I could see
that just in the input embeddings layer
only on my visualization so I can
absolutely see how that would then
translate into features as the models
get stacked up and it becomes richer and
richer with semantic meaning. Yes — I'm going to geek out here a little bit: the official papers of the Claude Golden Gate work are all plays on the word monosemanticity, which is basically a really big word getting at the idea: can we find a single part of these huge large language models that has one meaning? And they were able to
do that for the Golden Gate idea and
then the idea was now what happens if we
take that one part of this huge large
language model and Crank It Up tenfold
and then you get the Golden Gate version of the Claude large language model. But, Chris, your description of how these types of words or tokens are coming together like that — the tech behind Claude's Golden Gate, okay, "weaponized" is a bit dramatic, but it
really emphasized can we take this
richer embedding space and uh you know
create uh a million features from it and
then once they had those features that
you get the ones like the Golden Gate
and your security concerns and I think
there was one on tourist attractions Etc
uh but it's getting at this idea of can
we find a Monto semantic part of these
large language models um so yeah uh
again exciting space to be again and I
I'll come back I love it when the
research gets um gets into these inner
workings of large lay large language
models I think that's
[Music]
fascinating so
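The feature-amplification idea described here, finding a near-monosemantic feature direction and cranking it up roughly tenfold, can be sketched in a few lines. This is a toy illustration, not Anthropic's actual implementation: the `feature_direction` vector stands in for a sparse-autoencoder feature, and the steering function is an assumed simplification of activation steering.

```python
import numpy as np

# Toy sketch of "activation steering": amplify one learned feature direction
# inside a model's residual stream. The unit vector below is a stand-in for
# an SAE feature like the Golden Gate Bridge one (hypothetical values).

rng = np.random.default_rng(0)
d_model = 64                                # toy hidden size

# Pretend this unit vector is the decoder direction of one SAE feature.
feature_direction = rng.normal(size=d_model)
feature_direction /= np.linalg.norm(feature_direction)

def steer(hidden_state: np.ndarray, direction: np.ndarray, scale: float) -> np.ndarray:
    """Scale the feature's current activation by `scale`
    (the 'crank it up tenfold' idea), leaving everything orthogonal untouched."""
    activation = hidden_state @ direction   # how active the feature is right now
    return hidden_state + (scale - 1.0) * activation * direction

h = rng.normal(size=d_model)                # a toy residual-stream vector
h_steered = steer(h, feature_direction, scale=10.0)

print(h @ feature_direction, h_steered @ feature_direction)
```

In a real model the steering would be applied via a forward hook at one layer for every generated token, which is (roughly) why the steered Claude related everything back to the bridge.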
So, also, last week was another big week of announcements across the industry, but I actually just want to focus on Microsoft, who, as I mentioned, introduced what has become known on the internet as the whale computer, because they used this analogy of marine life to basically explain the orders-of-magnitude size of the infrastructure they're building to support AI workloads. And they used these three steps of shark, orca, and then a whale. And what's funny is, just this morning I was Googling how much does a shark weigh, and so sharks are roughly, I think, like 800 pounds, and then an orca is 8,000 pounds, and then a whale is like 80,000 pounds, and so it's just an order of magnitude each time. They were thinking about, okay, what's an interesting and fun way to visualize and communicate an order of magnitude, and maybe a little bit memeable in a way, and they certainly achieved that. But in some ways it's just classic scaling law, right? It goes back to the original 2020 paper that says, you know, if you're trying to improve the capability of these models, reduce the overall loss in them, you want to increase your compute, your data, your parameter count by roughly similar orders of magnitude from one generation of the model to the next, and that improves the overall sort of general capability of the thing. And you can look at Nvidia earnings; that's held pretty true up
to this point. But maybe where I wanted to jump in is, Kate, a comment you made, I think it was last week on the show, where you were saying something to the effect that Enterprises, for a lot of these use cases, may not need an artificial general intelligence; they actually may not need all the capability that exists right now. And so, you know, I think it'd be great if maybe you could talk a little bit about the scaling laws, but from a different perspective, because the scaling laws idea to me is really from the perspective of a model provider trying to build AGI, not an Enterprise trying to get ROI, essentially. Absolutely, yeah. Can you talk maybe a little bit about just some of what you see in terms of the cost and size tradeoffs, and, you know, does bigger mean better all the time? I mean, I think
what the scaling laws, as you say, do a good job at is for model providers, like people actually training these large models. And what was really one of the big breakthroughs is: look, you can't just increase your model size; the most efficient way to improve performance is to also increase the amount of data that's used as well. And just because you now know, let's call it, the most cost-effective way to train a model of the nth degree in size, does that mean it's economically incentivized to train that model? Will the actual benefits that you derive from that model justify the cost? That's an entirely different question that the scaling laws don't answer. So I think up to this point there's been enough excitement and clear use cases and value, where there's been a clear economic driver to support: okay, we need to train bigger and bigger models, and that's gotten us to where we
are today. But, you know, I do question some of the statements and claims out there about how we have to keep investing and building bigger and bigger models. I'm going to put the science of it aside, of exploring and determining what's next, but if we look at what's actually economically incentivized, I think we're going to start to see performance plateau. And if we look at what the real use cases are and the value drivers, I don't think we're going to need models that are 100 times bigger than what we have today to extract most of the value from generative AI, and a lot of the low-hanging fruit. So, you know, I think it's still a huge area of exploration. If you look at even the scaling laws themselves, they keep changing. It's still this concept of you need more data for bigger models, but I'm hopeful that we're going to start to see more work built in on what will be economically incentivized to build, as well as looking at other costs that aren't reflected in these scaling laws, costs like data. You mentioned, you know, concerns about pre-training data disappearing. So we know we need more data to train bigger models, and at some point we're going to run out of, quote, real data. And so that's a whole different frontier of looking at data costs, looking at what role synthetic data could play; all of that really needs to be explored. There are also costs on, like, climate, and the actual compute costs, and are those costs going to start to be better realized in the prices that are charged to model providers and people leveraging these models? And, you know, I think all of that will maybe start to change the narrative a little bit of where the future is going as we continue to learn more. Maybe one
follow-up to that is: I remember the reaction in the market when the Llama 3 models came out, the 8 billion parameter model in particular, which I believe was trained on 70 to 75 times as much data as you would use if you were just trying to do an optimally compute-efficient model, which obviously is not the approach that they took. They instead took an approach of trying to build something small and capable that you could run on your laptop, that was cheap for inference but still had a ton of capability. Do you see more of that happening? Definitely. So right
now, again, the main scaling laws that everyone's using are for model providers, not thinking about the model life cycle and the full usage, so this is another cost that isn't yet really reflected. Think about your fixed costs of how much it takes to create that model once, versus the marginal cost to use it every single time you run inference. You're incentivized to build smaller models if you're going to have a long model life cycle and you're going to hit that model millions and billions of times and run inference on it; you want to get that marginal cost as low as possible. And that's where the Llamas are going. That's where, if you look at the Phi model series as well, they're training on these incredibly data-dense ratios of the amount of data per parameter, where, like, Chinchilla I think calls for 20-to-1, 20 tokens per parameter, something like that.
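For concreteness, here is the back-of-envelope arithmetic behind those ratios. Treating Chinchilla's rule as a flat 20 tokens per parameter is a simplification, and the 15-trillion-token training figure for Llama 3 is Meta's publicly reported approximate number:

```python
# Rough tokens-per-parameter arithmetic, assuming Chinchilla's ~20:1
# compute-optimal rule of thumb and Llama 3 8B's reported ~15T training tokens.

params = 8e9                     # Llama 3 8B parameter count
chinchilla_ratio = 20            # ~compute-optimal tokens per parameter

optimal_tokens = params * chinchilla_ratio   # ~160 billion tokens
actual_tokens = 15e12                        # ~15 trillion tokens (reported)

print(f"compute-optimal: {optimal_tokens / 1e9:.0f}B tokens")
print(f"actual ratio:    {actual_tokens / params:.0f} tokens per parameter")
```

So an 8B model trained the "compute-optimal" way would see on the order of 160B tokens, while the reported training run is in the thousands of tokens per parameter, which is exactly the data-dense regime being described here.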
They're now in the hundreds and thousands of tokens per parameter. So I think we're still really understanding that tradeoff, and I think we'll continue to; that's where everyone is headed. Understanding that, maybe it hasn't been articulated fully in a scaling law, but trying to optimize that total life cycle: when this gets deployed we need to be able to run as small a model as possible for this to be cost-effective.
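The fixed-versus-marginal-cost point can be made concrete with a toy break-even calculation. All the dollar figures below are made-up placeholders, not real training or serving costs; the shape of the tradeoff is the point:

```python
# Toy total-cost-of-ownership model: using a big hosted model has no upfront
# cost but a high per-call price, while distilling/fine-tuning a small model
# costs money upfront but is cheap per call. All numbers are illustrative.

def lifetime_cost(train_cost: float, cost_per_call: float, calls: float) -> float:
    """Fixed (one-time) cost plus marginal inference cost over the model's life."""
    return train_cost + cost_per_call * calls

big   = dict(train_cost=0,         cost_per_call=0.01)    # pay-per-call large model
small = dict(train_cost=2_000_000, cost_per_call=0.0005)  # upfront-trained small model

for calls in (1e6, 1e8, 1e10):
    cost_big = lifetime_cost(calls=calls, **big)
    cost_small = lifetime_cost(calls=calls, **small)
    winner = "small" if cost_small < cost_big else "big"
    print(f"{calls:.0e} calls: big=${cost_big:,.0f} small=${cost_small:,.0f} -> {winner}")
```

With these placeholder numbers the break-even sits around 2e8 calls: below that, paying per call on the big model wins; once you're running inference billions of times, the small model's low marginal cost dominates, which is the incentive being described.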
And I think, Kate, to that point, one of the questions you need to ask in general is: how much reasoning do you need from your model? I like to use the cooking analogy. If I go to a Gordon Ramsay restaurant, I'm not expecting Gordon Ramsay to cook my meal for me, right, and I'm not expecting him to invent a brand new meal there and then. What I'm wanting is a recipe that he's invented at some point, and then there's going to be some sous-chef or something that's going to cook up that meal, and it's going to be served, and I'm going to have the Gordon Ramsay experience. And I think when you're looking at the larger models, with hundreds of billions of parameters, even 70-billion-parameter-type models, you're asking for the Gordon Ramsay there. You're asking: I want you to come up with the recipe, invent the recipe, cook the recipe, and serve me the meal all at the same time. But actually you should be using the bigger model to do the reasoning, right, figure out what the good answer is, and then passing the pattern on to the smaller model to go and do the sous-chef thing.
And I think that's really the big question for people when they're doing POCs and then scaling to production. They use the bigger models to begin with, because they're trying to figure out what the answer is, but then in production they need to, as Kate beautifully said, keep the cost low. So they then switch to the smaller model, because they want the decreased latency, they want the lower cost, but the pattern has been figured out and you just want that smaller model to rinse and repeat, which it's really good at. Absolutely, and I
think another area: so there's this concept of using bigger models to teach small models, and that also throws in some squirrelly math with the scaling laws, if you need a big model to get a good small model. But moving past that, there's also, I think, a real opportunity in model routing: figuring out what tasks you actually need the big model for. Like, when do you need Gordon Ramsay to tap in, versus when can you pass this off, and maybe you just need to go to McDonald's for a quick bite to eat, because this is something really easy, low value, not worth spending an insane amount to accomplish. And that's again where I think a lot of the what-will-be-economically-incentivized comes in: figuring out how much these tasks are actually worth to you. And if you can get reasonable performance with a 10-million-parameter model or a 3-billion-parameter model, no one's going to pay to send it to a multi-hundred-billion or trillion-parameter model
instead. Maybe one final question on this topic. It was funny, there was an interview where people were talking to Jensen, and they were asking his opinion on how much he thought this would hold, and they were poking on things that were really about the TAM of Nvidia, sort of long-term, and he paused, because he was like, I should not answer this question, because anything he says, the stock price is just going to go all over the place, essentially. But he started to talk about the opportunity being the entire, I think it was a trillion-dollar, data center market
what he was talking about and there's
been a lot of discussion about whether
like all workloads will become
accelerated workloads um going going
forward and just in for every
application for every company just the
the blend of stuff that they're doing on
traditional CPU versus more accelerated
workloads and how they hand off between
those two things and you know I'm just
curious maybe even Chris from from your
perspective and just a lot of client
conversations that and scenarios that
you're working with you know how people
are thinking about that like I know I
know a bunch of inference is still done
on CPUs today but I think for some of
the Laten really low latency examples
people are talking about like oh we need
to put more of this on gpus so uh I'm
just C I'm curious how from an
application perspective inside of an
Enterprise account how people are
thinking about uh just INF inference and
like application architectures and how
they're doing tradeoffs between kind of
CPU and GPU
computing. Yeah, I think it's a really interesting area. A lot of customers are actually thinking about this all the time, so it's an architectural consideration, just like any other NFR: am I going to go SaaS here, am I going to go on-premise, how do I plan my costs, what am I going to do, what's the safety on that? If I'm honest, most Enterprises are being pretty cautious, right? They want to do a classification task, they want to do a summarization; they don't want the model to make up some classification. They know what their list of 30 classifications is, go do that; they know what their examples of summarizations are, go do that. So they want to take that low-hanging fruit, and they're approaching it quite cautiously. I think where that probably changes in time, and again it's more of a discussion for a future episode, is when we move into agentic workflows, right? How do I then start to organize my information within my Enterprise so the AI will have access to the right knowledge bases? Which tools will it have access to? Which is a much wider architectural discussion. So a lot of clients are starting to think about how gen AI fits into their overall Enterprise architecture, and how you need to evolve your traditional architecture for the AI to be able to use it, but that's quite a slow path. Generally, I don't think things have moved on too much from classification, summarization, etc., and then of course code generation is a big productivity lever that everybody's kind of leaning into just now. One maybe final thought on the
scaling laws I wanted to bring up is: a lot of these scaling laws are also assuming that the class of technology remains the same. And we talked about, okay, these are scaling laws for model providers basically in search of AGI, but do we really believe this class of technology is what's going to unlock AGI? I think there's a lot of thought out there that probably not. If you look at how these technologies evolve, there's a curve, but that curve is driven by multiple different technologies coming in and introducing their own mini curves on top of it. And, you know, AGI, I mean, human intelligence requires far less energy for the amount of capability and decision-making. So if we're really talking about, okay, we're going to promote these scaling laws because, for model providers, maybe the business use cases aren't going to be incentivized, but if we can unlock AGI it will be, then I would maybe also argue that these scaling laws probably don't reflect how whatever technology we converge on for AGI might scale. So it's still a bit of
an unknown and a don't-know point. Kate, I mean, imagine a world where we did have AGI, or even ASI at that point, right? But then you took that super-intelligent being and you said: you don't have access to any documents, you don't have access to any tools in your organization, because it's all locked up on somebody else's hard disk or a Box folder or something. How effective would that AGI be in an organization? I don't think very effective. So I think you're reading between my lines, which is: is AGI really actually ever going to be incentivized, at least economically? There's a big question mark there, I think. But I think as soon as AGI is achieved, if it's achieved, it's going to be put in a box, and we're all going to go to the AI zoo, and we're going to go and look at the AI zoo and have a chat with it. That's what I believe AI's first task is. Well, what is the TAM of an AGI zoo is what we need to answer on next week's episode.
So I know we're basically at time here. Thank you all for joining us on this week of Mixture of Experts, and we will be back next week, same time, not the same people. You suffered through one episode of me; I'm out of here, Tim will return. But thank you all for both joining today and for listening. Kate, Chris, Skylar, thanks all for joining today. Thanks so much. Thanks, Brian, it's been a lot of fun, man. Thanks.