Peak Pre‑Training and Synthetic Data
Key Points
- Ilya Sutskever’s keynote at NeurIPS proclaimed that we have hit “peak pre‑training,” suggesting future AI advances will require alternatives beyond larger pre‑trained models.
- Vagner Santana warned that synthetic, AI‑generated data is already flooding the web and, without reliable detection tools, we may unknowingly be training new models on content that itself was produced by LLMs.
- Volkmar Uhlig cautioned that it may still take a few years before the industry fully transitions away from heavy reliance on pre‑training, despite growing interest in other techniques.
- Abraham Daniels, a Mixture‑of‑Experts (MoE) specialist, noted that while MoE may become less central over time, it remains an important piece of the evolving AI toolbox.
- The episode also previewed upcoming topics such as Granite’s latest release, novel model‑theft attacks, and NVIDIA’s ultra‑compact supercomputer, framing them within the broader MoE discussion.
Sections
- Debating the End of Pre‑Training - Panelists discuss whether AI pre‑training has peaked, covering synthetic data detection, Mixture‑of‑Experts trends, and upcoming NeurIPS insights.
- Shifting From Pre‑Training to Test‑Time Compute - The speaker describes how their firm leverages partner‑sourced proprietary domain data and is transitioning toward inference‑time (test‑time) computation, reducing reliance on static, large‑scale pre‑training.
- Filtering Training Data with LLMs - The speaker argues that massive, noisy internet data must be vetted using large language models and test‑time selection mechanisms to separate truth from garbage during pre‑training.
- Feedback Loop of Synthetic Data Bias - The speakers discuss how reusing LLM‑generated data for pre‑training can perpetuate existing biases, highlighting the difficulty of assessing data quality and the lack of reliable methods to detect synthetic origins.
- Granite Guardian & Embedding Model Release - The speaker details the launch of Granite Guardian 3.1 for hallucination detection, new multilingual embedding models for semantic search, their availability on Hugging Face, Watsonx and partner platforms, and previews future MOE scaling and multimodal capabilities.
- Balancing Openness and Safety - The speaker argues that open‑source AI models can simultaneously provide transparency and security, citing community‑driven bug‑fixes and guardrails like Granite Guardian and Llamaguard as evidence.
- Model Prompt Tuning Creates Vendor Lock‑In - The speaker explains that extensive prompt engineering is specific to a given model family and cannot be transferred to others, creating strong lock‑in to those models while compute resources remain easily switchable across cloud providers.
- Future Prompt Optimization & Model Exfiltration Attack - The speakers debate whether advances will render prompting obsolete as models self‑optimize, then discuss a recent side‑channel attack that extracts AI models by monitoring TPU hardware activity.
- Assessing Practicality of AI Side‑Channel Attacks - The speakers debate the real-world threat of side‑channel techniques—like acoustic keyboard eavesdropping—to AI infrastructure, concluding that the valuable asset is the model’s weights rather than its architecture, and that established security measures (e.g., cryptography) largely mitigate such risks.
- Securing Data and Model Assets - The speaker discusses the need for comprehensive encryption, uniform adoption, and compliance to protect both data and AI model assets within enterprise ecosystems.
- Internal Threats to Model Infrastructure - The speaker discusses a recent article revealing a vulnerability in edge‑inference TPU deployments, emphasizing the need for infrastructure providers to implement stronger guardrails against insider attacks on open‑source LLM models.
- NVIDIA Low‑Power Robotics Board Overview - The speaker outlines NVIDIA’s long‑standing autonomous‑vehicle investment and details a circa‑2015/16 low‑power robotics board optimized for vision, mapping, and inference that lets developers train models on NVIDIA GPUs and seamlessly deploy them for on‑device processing without draining robot batteries.
- Making Petaflop Computing Accessible - The speaker highlights how George Hotz's low‑cost petaflop hardware could democratize AI by dropping price barriers, enabling innovators in both wealthy and developing regions to experiment with advanced applications such as robotics, architecture, and agriculture.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=GnMKY4QLHDw](https://www.youtube.com/watch?v=GnMKY4QLHDw) · **Duration:** 00:40:27
Timestamps:
- [00:00:00](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=0s) Debating the End of Pre‑Training
- [00:03:06](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=186s) Shifting From Pre‑Training to Test‑Time Compute
- [00:06:11](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=371s) Filtering Training Data with LLMs
- [00:09:24](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=564s) Feedback Loop of Synthetic Data Bias
- [00:12:31](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=751s) Granite Guardian & Embedding Model Release
- [00:15:40](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=940s) Balancing Openness and Safety
- [00:18:47](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=1127s) Model Prompt Tuning Creates Vendor Lock‑In
- [00:21:56](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=1316s) Future Prompt Optimization & Model Exfiltration Attack
- [00:25:07](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=1507s) Assessing Practicality of AI Side‑Channel Attacks
- [00:28:15](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=1695s) Securing Data and Model Assets
- [00:31:18](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=1878s) Internal Threats to Model Infrastructure
- [00:34:26](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=2066s) NVIDIA Low‑Power Robotics Board Overview
- [00:37:29](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=2249s) Making Petaflop Computing Accessible
Are we at peak pre-training? Vagner Santana
is a staff research scientist and master
inventor on the responsible tech team. Vagner,
welcome back to the show.
What say you?
Given that we don't have
methods for detecting synthetic data,
maybe the biggest risk is already in the past, right?
Because now people are realizing,
and maybe we've already done that.
Volkmar Uhlig is Vice President and
AI Infrastructure Portfolio Lead.
Volkmar, how about you?
I think we need to give it a couple more years.
And Abraham Daniels joining us for
the very first time is a Senior
Technical Product Manager on Granite.
Abraham, welcome to the show.
You're now an expert on Mixture of Experts.
Uh, tell us what you think.
I wouldn't say that it's over.
I guess I'm not 100 percent sure that it's over, but
I think we're less reliant on it going forward.
All right.
Awesome.
All that and more on today's Mixture of Experts.
I'm Tim Hwang, and welcome to Mixture of Experts.
Each week, MOE is dedicated to bringing
the breaking news and analysis you
need to understand what's going on in
the world of artificial intelligence.
Today is another jam-packed episode.
We're going to talk about the latest release
out of Granite, weird ways of stealing
models and NVIDIA's tiny supercomputer.
But first, let's talk a
little bit about pre training.
As many of you may know, uh, this
is the kind of week of the big
machine learning conference, NeurIPS.
Um, Ilya Sutskever, a big, actually
prominent thinker and kind of intellectual
and entrepreneur in the AI space, uh,
gave a keynote talk in which he claimed
that we are at peak pre-training.
Essentially, pre-training is over, and to create
improvements on AI going forwards, we're going
to have to employ a bunch of different methods.
Um, and I guess maybe Vagner , I'll start with
you because I think you, you nodded towards
kind of one solution that Ilya mentioned
in his talk, which is synthetic data.
Um, are you ultimately really optimistic?
I mean, it kind of sounds like you've got
almost a paranoid view that we might
already be living in synthetic data land.
But, uh, tell us more about why synthetic
data might be a way forwards if we think
that we're literally running out of the
data that we need to do pretraining.
In fact, I'm concerned
about it because, when we see
the projections of how
synthetic data is populating the web,
we don't have methods for actually
detecting it during pre-training and saying, okay,
this is synthetic, this is not synthetic.
So that is why I answered the
way I did: because we may already have been
pre-training models with content generated by LLMs.
We don't have a way to properly filter,
or good methods for
properly filtering, whether something
was generated by an LLM or not.
So that's my concern.
We may already be living in a
world where pre-training is happening
using data generated by LLMs.
Abraham, maybe I'll turn it to you,
because I know you work on the Granite team.
Um, I think one really interesting kind
of aspect of all this is that a lot of the
data that we've relied on for pre training
in the past has been open data, right?
People use common crawl and what have you
to kind of like pre train their models.
And I guess part of the idea here is
that in this world where basically
all the open data is already available
and being used, proprietary data
becomes a lot more valuable, and I'm curious
about how you think about that at Granite
I mean, I know IBM's taken a
real stand towards openness, and I'm
wondering if that also applies to how you
guys think about the data sets that are
gonna become more and more important here
Yeah. So great question.
Um, in terms of data, I think we've done
a lot of work in terms of partnering
with third parties to be able to backfill
some of the more specific domain or
enterprise data that's key not only to
our models, but to our commercial road map.
So I think that's going to be like a really
big pillar for us is, you know, where can we
find data that may not necessarily be open
source, but that is kind of central to our
road map, focusing on domain specific models.
But kind of back to what I said
earlier: pre-training
might not be over, but I think
there's a shifting paradigm in terms of
how we do inference with our models, more
specifically what's called test-time compute.
And you're starting to see this with some of
the newer models that came out, like OpenAI's
o1 as well as Qwen, where, you know, we're
less reliant on the static knowledge that we
get as part of our, you know, pre training
and really focusing on, you know, how do
we make models more capable at inferencing?
Having them think a little bit more about
their answers as opposed to using a system
one thinking where, you know, the first
answer they get is what they respond with.
Yeah. And I think that's a really interesting approach.
The real action
now is almost dataless in some ways, right?
It assumes the pre training happens, and
then all the real optimization is going to
be like doing these kind of inference tricks,
which is going to be a very, very different
way of thinking about some of this stuff.
Yeah, it's a lot more system two thinking.
So reflecting on your answer, having the
ability to go back, change your answer, or,
uh, you know, better understand if there
was a misstep in your thought process.
Volkmar, maybe I'll bring
you into this discussion.
We've had you on a couple
of shows at this point.
I think I'm starting to get the Volkmar
Vibe on how you answer questions.
I don't know if I'm reading too
much into what you said, where you
said, Well, we got a few more years.
Do you think this is just hype?
This is just Ilya doing his thought leading?
Like, we're not really at peak pre training.
You know, this is like, it's, he's
describing a trend, but it's probably
more hyped than anything else.
So I think what he's touching on is,
and this has been a trend over the last
couple of years, that the amount
of information you can get in the open
is exhausted; in fact, we downloaded everything and
we trained everything into the models.
And so we are now at a point where
the question becomes: how do you differentiate
data which is computer generated versus
what is actually human generated?
And that's the fundamental question.
First of all, we assume that human-generated
data is good and machine-generated data is
maybe bad, and I think that is not true, right?
Human-generated data can be misleading
and wrong. If you just
crawl the internet, what do you know, right?
You just download some stuff and people make
stuff up and you train that into the model
and you declare that's reality. So I think we
are at a point, and Abraham touched on this, and
I'm a strong believer in the system one system
two thing, um, where we need to actually
test the data which we download from the
internet, or we just synthetically generate it.
And that's kind of like test time
compute, uh, behind it, right?
So I spit out a bunch of answers, and
some of them are wrong, some of them
are right, and I pick the right one.
So I can apply the same mechanism to
stuff I download from the internet.
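The "spit out a bunch of answers and pick the right one" mechanism Volkmar describes is often implemented as best-of-n sampling with a verifier. A minimal sketch, where `generate_candidates` and `score` are stand-ins for a real model and a real verifier (here the toy task is exact arithmetic, so the verifier can check exactly):

```python
import random

TRUE_ANSWER = 17 * 23  # ground truth for this toy task

def generate_candidates(prompt, n, rng):
    # Stand-in for sampling n diverse answers at temperature > 0:
    # each "answer" is a noisy guess at the true value.
    return [TRUE_ANSWER + rng.randint(-5, 5) for _ in range(n)]

def score(prompt, answer):
    # Stand-in for a verifier: a reward model, an exact checker,
    # or an LLM asked to grade the answer.
    return 1.0 if answer == TRUE_ANSWER else 0.0

def best_of_n(prompt, n=16, seed=0):
    # Test-time compute: spend extra inference cycles generating many
    # candidates, then keep the one the verifier scores highest.
    rng = random.Random(seed)
    candidates = generate_candidates(prompt, n, rng)
    return max(candidates, key=lambda a: score(prompt, a))
```

The same select-the-best loop, pointed at downloaded documents instead of model answers, is the data-vetting idea discussed just below.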
And now it doesn't matter anymore.
If it's, you know, human generated and we kind
of assumed humans, you know, generate good data.
And so anything we could download was good.
I think we are now in a world where we can
produce so much bad data at a really high
velocity that, when we are
creating those pre-training data corpora,
we need to actually go through the data,
sift it, and classify it as,
you know, this is garbage and this is true.
And I think the only way to do that is actually
using large language models and the same
thing we are using for test time compute where
we are, you know, picking the right answer.
We will probably have to apply that
to the data corpus we train on.
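The "sift the data and classify it as garbage or true" step can be sketched as a filter over a corpus. The heuristic `judge_quality` below is a crude stand-in for what Volkmar proposes, which is an LLM-based judge or trained quality classifier; the heuristic itself is illustrative only.

```python
def judge_quality(document: str) -> float:
    # Stand-in for an LLM quality judge. A real pipeline would prompt a
    # model ("Is this passage factual, coherent, informative? Score 0-1")
    # or use a trained classifier; here we use a crude proxy: repetitive
    # spam and tiny fragments score low.
    words = document.split()
    if not words:
        return 0.0
    unique_ratio = len(set(words)) / len(words)  # spam repeats itself
    long_enough = min(len(words) / 20, 1.0)      # fragments carry little signal
    return unique_ratio * long_enough

def sift_corpus(docs, threshold=0.5):
    # Keep only documents the judge rates above the quality threshold.
    return [d for d in docs if judge_quality(d) >= threshold]

corpus = [
    "buy now buy now buy now buy now buy now buy now buy now buy now buy now buy now",
    "Pre-training corpora are filtered for quality before models are trained, "
    "because noisy web text degrades downstream accuracy and wastes compute.",
]
kept = sift_corpus(corpus)
```

The point of the sketch is the shape of the pipeline, not the scoring rule: swapping the heuristic for an actual model call is what makes this "filtering training data with LLMs."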
I still think pre-training, you know,
has its place; I don't
think it goes away for a long time. And
in any case, we will have to build
that core model, and the model architectures
change and how we are training changes.
And so I don't think pre-training per
se goes away, but I think that the focus
will shift: pre-training as the basis,
so that you know something, and then you
can apply it in the system-two style,
where you're
spending more time thinking.
Yeah, just like it gets you
to the first base, basically.
But like going further is going to
depend on all these other techniques.
Correct. I think
going further is the hard part.
And this is where I think
reinforcement learning comes in.
You know, we are actually
discovering new knowledge, and we
will publish that new knowledge.
And I think new knowledge will not
necessarily come from humans all the
time, but it will come from models.
Yeah, I love this observation that like,
You know, in some ways, the gains from
just having more data have been so strong
for so long that like the incentive
has just been like, dump more data.
We really don't care.
We don't really look too closely because like
the more data we put into this magic machine,
it just gets better and better and better.
I guess what you're saying is
that the meta is kind of shifting now
that we are concerned about quality.
And then the nuance you're adding,
which I think is very interesting, is that
synthetic data may very well be much
higher quality than what we get out of humans,
which I think is an interesting outcome.
Um, I guess, Vagner, how
do you think about that?
Because I think in some ways, like from
the point of view of like a responsible
AI or responsible tech ethics person, You
know, I think that the, the bias has always
been towards human generated content.
We say, oh, well, we don't trust the
synthetic generated content because
who knows it might embed all this bias
and it might have all these problems.
Do you kind of buy what Volkmar is saying?
Like that we are now rapidly
approaching the point where our
long-standing prejudice
in favor of human data is actually misplaced?
My concern is that if we think
about the most popular LLM used now
by people to generate
content, it could be OpenAI's ChatGPT, right?
And if we have this data, and I
agree with Volkmar in terms of the quality,
it's difficult
to assess quality of data,
human-generated or synthetic.
But imagine that we have a lot of
people using ChatGPT to create data, and then
this data is used again to pre-train
ChatGPT version N plus one, right?
So now the biases that the data had
before are coming in again to the model.
So my concern is
this feedback loop of bias, considering that
we don't yet have good methods for
detecting when data was generated by an LLM.
That's my concern: how to prevent
the bias that a previous version has
from coming in to the next version of
any LLM pre-trained on these data.
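The feedback loop Vagner worries about can be illustrated with a toy recurrence: each model generation trains on a mix of the previous generation's output and fresh human data. All parameter values below are made up for illustration and are not measurements of any real model.

```python
def run_generations(n_generations, synthetic_fraction, amplification,
                    human_bias=0.0, initial_bias=0.1):
    # Toy recurrence: the synthetic share of the corpus carries (and may
    # amplify) the previous generation's bias; the rest is fresh human
    # data, assumed unbiased here for simplicity.
    bias = initial_bias
    for _ in range(n_generations):
        bias = (synthetic_fraction * amplification * bias
                + (1 - synthetic_fraction) * human_bias)
    return bias

# When fresh human data still dominates the mix, the bias washes out...
diluted = run_generations(5, synthetic_fraction=0.5, amplification=1.1)
# ...but once synthetic_fraction * amplification exceeds 1, it compounds.
amplified = run_generations(5, synthetic_fraction=0.9, amplification=1.2)
```

The toggle between the two regimes is why detection matters: without a way to identify LLM-generated text, there is no way to keep `synthetic_fraction` low.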
Yeah, for sure.
I mean, I think this goes to the question:
quality is a little
bit under-theorized in this context, right?
Like, what do we actually mean by better
quality here and against what types of tasks?
I guess, Vagner, you've got certain
concerns, right, around this data.
You know, I guess if the main use
case is like, I don't know, math
or something like that, right?
Like, we may actually say that the
synthetic data is actually better, but it
really depends on use case in some ways.
I'm going to move us to our
next topic. I think we had a big
launch this week, which is Granite 3.1. Abraham, you're on the show
in part because you're from the team.
Uh, do you want to kind of, uh, tell
our listeners, I guess, what's, uh,
what's coming out and what people
should be paying attention to?
Yeah,
actually, what's already come out.
So as of yesterday, we released Granite 3.1, the latest model in
the Granite family of models.
So it's built on top of Granite 3.0.
And as part of that release, we've
pushed out our Granite 8b Instruct
as well as our Granite 2b Instruct.
Our Granite 8b is really kind of our
workhorse model for, you know, 80 percent
90 percent of use cases enterprise as
well as any sort of specific domain cases.
What we're really excited about in terms of
our workhorse model is that we've
seen great improvements in instruction
following as well as multi-step reasoning.
Along with our Granite 8B and 2B
dense models, we've released a suite of MoE,
or Mixture-of-Experts, models.
These come in 1B, or one
billion parameters, as well as 3B.
And these are really focused on
resource-constrained environments, any sort of low-latency
applications, edge computing, which we'll talk
a little bit about in terms of the NVIDIA news.
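Since Mixture-of-Experts comes up throughout the episode, a toy sketch of the core routing idea may help. This is an illustrative scalar version only: in a real model each expert is a feed-forward block and the gate is a learned projection, not the hand-picked numbers used here.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of gate scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, expert_weights, gate_weights, top_k=1):
    # A gate scores every expert for this input, but only the top-k experts
    # actually run. That sparsity is why MoE models can carry many
    # parameters while staying cheap at inference time. Each "expert" here
    # is just a scalar multiply for illustration.
    probs = softmax([g * x for g in gate_weights])
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    return sum(probs[i] * (expert_weights[i] * x) for i in top)

# With top_k=1, only the expert the gate prefers (index 1 here) runs.
y = moe_layer(2.0, expert_weights=[0.5, 2.0, -1.0],
              gate_weights=[0.1, 1.0, 0.2], top_k=1)
```

Raising `top_k` trades compute for quality: more experts run per token, which is the dial the dense-versus-MoE discussion is really about.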
And the big
capability that we're
launching with Granite 3.1 is that we now support
128K context length.
So what that really means is we
can input more
tokens into the model.
So what that supports is, you know,
long, uh, documents or multiple, multiple
documents to support QA, um, as well as
any sort of, you know, uh, code bases.
So you can feed in a full
code base repository.
And it also lends itself to more
LLM-powered autonomous agents.
Um, along with our language models, we've
released our granite guardian series.
So these are our guardian models, um,
that support detection across a number of
different, uh, biases and, uh, hallucinations.
Specifically, the most
recent Granite Guardian 3.1 supports detecting function-calling hallucinations.
Um, and then lastly, as part of our release, we
have pushed out our Granite embedding models.
So these are efficient, you know, robust
models that support semantic search.
They come in four sizes,
across English-only and multilingual variants, and in terms
of languages, we support all 12 languages
included as part of our language models.
So we're super excited to
have them out in the market.
Um, they can be found on Hugging Face
as well as our watsonx platform.
Um, we also have them, uh, on, uh, a
number of different partner platforms.
So Ollama, Replicate; they'll
be pushed to NVIDIA as well.
Uh, and we're just really
excited for, for what's to come.
And we're looking forward to, you know,
scaling out our MOE models in 2025, as well
as introducing some multimodal capabilities.
Yeah, that's awesome.
So a lot there, obviously, to go through.
I mean, I think maybe the first thing
I'll kind of bring you back to talk
a little bit about is context window.
I know that was a big part of the launch,
a big part that like, you know, kind of IBM
is I think touting as part of this release.
Yeah.
Can you paint a little bit more of a picture
of kind of like what this means for, you know,
again, like enterprise customers, like what does
a long context window actually mean in practice?
So it's basically how many words you
can input as part of your model inferencing.
Our initial models were 4K, and there's
really no one-to-one in terms of
tokens to words, but we'd say about 1.5 tokens per word.
So a 128K
context length would be about
300 pages, you know, give or take.
And what that really means is part of
that is now you can ingest, you know,
multiple documents, um, that supports any
sort of particular QA or legal documents,
anything that spans multiple pages or
multiple corpora of information.
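Abraham's back-of-envelope conversion can be made explicit in a few lines. The 1.5 tokens-per-word ratio is from the transcript; the 300 words-per-page figure is an assumption added here to make the arithmetic work out, and real token counts are tokenizer-dependent.

```python
def context_to_pages(context_tokens, tokens_per_word=1.5, words_per_page=300):
    # Rough estimate only: both ratios are approximations, not exact
    # properties of any tokenizer or document layout.
    words = context_tokens / tokens_per_word
    return words / words_per_page

pages_128k = context_to_pages(128_000)  # roughly 284 pages: "about 300, give or take"
pages_4k = context_to_pages(4_000)      # roughly 9 pages for the earlier 4K models
```

The jump from roughly nine pages to a few hundred is what makes multi-document QA and whole-repository ingestion plausible in a single prompt.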
Um, and it opens up a couple
different capabilities for users.
More specifically,
agent support,
which is prominent; kind
of the buzzword right now is agents.
Um, otherwise, it also supports, um, you know,
new possibilities around repository level code
understanding, um, as well as self reflection.
So we kind of talked about it a little bit
where you can now start to ask your models
to reflect on the input or the output and
start to have that little bit of system
two thinking where it can start to, you
know, better understand its answers and
potentially shift its answers if necessary.
Right, for sure.
Vagner, maybe one thing I can bring you in
to talk a little bit about is, I think
the models that are focused specifically
on safety are pretty interesting here.
Um, I think for a very long time, I think
one of the concerns, just to put it out there
around open models, is well, they're going to
be used for all sorts of bad purposes, right?
And I think one of the really interesting
questions has been: can you achieve openness,
and all the things that we like out of openness,
while still ensuring
safety in the model ecosystem?
And I take it that these models,
uh, that are safety focused are
kind of like an attempt to do so.
I guess my question for you is like, do
you think that we're, we're on track,
like that eventually we will be able to
have our cake and eat it too, to get the
openness and the safety at the same time?
I think so.
And if we compare even the topics that
we discussed in the last episodes,
whenever we talk about certain attacks
that were discovered, they are mostly
connected to proprietary models, right?
Because when we see that happening
for open models, people contribute
and people try to fix it.
We have a community around these assets.
So I think that, uh, in that sense,
I think that the open source strategy
makes a lot of sense in my opinion.
And it's an interesting way,
the way that, for instance,
Granite Guardian is structured:
as a first barrier in terms of
what is being sent to the model, right,
working at the prompt level,
and also after generation. I
think that that's a good strategy.
And we see that also in other open platforms:
like Llama Guard, they have
an open-source model to detect
these types of harms, right?
And I think that, again, new ways
of attacking, like the ones that we
discussed a few episodes ago, happen because
we don't know a lot about the architecture,
about the code, about the
flow of information and generation,
how prompts are going in
and how the outcome is coming out.
So that, that's, uh, I think that, again,
Open source is a good approach to tackle this.
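The two-barrier pattern Vagner describes, screening the prompt before the model and the output after generation, can be sketched as a wrapper. The keyword blocklist and `generate` stub below are illustrative stand-ins only: real guard models such as Granite Guardian or Llama Guard are themselves LLM classifiers, not keyword filters.

```python
def guard(text: str) -> bool:
    # Stand-in for a guard-model call that classifies text for harms.
    # A real deployment would invoke a model; this toy version just
    # checks a tiny blocklist of phrases.
    blocklist = {"build a bomb", "steal credentials"}
    return not any(phrase in text.lower() for phrase in blocklist)

def generate(prompt: str) -> str:
    # Stand-in for the underlying LLM call.
    return f"Here is a helpful answer to: {prompt}"

def guarded_generate(prompt: str) -> str:
    # First barrier: screen what is being sent to the model (prompt level).
    if not guard(prompt):
        return "[blocked: unsafe prompt]"
    output = generate(prompt)
    # Second barrier: screen what comes back (after generation).
    if not guard(output):
        return "[blocked: unsafe output]"
    return output
```

Running both checks matters because a benign-looking prompt can still elicit harmful output, and vice versa.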
That's right.
Yeah. So final element I want to kind of touch on
before we move to the next topic is I think, you
know, Volkmar, you work on AI infrastructure.
Um, and I think one of the observations we
talked about a few episodes ago about all
the announcements Amazon was making was
there's sort of one interpretation that in
the future, you know, like infrastructure
wins because the models become
more of a commodity over time, right?
Like someone could say, Oh, for a few
months, I want to try out the IBM model.
I'm now going to try out the Llama model.
I'm going to go back to the IBM model.
Like we're living in a world where it
seems like increasingly the models will
be things that we kind of like switch in,
switch out, you know, kind of at will.
I don't know if you agree with that's kind
of like how the future will look or if it
really will be, you know, in the future, a
customer will say, Oh, we're going to build
entirely on, you know, the IBM model stack.
And that will be kind of the future.
It's kind of a question I guess for
you about like how much software
will become like the key platform.
Or if it really will be actually more
kind of like an infrastructure thing
where like we build on AWS and sort of
the models are very interchangeable.
So I think, just from
experience of switching from
one model family to another
model family, it's very hard.
So I think there is actually a substantial
lock in into a specific model and it's
primarily because of prompt tuning.
And we've seen, just by doing better
prompt tuning for a specific model family,
something like a 30 to 40 percent improvement in
accuracy, sometimes like 60. And then when you've
done that, and you've spent three months
of your engineering team's time getting your
prompts correct, and you switch to a different
model family, all that prompt tuning work
is not transferable. And so
it's almost like you
are betting on a particular programming language
and then saying, oh, it's just a programming
language, so I can switch
from Java to Python to C sharp, and you
just need to rewrite it a little bit.
I think there is actually quite a
substantial lock-in into the models.
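One concrete way prompt work fails to transfer is chat-template formatting. The two builders below are simplified, illustrative approximations of two well-known template styles (a Llama-2-like instruction format and a ChatML-like format), not exact production templates for any specific model.

```python
def llama2_style(system: str, user: str) -> str:
    # Simplified sketch of a Llama-2-style instruction template.
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

def chatml_style(system: str, user: str) -> str:
    # Simplified sketch of a ChatML-style template, used by several families.
    return (f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n")

a = llama2_style("You are terse.", "List three risks of synthetic data.")
b = chatml_style("You are terse.", "List three risks of synthetic data.")
# Same content, incompatible wire formats: months of prompt tuning bake in
# one family's delimiters and conventions, not just the wording.
```

This is also why abstraction layers and protocols that normalize prompting across vendors keep appearing, which is exactly the counter-reaction discussed next.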
And so I, I think what will happen is
that the location where the computation
happens, that's totally commoditized.
So if you're on Amazon, you run on Amazon.
And if you want to run it in, you know, Google
Cloud or IBM Cloud, then that's what you do.
Um, I think that the model families
have a much higher stickiness.
It's like the way
skills and knowledge get trained;
it's not directly transferable.
Yeah, I don't know if I
necessarily agree with that.
I think, I think the past
you're 100 percent correct.
There's been kind of like a moat around
specific models given the prompt template,
but I think with things like the
Model Context Protocol that Anthropic released,
which again supports Anthropic but is bound to
be an open-source tool, and with what
the community is building across agents,
where you have to be able to rely on multiple
models given inference costs and capabilities
within your particular workflow,
I think right now it's
accurate, but I think going forward model
developers are going to be handcuffing
themselves if they try to build
You know, an infrastructure around only using
their models to support use cases where it's
going to be more so what is the plumbing
going to look like in order to be able to
have interconnectedness with LLMs, given the
particular use case or agent, but I mean,
that's, that's kind of, that's just my opinion.
Do you think this is a counter-reaction to model vendor lock-in?
Because that's usually what you see, right?
It's like, oh, we have a cloud here, so we build an abstraction layer.
Yeah, no, that's fair.
I just feel like model development became about trying to lock people in, and then this democratization, this race to the bottom in terms of model development, means it almost hinders model developers if they don't allow their specific models or their ecosystem to play well with others.
I think that's where you're seeing the proliferation of open source models.
It's driving a community, given that the community decides what is good and what is bad.
And from a commercial standpoint, it's more a question of how do we make money.
But, again, these are my personal views: I don't think going forward model developers are going to be able to lock people in and still have the adoption that they want.
Yeah. I think, I don't know.
It's gonna be really interesting
to see how it plays out.
I know a lot of engineer types who say, oh, well, in the future there's not gonna be much prompting, we'll just run an optimizer and the prompt will be perfect.
Yeah. And it won't really matter.
But I also know people who work in this stuff who say it's very hard to imagine that we're gonna get that level of optimization where effectively the models become a commodity, because you can always optimize.
I'm gonna move us on to our next topic.
So really interesting news came out this week on a new type of model exfiltration attack, and I think, Vagner, you flagged this for us.
It was a super fun story, because normally we talk about prompt hacking and how we do stuff just from the inputs and outputs of the model.
But this is great, because I think this is the type of attack that you see from time to time, which is a side channel, right?
We're going to just monitor a TPU chip as it does its thing, and then from that, we're going to extract all the intel we need to reproduce your model.
Um, so pretty interesting.
And I guess, Vagner, you're the one who flagged it.
What did you find most interesting about this story?
When I read the story, I started thinking about the money and all the resources it takes to train a model and to deploy it to a TPU, for instance.
And then these researchers found a way to use the electromagnetic field to reverse engineer the layers of the model that are deployed there.
They do that by comparing against a dataset they have of over 5,000 layer architectures.
And so by trying layer by layer, they can replicate a model onto a different TPU with 99 percent accuracy.
So that caught my attention, and I said, whoa, that says a lot about how certain strategies may be at risk from this kind of attack.
Um, and again, I'm advocating open source, because if you have open source, that will be a way to be less susceptible to this kind of attack.
But yeah, that's what caught my attention.
And last month, I think I also saw an attack published on PCWorld that was something related, about exploiting HDMI cables to detect what people were seeing on their monitors.
So I think it's interesting how certain attacks go beyond our more immediate thoughts about how attacks may happen.
They explore these capabilities of hardware, the electromagnetic fields around them, these other properties and capabilities that are there but that sometimes we don't pay attention to.
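The matching step the attack relies on can be sketched very roughly. Everything here is illustrative: made-up signature vectors and layer names standing in for the researchers' database of 5,000-plus layer architectures, and a simple nearest-neighbor lookup standing in for their actual signal analysis.

```python
import math

# Toy sketch: compare an observed electromagnetic trace against a
# database of known layer signatures and pick the closest match,
# layer by layer. All numbers and names below are invented.

SIGNATURE_DB = {
    "conv3x3_64": [0.9, 0.2, 0.4],
    "dense_1024": [0.1, 0.8, 0.3],
    "maxpool_2x2": [0.2, 0.1, 0.9],
}

def match_layer(trace):
    """Return the known layer whose signature is nearest to the trace."""
    return min(SIGNATURE_DB, key=lambda name: math.dist(trace, SIGNATURE_DB[name]))

# Two observed traces, each matched independently to rebuild the stack.
observed = [[0.85, 0.25, 0.35], [0.15, 0.75, 0.35]]
recovered = [match_layer(t) for t in observed]
print(recovered)  # -> ['conv3x3_64', 'dense_1024']
```

Repeating this layer by layer is what lets the attacker reassemble a working architecture without ever touching the weights directly.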
Yeah, definitely.
I love this kind of collection of attacks.
I mean, this NC State University report reminds me of a really old one from DEF CON, a number of years ago, where it listens to the sound your keyboard makes in order to extract what you're typing, which I think is really fascinating.
I guess, Volkmar, working on AI infrastructure, one reaction is: this is cool, but is it a practical attack?
Is it a security surface that we need to be worried about?
Because who's going to stand outside the data center doing this monitoring?
I think there are two questions.
One is, what's the value of the model?
Is the model structure in itself valuable?
I don't think so. This is the one thing where everybody understands the math, and I think there's not a huge amount of gain right now in looking at the model structure.
I think the true value is the numbers, right?
The weights, ultimately, are the information that got turned into a model.
That's the expensive part.
The model structure itself is the cheap part.
I think that in computer science, over the last, I don't know, five decades, we figured out how to secure computer infrastructure.
You have the same problem with extracting databases and database content out of computers.
We have cryptography now on pretty much every link in the computer system, so that you can even run it in the cloud, and so there's a whole confidential compute movement, where the PCIe link from the host to the GPU is fully encrypted, so that you cannot intercept anything which travels over it.
So I think we have very standard defense mechanisms against brute force attacks on the physical hardware.
Um, I think where we are a little bit less far along today is how models get exchanged.
The model in itself, in particular if you look at it from a proprietary perspective, contains your business information, right?
Let's say you fine-tune it: you take your proprietary data and you stick it in the model.
You wouldn't give your database to your competitors, and so you wouldn't want to give your model, which knows everything in your database, to your competitor.
And so we are still in a world right now, I think, where we haven't really figured out what the end-to-end process of confidentiality around models is.
And so there are pieces missing in the infrastructure, just like how we evolved to where we are today.
Like, can I have fully encrypted models which only get decrypted, for example, inside of the GPU?
That's the weights; it may also be the code, if I want to make that confidential.
And then there's the stimulus, the inputs and the outputs: are they making it over the wire in an encrypted or unencrypted fashion?
So we have the mechanisms; the problem is that they are not pervasively deployed, or available in hardware, or used.
Right. And so over the next couple of years, I think we will see much more effort being put in to make sure that we actually protect the asset in a much more stringent way.
I mean, we have data at rest, everything's encrypted; data in flight, everything's encrypted; and then here comes the model, and it's like, good luck.
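The "weights encrypted at rest, decrypted only at load time" idea can be sketched as follows. This hand-rolled counter-mode cipher is a toy for illustration only; a real deployment would use a vetted AEAD scheme such as AES-GCM, hardware-backed key storage, and confidential-compute attestation, none of which the sketch attempts.

```python
import hashlib, hmac, secrets

# Toy sketch: authenticated encryption of model weights at rest.
# NOT production cryptography; illustrative only.

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Derive n bytes of keystream from key+nonce via hashed counters."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt_weights(key: bytes, weights: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(weights, _keystream(key, nonce, len(weights))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + tag + ct  # 16-byte nonce, 32-byte tag, ciphertext

def decrypt_weights(key: bytes, blob: bytes) -> bytes:
    nonce, tag, ct = blob[:16], blob[16:48], blob[48:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("weights tampered with, or wrong key")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))

key = secrets.token_bytes(32)
blob = encrypt_weights(key, b"\x01\x02\x03 fake model weights")
assert decrypt_weights(key, blob) == b"\x01\x02\x03 fake model weights"
```

The missing infrastructure pieces are exactly around this flow: where the key lives, and ensuring the decrypt step happens only inside trusted hardware rather than on a general-purpose host.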
Yeah, it's really interesting.
If I'm hearing you right, it feels a little bit like, as is true with many things in security, we have the techniques.
Now the question is, can we get uniform adoption in a way that actually offers security here?
And I think a whole ecosystem of software has been written and raced its way to market, and now we're like, oh my God, all that proprietary stuff, we're wide open.
And so I think we will actually start locking things down.
And from an enterprise perspective, what we do at IBM, a good chunk of what's next is protecting your data assets, protecting your model assets, making sure that you are actually compliant.
And I think that whole workflow of building these things has been an afterthought, because we just downloaded the internet and trained it into a model.
But I think we are now getting into a world where there are really expensive assets which you must protect, because everything which follows in the enterprise will effectively be deployed with AI.
And it actually ties to what we were talking about a little bit earlier, right?
If the trend is that we sort of run out of all the common data, there's also just more proprietary stuff that becomes a component of how we go about doing this, in a way that raises the risks here, right?
It actually raises the incentive.
Abraham, maybe I can ask you: there are these kind of interesting security questions about model exfiltration and all that.
And Granite, of course, is an open model.
But I'm kind of curious how you guys think about security on a release like 3.1.
Is there a separate team that does that analysis?
I'm just interested in hearing a little bit more about how you think about that in the context of an open source launch.
Yeah.
Um, so to answer your question, yeah, there's a dedicated team led by Ian Malloy focused on security.
They do a lot of work in terms of being able to better identify vulnerabilities in the model.
There's also a wide swath of safety and red teaming that we do, more so from the safety perspective, ensuring that our models don't harm.
Um, but one of the big things we've seen in terms of security, at least from our cybersecurity team, is that one of the ways models end up vulnerable is through the training data.
We may, for instance, train on a large corpus of a language that's not in our intended use case.
So if a user were to prompt our model in that particular language, given that it's not in the scope of our security framework, the model is a little bit more vulnerable to jailbreaks, or to giving a response that's not in scope of what the model should be used for.
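A minimal sketch of that kind of scope check might look like the following, where a naive stopword heuristic stands in for a real language-identification model, and the supported-language list is a made-up assumption for illustration, not Granite's actual policy.

```python
# Illustrative guardrail: reject prompts in languages outside the
# model's tested scope before they reach the model. The stopword
# heuristic and supported-language set are made-up stand-ins.

STOPWORDS = {
    "en": {"the", "is", "and", "of", "to"},
    "es": {"el", "es", "y", "de", "que"},
}
SUPPORTED = {"en"}  # hypothetical tested scope

def detect_language(prompt: str) -> str:
    """Guess the language by counting stopword overlaps (very naive)."""
    words = set(prompt.lower().split())
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))

def guard(prompt: str) -> str:
    if detect_language(prompt) not in SUPPORTED:
        return "REFUSED: language outside tested scope"
    return "PASS"

print(guard("What is the capital of France?"))  # PASS
print(guard("el perro es grande y de que"))     # REFUSED: language outside tested scope
```

In a real pipeline, the detection step would be a proper language-ID model, and refusals would route to a safety response rather than a bare string.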
From the infrastructure standpoint, I'll be totally transparent: this is kind of out of my purview.
In reading the article, it was just really interesting to see that, one, this was models on the edge; these TPUs are not typically used to run inference on larger-scale LLMs.
Also, what I found interesting was that this was a vulnerability specific to an infrastructure provider.
It wasn't necessarily an external attack that was successful.
So I think it just brought up the question of whether model infrastructure providers need to provide more robust guardrails around how potential bad actors inside the company could infiltrate their models.
And then, to the point that Volkmar made, a lot of a model's infrastructure and architecture, i.e. the layers and other aspects of the model, is usually provided as part of the open source community, so a lot of these things are already on hand for people to use.
In terms of whether this is a risk we need to dive into at full scope, I can't say yes or no.
But my take was that it opened up the question of how infrastructure providers are ensuring that bad actors inside the company aren't trying to infiltrate any of the models that they serve.
That's right.
Yeah. And I think it goes to a little bit of what Volkmar was talking about, which is that a number of these security issues are known security issues for the infrastructure, even before you get to AI infrastructure.
The insider threat problem is big in any case, and the question is how much of this folds into traditional security work.
I think this is one of the reasons I'm asking, Abraham: I'm really interested in how an organization decides to deal with safety and security on these models.
Who's responsible for which bits of this?
I think it's assigned in very different parts of the organization.
It's very interesting.
And it's new.
As much as AI has proliferated over the last couple of years, "Attention Is All You Need" only came out seven years ago.
Actually, we came out three and a half years ago.
So we're still trying to figure out a lot of things.
That's right.
I mean, one of the things I love to say on Mixture of Experts is "in the old days," and by that I mean, like, five years ago. Six months ago.
Yeah, that's right.
So I'm going to move us on to our final topic for the day, which is Jetson.
The background on all this is that NVIDIA announced a small, dare I say cute, little board supercomputer called Jetson for AI developers.
Unlike the eye-watering price of an H100, or whatever the price of the GB200 will be, this retails for a relatively cheap 250 bucks.
And it's this kind of handheld little board.
Um, and I guess, Volkmar, maybe I'll turn it to you.
Why is NVIDIA getting into this kind of hobbyist GPU business?
And do you think it actually matters for the overall AI market at all?
Or is this a little bit like Jensen wanted to do a fun thing and is releasing a little board, just because it's a fun end-of-year thing to do?
Yeah, so this is a continuation of something NVIDIA has invested in for many, many years, from before the ChatGPT craze.
There was a massive investment by NVIDIA into autonomous vehicles.
And coming from that industry, we've been tracking that board pretty much since it came out, I think 2015 or '16, ballpark-ish, and it has been continuously receiving updates from NVIDIA.
So this is really a board for low-power robotics.
There's a version which lets lots and lots of cameras be attached, so that you can actually put it on a robot, and it's cheap because it's made for scale.
It's not for running large language models; it's more for vision processing, planning, mapping.
So it has a bunch of processors on it, and it has, in effect, an NVIDIA GPU.
So I think the main benefit over all the other solutions on the market is that in many cases today you'll train your models on an NVIDIA card, so you're living in an ecosystem, and then you can just move that ecosystem into production, where you only run the inferencing part.
And it's powerful enough that you can actually do the camera processing, the video encoders, et cetera, on that one chip.
It's a pretty low-power solution, so you don't drain batteries.
You don't want to put an H100 on a robot, simply because your robot would drive five meters and be out of power.
So it's more of an embedded system-on-a-chip thing.
So I think the nice part is that it used to be around 500 bucks, and now it's down to 250 or so.
It's really affordable for hobbyists, but also, if you want to build something at scale and you can get a chip for 250 bucks, you effectively can build a robot now.
At least the electronics part, I mean, the robot's missing, right?
Yeah. Right.
You at least got the board.
The robot is up to you.
So the automotive part of that is a bit more involved: it has dual chips, it has many more cameras, and it's ADAS-compliant.
This one is kind of the baby chip of that.
Vagner, maybe I'll turn to you on this.
We've talked a lot about models getting smaller, but when we've talked about system-on-a-chip, about AI at the edge, it's often been in the context of a mobile phone, right?
Like Apple doing a new release.
One of the things I think is pretty fun about this is that it's marketed as a tool for hobbyists, for student groups that want to build robots or do their own inference for their own experiments.
That ecosystem I think is really interesting.
And I guess my question for you is: how far do you think that's going to go over time?
Um, maybe the last thing I'll throw in: I've been watching this project by George Hotz for some time, the tiny corp project, where he wants to basically offer everybody a petaflop.
And I think one of the questions is, does it become cheap enough that everybody ends up having a little GPU rig at home?
I'm kind of curious how you think about that, because it starts to look quite different from how we do AI nowadays.
Yeah, this accessibility in terms of value, I think that's the most important aspect, in my understanding, even if we think about developing countries.
For instance, in Brazil, 250 dollars is about the minimum wage, a monthly salary, so it's not that cheap in some developing countries.
But even so, if we consider that this is the cheapest right now, I think it opens a lot of possibilities in those countries, where people have a lot of creativity.
This kind of hardware may allow them to think about robots for agriculture, or other interesting uses where, before this specific hardware, the cost was a blocker.
Yeah, that's right.
Yeah, the kind of continual
democratization is interesting.
Yeah, definitely for international as well.
I mean, I think a really big component of this is who gets to tinker with some of these tools, and that seems really, really important.
I guess, Abraham, I don't know, are you an AI hobbyist in your free time?
I'm kind of curious if you would buy something like this and play around with these types of tools.
Uh, I mean, it depends on who's asking the question.
Yeah, in the old world, I definitely played a lot with this stuff.
I'm not as technically sound as some of the other team members here at IBM Research, but I find myself playing with agentic frameworks quite a bit.
When Baby AGI came out a couple years ago, I found myself really diving into it.
So, I don't know if I'd buy this per se, but I think AI development is being abstracted to a level where it's a lot of plug and play nowadays.
The skill set of a developer, at least for core LLM development, may still need to be sound, but for a hobbyist, that skill set can be a little bit lower.
And I think this is a pretty cool tool to help that side of the fence, in terms of creating more models that you can run on the edge.
Yeah, absolutely.
I think there's going to be
just a ton of activity there.
Well, great.
Well, thanks everybody for their time today.
Uh, Abraham, welcome to the show.
Uh, hoping to have you on for a future episode.
Uh, Vagner, Volkmar, always a pleasure to have you on the show as well.
Thanks for joining us.
If you enjoyed what you heard, you
can get us on Apple podcasts, Spotify
and podcast platforms everywhere.
Uh, and listeners, we'll see you next week.