Peak Pre‑Training and Synthetic Data
Key Points
- Ilya Sutskever’s keynote at NeurIPS proclaimed that we have hit “peak pre‑training,” suggesting future AI advances will require alternatives beyond larger pre‑trained models.
- Vagner Santana warned that synthetic, AI‑generated data is already flooding the web and, without reliable detection tools, we may unknowingly be training new models on content that itself was produced by LLMs.
- Volkmar Uhlig cautioned that it may still take a few years before the industry fully transitions away from heavy reliance on pre‑training, despite growing interest in other techniques.
- Abraham Daniels, a Mixture‑of‑Experts (MoE) specialist, noted that while MoE may become less central over time, it remains an important piece of the evolving AI toolbox.
- The episode also previewed upcoming topics such as Granite’s latest release, novel model‑theft attacks, and NVIDIA’s ultra‑compact supercomputer, framing them within the broader MoE discussion.
Sections
- Debating the End of Pre‑Training - Panelists discuss whether AI pre‑training has peaked, covering synthetic data detection, Mixture‑of‑Experts trends, and upcoming NeurIPS insights.
- Shifting From Pre‑Training to Test‑Time Compute - The speaker describes how their firm leverages partner‑sourced proprietary domain data and is transitioning toward inference‑time (test‑time) computation, reducing reliance on static, large‑scale pre‑training.
- Filtering Training Data with LLMs - The speaker argues that massive, noisy internet data must be vetted using large language models and test‑time selection mechanisms to separate truth from garbage during pre‑training.
- Feedback Loop of Synthetic Data Bias - The speakers discuss how reusing LLM‑generated data for pre‑training can perpetuate existing biases, highlighting the difficulty of assessing data quality and the lack of reliable methods to detect synthetic origins.
- Granite Guardian & Embedding Model Release - The speaker details the launch of Granite Guardian 3.1 for hallucination detection, new multilingual embedding models for semantic search, their availability on Hugging Face, Watsonx and partner platforms, and previews future MOE scaling and multimodal capabilities.
- Balancing Openness and Safety - The speaker argues that open‑source AI models can simultaneously provide transparency and security, citing community‑driven bug‑fixes and guardrails like Granite Guardian and Llamaguard as evidence.
- Model Prompt Tuning Creates Vendor Lock‑In - The speaker explains that extensive prompt engineering is specific to a given model family and cannot be transferred to others, creating strong lock‑in to those models while compute resources remain easily switchable across cloud providers.
- Future Prompt Optimization & Model Exfiltration Attack - The speakers debate whether advances will render prompting obsolete as models self‑optimize, then discuss a recent side‑channel attack that extracts AI models by monitoring TPU hardware activity.
- Assessing Practicality of AI Side‑Channel Attacks - The speakers debate the real-world threat of side‑channel techniques—like acoustic keyboard eavesdropping—to AI infrastructure, concluding that the valuable asset is the model’s weights rather than its architecture, and that established security measures (e.g., cryptography) largely mitigate such risks.
- Securing Data and Model Assets - The speaker discusses the need for comprehensive encryption, uniform adoption, and compliance to protect both data and AI model assets within enterprise ecosystems.
- Internal Threats to Model Infrastructure - The speaker discusses a recent article revealing a vulnerability in edge‑inference TPU deployments, emphasizing the need for infrastructure providers to implement stronger guardrails against insider attacks on open‑source LLM models.
- NVIDIA Low‑Power Robotics Board Overview - The speaker outlines NVIDIA’s long‑standing autonomous‑vehicle investment and details a circa‑2015/16 low‑power robotics board optimized for vision, mapping, and inference that lets developers train models on NVIDIA GPUs and seamlessly deploy them for on‑device processing without draining robot batteries.
- Making Petaflop Computing Accessible - The speaker highlights how George Hotz's low‑cost petaflop hardware could democratize AI by dropping price barriers, enabling innovators in both wealthy and developing regions to experiment with advanced applications such as robotics, architecture, and agriculture.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=GnMKY4QLHDw](https://www.youtube.com/watch?v=GnMKY4QLHDw) · **Duration:** 00:40:27
Timestamps:
- [00:00:00](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=0s) Debating the End of Pre‑Training
- [00:03:06](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=186s) Shifting From Pre‑Training to Test‑Time Compute
- [00:06:11](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=371s) Filtering Training Data with LLMs
- [00:09:24](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=564s) Feedback Loop of Synthetic Data Bias
- [00:12:31](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=751s) Granite Guardian & Embedding Model Release
- [00:15:40](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=940s) Balancing Openness and Safety
- [00:18:47](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=1127s) Model Prompt Tuning Creates Vendor Lock‑In
- [00:21:56](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=1316s) Future Prompt Optimization & Model Exfiltration Attack
- [00:25:07](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=1507s) Assessing Practicality of AI Side‑Channel Attacks
- [00:28:15](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=1695s) Securing Data and Model Assets
- [00:31:18](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=1878s) Internal Threats to Model Infrastructure
- [00:34:26](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=2066s) NVIDIA Low‑Power Robotics Board Overview
- [00:37:29](https://www.youtube.com/watch?v=GnMKY4QLHDw&t=2249s) Making Petaflop Computing Accessible
Are we at peak pre-training? Vagner Santana
is a staff research scientist and master
inventor on the responsible tech team. Vagner,
welcome back to the show.
What say you?
Given that we don't have
methods for detecting synthetic data,
maybe the biggest risk is already in the past, right?
Because now people are realizing,
and maybe we've already done that.
Volkmar Uhlig is Vice President and
AI Infrastructure Portfolio Lead.
Volkmar, how about you?
I think we need to give it a couple more years.
And Abraham Daniels joining us for
the very first time is a Senior
Technical Product Manager on Granite.
Abraham, welcome to the show.
You're now an expert on Mixture of Experts.
Uh, tell us what you think.
I wouldn't say that it's over.
I guess I'm not 100 percent sure that it's over, but
I think we're less reliant on it going forward.
All right.
Awesome.
All that and more on today's Mixture of Experts.
I'm Tim Hwang, and welcome to Mixture of Experts.
Each week, MOE is dedicated to bringing
the breaking news and analysis you
need to understand what's going on in
the world of artificial intelligence.
Today is another jam-packed episode.
We're going to talk about the latest release
out of Granite, weird ways of stealing
models and NVIDIA's tiny supercomputer.
But first, let's talk a
little bit about pre training.
As many of you may know, uh, this
is the kind of week of the big
machine learning conference, NeurIPS.
Um, Ilya Sutskever, a big, actually
prominent thinker and kind of intellectual
and entrepreneur in the AI space, uh,
gave a keynote talk in which he claimed
that we are at peak pre-training.
Essentially, pre-training is over, and to create
improvements on AI going forwards, we're going
to have to employ a bunch of different methods.
Um, and I guess maybe Vagner , I'll start with
you because I think you, you nodded towards
kind of one solution that Ilya mentioned
in his talk, which is synthetic data.
Um, are you ultimately really optimistic?
I mean, it kind of sounds like you've got
almost a paranoid view that we might
already be living in synthetic data land.
But, uh, tell us more about why synthetic
data might be a way forwards if we think
that we're literally running out of the
data that we need to do pretraining.
In fact, I'm concerned
about it because, when we see
the projections of how
synthetic data is populating the web,
we don't have methods for actually
detecting it during pre-training and saying, okay,
this is synthetic, this is not synthetic.
So that is why I answered the
way I did: because we may already have been
pre-training models with content generated by LLMs.
We don't have a way to properly filter,
or good methods for
properly filtering, whether something
was generated by an LLM or not.
So that's my concern.
We may already be living in a
world where pre-training is happening
using data generated by LLMs.
Abraham, maybe I'll turn it to you,
because I know you work on the Granite team.
Um, I think one really interesting kind
of aspect of all this is that a lot of the
data that we've relied on for pre training
in the past has been open data, right?
People use common crawl and what have you
to kind of like pre train their models.
And I guess part of the idea here is
that in this world where basically
all the open data is already available
and being used, proprietary data
becomes a lot more valuable, and I'm curious
about how you think about that at Granite
I mean, I know IBM's taken a
real stand towards openness, and I'm
wondering if that also applies to how you
guys think about the data sets that are
gonna become more and more important here
Yeah. So great question.
Um, in terms of data, I think we've done
a lot of work in terms of partnering
with third parties to be able to backfill
some of the more specific domain or
enterprise data that's key not only to
our models, but to our commercial road map.
So I think that's going to be like a really
big pillar for us is, you know, where can we
find data that may not necessarily be open
source, but that is kind of central to our
road map, focusing on domain specific models.
But kind of back to what I said
earlier: pre-training
might not be over, but I think
there's a shifting paradigm in terms of
how we do inference with our models, more
specifically what's called test-time compute.
And you're starting to see this with some of
the newer models that came out, like OpenAI's
o1 as well as Qwen, where, you know, we're
less reliant on the static knowledge that we
get as part of our, you know, pre training
and really focusing on, you know, how do
we make models more capable at inferencing?
Having them think a little bit more about
their answers as opposed to using a system
one thinking where, you know, the first
answer they get is what they respond with.
Yeah. And I think that's a really interesting approach.
The real action
now is almost dataless in some ways, right?
It assumes the pre training happens, and
then all the real optimization is going to
be like doing these kind of inference tricks,
which is going to be a very, very different
way of thinking about some of this stuff.
Yeah, it's a lot more system two thinking.
So reflecting on your answer, having the
ability to go back, change your answer, or,
uh, you know, better understand if there
was a misstep in your thought process.
Volkmar, maybe I'll bring
you into this discussion.
We've had you on a couple
of shows at this point.
I think I'm starting to get the Volkmar
Vibe on how you answer questions.
I don't know if I'm reading too
much into what you said, where you
said, Well, we got a few more years.
Do you think this is just hype?
This is just Ilya doing his thought leading?
Like, we're not really at peak pre training.
You know, this is like, it's, he's
describing a trend, but it's probably
more hyped than anything else.
So I think what he's touching on is,
and this has been a trend over the last
couple of years, that the amount
of information you can get in the open
is exhausted; in fact, we downloaded everything and
we trained everything into the models.
And so we are now at a point where
the question becomes: how do you differentiate
data which is computer generated versus
what is actually human generated?
And that's the fundamental question.
First of all, we assume that human-generated
data is good and machine-generated data is
maybe bad, and I think that is not true, right?
Human-generated data can be misleading
and wrong. If you just
crawl the internet, what do you know, right?
You just download some stuff and people make
stuff up and you train that into the model
and you declare that's reality. So I think we
are at a point, and Abraham touched on this, and
I'm a strong believer in the system one system
two thing, um, where we need to actually
test the data which we download from the
internet, or we just synthetically generate it.
And that's kind of like test time
compute, uh, behind it, right?
So I spit out a bunch of answers, and
some of them are wrong, some of them
are right, and I pick the right one.
So I can apply the same mechanism to
stuff I download from the internet.
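The "spit out a bunch of answers and pick the right one" mechanism Volkmar describes is often implemented as best-of-n sampling with a verifier. A minimal sketch, where `generate_candidates` and `score` are stand-ins for a real model and a real verifier (here the toy task is exact arithmetic, so the verifier can check exactly):

```python
import random

TRUE_ANSWER = 17 * 23  # ground truth for this toy task

def generate_candidates(prompt, n, rng):
    # Stand-in for sampling n diverse answers at temperature > 0:
    # each "answer" is a noisy guess at the true value.
    return [TRUE_ANSWER + rng.randint(-5, 5) for _ in range(n)]

def score(prompt, answer):
    # Stand-in for a verifier: a reward model, an exact checker,
    # or an LLM asked to grade the answer.
    return 1.0 if answer == TRUE_ANSWER else 0.0

def best_of_n(prompt, n=16, seed=0):
    # Test-time compute: spend extra inference cycles generating many
    # candidates, then keep the one the verifier scores highest.
    rng = random.Random(seed)
    candidates = generate_candidates(prompt, n, rng)
    return max(candidates, key=lambda a: score(prompt, a))
```

The same select-the-best loop, pointed at downloaded documents instead of model answers, is the data-vetting idea discussed just below.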
And now it doesn't matter anymore.
If it's, you know, human generated and we kind
of assumed humans, you know, generate good data.
And so anything we could download was good.
I think we are now in a world where we can
produce so much bad data at a really high
velocity that, when we are
creating those pre-training data corpora,
we need to actually go through the data,
sift it, and classify it as,
you know, this is garbage and this is true.
And I think the only way to do that is actually
using large language models and the same
thing we are using for test time compute where
we are, you know, picking the right answer.
We will probably have to apply that
to the data corpus we train on.
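The "sift the data and classify it as garbage or true" step can be sketched as a filter over a corpus. The heuristic `judge_quality` below is a crude stand-in for what Volkmar proposes, which is an LLM-based judge or trained quality classifier; the heuristic itself is illustrative only.

```python
def judge_quality(document: str) -> float:
    # Stand-in for an LLM quality judge. A real pipeline would prompt a
    # model ("Is this passage factual, coherent, informative? Score 0-1")
    # or use a trained classifier; here we use a crude proxy: repetitive
    # spam and tiny fragments score low.
    words = document.split()
    if not words:
        return 0.0
    unique_ratio = len(set(words)) / len(words)  # spam repeats itself
    long_enough = min(len(words) / 20, 1.0)      # fragments carry little signal
    return unique_ratio * long_enough

def sift_corpus(docs, threshold=0.5):
    # Keep only documents the judge rates above the quality threshold.
    return [d for d in docs if judge_quality(d) >= threshold]

corpus = [
    "buy now buy now buy now buy now buy now buy now buy now buy now buy now buy now",
    "Pre-training corpora are filtered for quality before models are trained, "
    "because noisy web text degrades downstream accuracy and wastes compute.",
]
kept = sift_corpus(corpus)
```

The point of the sketch is the shape of the pipeline, not the scoring rule: swapping the heuristic for an actual model call is what makes this "filtering training data with LLMs."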
I still think pre-training, you know,
has its place; I don't
think it goes away for a long time. And
in any case, we will have to build
that core model, and the model architectures
change and how we are training changes.
And so I don't think pre-training per
se goes away, but I think that the focus
will shift: pre-training as the basis,
so that you know something, and then you
can apply it in the system-two style,
where you're
spending more time thinking.
Yeah, just like it gets you
to the first base, basically.
But like going further is going to
depend on all these other techniques.
Correct. I think
going further is the hard part.
And this is where I think
reinforcement learning comes in.
You know, we are actually
discovering new knowledge, and we
will publish that new knowledge.
And I think new knowledge will not
necessarily come from humans all the
time, but it will come from models.
Yeah, I love this observation that like,
You know, in some ways, the gains from
just having more data have been so strong
for so long that like the incentive
has just been like, dump more data.
We really don't care.
We don't really look too closely because like
the more data we put into this magic machine,
it just gets better and better and better.
I guess what you're saying is
that the meta is kind of shifting now
that we are concerned about quality.
And then the nuance you're adding,
which I think is very interesting, is that
synthetic data may very well be much
higher quality than what we get out of humans,
which I think is an interesting outcome.
Um, I guess, Vagner, how
do you think about that?
Because I think in some ways, like from
the point of view of like a responsible
AI or responsible tech ethics person, You
know, I think that the, the bias has always
been towards human generated content.
We say, oh, well, we don't trust the
synthetic generated content because
who knows it might embed all this bias
and it might have all these problems.
Do you kind of buy what Volkmar is saying?
Like that we are now rapidly
approaching the point where our
long-standing prejudice
in favor of human data is actually misplaced?
My concern is that if we think
about the most popular LLM used now
by people to generate
content, it could be OpenAI's ChatGPT, right?
And if we have this data, and I
agree with Volkmar in terms of the quality,
it's difficult
to assess quality of data,
human-generated or synthetic.
But imagine that we have a lot of
people using ChatGPT to create data, and then
this data is used again to pre-train
ChatGPT version N plus one, right?
So now the biases that the data had
before are coming in again to the model.
So my concern is
this feedback loop of bias, considering that
we don't yet have good methods for
detecting when data was generated by an LLM.
That's my concern: how to prevent
the bias that a previous version has
from coming in to the next version of
any LLM pre-trained on these data.
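The feedback loop Vagner worries about can be illustrated with a toy recurrence: each model generation trains on a mix of the previous generation's output and fresh human data. All parameter values below are made up for illustration and are not measurements of any real model.

```python
def run_generations(n_generations, synthetic_fraction, amplification,
                    human_bias=0.0, initial_bias=0.1):
    # Toy recurrence: the synthetic share of the corpus carries (and may
    # amplify) the previous generation's bias; the rest is fresh human
    # data, assumed unbiased here for simplicity.
    bias = initial_bias
    for _ in range(n_generations):
        bias = (synthetic_fraction * amplification * bias
                + (1 - synthetic_fraction) * human_bias)
    return bias

# When fresh human data still dominates the mix, the bias washes out...
diluted = run_generations(5, synthetic_fraction=0.5, amplification=1.1)
# ...but once synthetic_fraction * amplification exceeds 1, it compounds.
amplified = run_generations(5, synthetic_fraction=0.9, amplification=1.2)
```

The toggle between the two regimes is why detection matters: without a way to identify LLM-generated text, there is no way to keep `synthetic_fraction` low.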
Yeah, for sure.
I mean, I think this goes to the question:
quality is a little
bit under-theorized in this context, right?
Like, what do we actually mean by better
quality here and against what types of tasks?
I guess, Vagner, you've got certain
concerns, right, around this data.
You know, I guess if the main use
case is like, I don't know, math
or something like that, right?
Like, we may actually say that the
synthetic data is actually better, but it
really depends on use case in some ways.
I'm going to move us to our
next topic. I think we had a big
launch this week, which is Granite 3.1. Abraham, you're on the show
in part because you're from the team.
Uh, do you want to kind of, uh, tell
our listeners, I guess, what's, uh,
what's coming out and what people
should be paying attention to?
Yeah,
actually, what's already come out.
So as of yesterday, we released Granite 3.1, the latest model in
the Granite family of models.
So it's built on top of Granite 3.0.
And as part of that release, we've
pushed out our Granite 8b Instruct
as well as our Granite 2b Instruct.
Our Granite 8b is really kind of our
workhorse model for, you know, 80 percent
90 percent of use cases enterprise as
well as any sort of specific domain cases.
What we're really excited about in terms of
our workhorse model is that we've
seen great improvements in instruction
following as well as multi-step reasoning.
Along with our Granite 8B and 2B
dense models, we've released a suite of MoE,
or Mixture-of-Experts, models.
These come in 1B, or one
billion parameters, as well as 3B.
And these are really focused on
resource-constrained environments, any sort of low-latency
applications, edge computing, which we'll talk
a little bit about in terms of the NVIDIA news.
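Since Mixture-of-Experts comes up throughout the episode, a toy sketch of the core routing idea may help. This is an illustrative scalar version only: in a real model each expert is a feed-forward block and the gate is a learned projection, not the hand-picked numbers used here.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of gate scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, expert_weights, gate_weights, top_k=1):
    # A gate scores every expert for this input, but only the top-k experts
    # actually run. That sparsity is why MoE models can carry many
    # parameters while staying cheap at inference time. Each "expert" here
    # is just a scalar multiply for illustration.
    probs = softmax([g * x for g in gate_weights])
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    return sum(probs[i] * (expert_weights[i] * x) for i in top)

# With top_k=1, only the expert the gate prefers (index 1 here) runs.
y = moe_layer(2.0, expert_weights=[0.5, 2.0, -1.0],
              gate_weights=[0.1, 1.0, 0.2], top_k=1)
```

Raising `top_k` trades compute for quality: more experts run per token, which is the dial the dense-versus-MoE discussion is really about.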
And the big
capability that we're
launching with Granite 3.1 is that we now support
128K context length.
So what that really means is we
can input more
tokens into the model.
So what that supports is, you know,
long, uh, documents or multiple, multiple
documents to support QA, um, as well as
any sort of, you know, uh, code bases.
So you can feed in a full
code base repository.
And it also lends itself to more
LLM-powered autonomous agents.
Um, along with our language models, we've
released our granite guardian series.
So these are our guardian models, um,
that support detection across a number of
different, uh, biases and, uh, hallucinations.
Specifically, the most
recent Granite Guardian 3.1 supports detecting function-calling hallucinations.
Um, and then lastly, as part of our release, we
have pushed out our Granite embedding models.
So these are efficient, you know, robust
models that support semantic search.
They come in four sizes,
across English-only and multilingual variants, and in terms
of languages, we support all 12 languages
included as part of our language models.
So we're super excited to
have them out in the market.
Um, they can be found on Hugging Face
as well as our watsonx platform.
Um, we also have them, uh, on, uh, a
number of different partner platforms.
So Ollama, Replicate; they'll
be pushed to NVIDIA as well.
Uh, and we're just really
excited for, for what's to come.
And we're looking forward to, you know,
scaling out our MOE models in 2025, as well
as introducing some multimodal capabilities.
Yeah, that's awesome.
So a lot there, obviously, to go through.
I mean, I think maybe the first thing
I'll kind of bring you back to talk
a little bit about is context window.
I know that was a big part of the launch,
a big part that like, you know, kind of IBM
is I think touting as part of this release.
Yeah.
Can you paint a little bit more of a picture
of kind of like what this means for, you know,
again, like enterprise customers, like what does
a long context window actually mean in practice?
So it's basically how many words you
can input as part of your model inferencing.
Our initial models were 4K, and there's
really no one-to-one in terms of
tokens to words, but we'd say about 1.5 tokens per word.
So a 128K
context length would be about
300 pages, you know, give or take.
And what that really means is part of
that is now you can ingest, you know,
multiple documents, um, that supports any
sort of particular QA or legal documents,
anything that spans multiple pages or
multiple corpora of information.
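Abraham's back-of-envelope conversion can be made explicit in a few lines. The 1.5 tokens-per-word ratio is from the transcript; the 300 words-per-page figure is an assumption added here to make the arithmetic work out, and real token counts are tokenizer-dependent.

```python
def context_to_pages(context_tokens, tokens_per_word=1.5, words_per_page=300):
    # Rough estimate only: both ratios are approximations, not exact
    # properties of any tokenizer or document layout.
    words = context_tokens / tokens_per_word
    return words / words_per_page

pages_128k = context_to_pages(128_000)  # roughly 284 pages: "about 300, give or take"
pages_4k = context_to_pages(4_000)      # roughly 9 pages for the earlier 4K models
```

The jump from roughly nine pages to a few hundred is what makes multi-document QA and whole-repository ingestion plausible in a single prompt.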
Um, and it opens up a couple
different capabilities for users.
More specifically,
agent support,
which is prominent; kind
of the buzzword right now is agents.
Um, otherwise, it also supports, um, you know,
new possibilities around repository level code
understanding, um, as well as self reflection.
So we kind of talked about it a little bit
where you can now start to ask your models
to reflect on the input or the output and
start to have that little bit of system
two thinking where it can start to, you
know, better understand its answers and
potentially shift its answers if necessary.
Right, for sure.
Vagner, maybe one thing I can bring you in
to talk a little bit about is, I think
the models that are focused specifically
on safety are pretty interesting here.
Um, I think for a very long time, I think
one of the concerns, just to put it out there
around open models, is well, they're going to
be used for all sorts of bad purposes, right?
And I think one of the really interesting
questions has been: can you achieve openness,
and all the things that we like out of openness,
while still ensuring
safety in the model ecosystem?
And I take it that these models,
uh, that are safety focused are
kind of like an attempt to do so.
I guess my question for you is like, do
you think that we're, we're on track,
like that eventually we will be able to
have our cake and eat it too, to get the
openness and the safety at the same time?
I think so.
And if we compare even the topics that
we discussed in the last episodes,
whenever we talk about certain attacks
that were discovered, they are mostly
connected to proprietary models, right?
Because when we see that happening
for open models, people contribute
and people try to fix it.
We have a community around these assets.
So I think that, uh, in that sense,
I think that the open source strategy
makes a lot of sense in my opinion.
And it's an interesting way,
the way that, for instance,
Granite Guardian is structured:
as a first barrier in terms of
what is being sent to the model, right,
working at the prompt level,
and also after generation. I
think that that's a good strategy.
And we see that also in other open platforms:
like Llama Guard, they have
an open-source model to detect
these types of harms, right?
And I think that, again, new ways
of attacking, like the ones that we
discussed a few episodes ago, happen because
we don't know a lot about the architecture,
about the code, about the
flow of information and generation,
how prompts are going in
and how the outcome is coming out.
So that, that's, uh, I think that, again,
Open source is a good approach to tackle this.
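The two-barrier pattern Vagner describes, screening the prompt before the model and the output after generation, can be sketched as a wrapper. The keyword blocklist and `generate` stub below are illustrative stand-ins only: real guard models such as Granite Guardian or Llama Guard are themselves LLM classifiers, not keyword filters.

```python
def guard(text: str) -> bool:
    # Stand-in for a guard-model call that classifies text for harms.
    # A real deployment would invoke a model; this toy version just
    # checks a tiny blocklist of phrases.
    blocklist = {"build a bomb", "steal credentials"}
    return not any(phrase in text.lower() for phrase in blocklist)

def generate(prompt: str) -> str:
    # Stand-in for the underlying LLM call.
    return f"Here is a helpful answer to: {prompt}"

def guarded_generate(prompt: str) -> str:
    # First barrier: screen what is being sent to the model (prompt level).
    if not guard(prompt):
        return "[blocked: unsafe prompt]"
    output = generate(prompt)
    # Second barrier: screen what comes back (after generation).
    if not guard(output):
        return "[blocked: unsafe output]"
    return output
```

Running both checks matters because a benign-looking prompt can still elicit harmful output, and vice versa.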
That's right.
Yeah. So final element I want to kind of touch on
before we move to the next topic is I think, you
know, Volkmar, you work on AI infrastructure.
Um, and I think one of the observations we
talked about a few episodes ago about all
the announcements Amazon was making was
there's sort of one interpretation that in
the future, you know, like infrastructure
wins because the models become
more of a commodity over time, right?
Like someone could say, Oh, for a few
months, I want to try out the IBM model.
I'm now going to try out the Llama model.
I'm going to go back to the IBM model.
Like we're living in a world where it
seems like increasingly the models will
be things that we kind of like switch in,
switch out, you know, kind of at will.
I don't know if you agree with that's kind
of like how the future will look or if it
really will be, you know, in the future, a
customer will say, Oh, we're going to build
entirely on, you know, the IBM model stack.
And that will be kind of the future.
It's kind of a question I guess for
you about like how much software
will become like the key platform.
Or if it really will be actually more
kind of like an infrastructure thing
where like we build on AWS and sort of
the models are very interchangeable.
So I think, just from
experience of switching from
one model family to another
model family, it's very hard.
So I think there is actually a substantial
lock in into a specific model and it's
primarily because of prompt tuning.
And we've seen, just by doing better
prompt tuning for a specific model family,
something like a 30 to 40 percent improvement in
accuracy, sometimes like 60. And then when you've
done that, and you've spent three months
of your engineering team's time getting your
prompts correct, and you switch to a different
model family, all that prompt tuning work
is not transferable. And so
it's almost like you
are betting on a particular programming language
and then saying, oh, it's just a programming
language, so I can switch
from Java to Python to C sharp, and you
just need to rewrite it a little bit.
I think there is actually quite a
substantial lock-in into the models.
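One concrete way prompt work fails to transfer is chat-template formatting. The two builders below are simplified, illustrative approximations of two well-known template styles (a Llama-2-like instruction format and a ChatML-like format), not exact production templates for any specific model.

```python
def llama2_style(system: str, user: str) -> str:
    # Simplified sketch of a Llama-2-style instruction template.
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

def chatml_style(system: str, user: str) -> str:
    # Simplified sketch of a ChatML-style template, used by several families.
    return (f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n")

a = llama2_style("You are terse.", "List three risks of synthetic data.")
b = chatml_style("You are terse.", "List three risks of synthetic data.")
# Same content, incompatible wire formats: months of prompt tuning bake in
# one family's delimiters and conventions, not just the wording.
```

This is also why abstraction layers and protocols that normalize prompting across vendors keep appearing, which is exactly the counter-reaction discussed next.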
And so I, I think what will happen is
that the location where the computation
happens, that's totally commoditized.
So if you're on Amazon, you run on Amazon.
And if you want to run it in, you know, Google
Cloud or IBM Cloud, then that's what you do.
Um, I think that the model families
have a much higher stickiness.
It's like the way
skills and knowledge get trained;
it's not directly transferable.
Yeah, I don't know if I
necessarily agree with that.
I think, I think the past
you're 100 percent correct.
There's been kind of like a moat around
specific models given the prompt template,
but I think with things like the
Model Context Protocol that Anthropic released,
which again supports Anthropic but is bound to
be an open-source tool, and with what
the community is building across agents,
where you have to be able to rely on multiple
models given inference costs and capabilities
within your particular workflow,
I think right now it's
accurate, but I think going forward model
developers are going to be handcuffing
themselves if they try to build
You know, an infrastructure around only using
their models to support use cases where it's
going to be more so what is the plumbing
going to look like in order to be able to
have interconnectedness with LLMs, given the
particular use case or agent, but I mean,
that's, that's kind of, that's just my opinion.
Do you think this is a counter-reaction to model vendor lock-in?
Because that's usually what you see, right?
It's like, oh, we have a cloud here, so we build an abstraction layer.
Yeah, no, that's fair.
I just feel like model development became about trying to lock people in, and then this democratization, this race to the bottom in terms of model development, means it almost hinders model developers if they don't allow their specific models or their ecosystem to play well with others.
I think that's where you're seeing the proliferation of open source models.
It's driving a community, given that the community decides what is good and what is bad.
And from a commercial standpoint, it's more a question of how do we make money.
But, again, these are my personal views: I don't think going forward model developers are going to be able to lock people in and still have the adoption that they want.
Yeah. I think, I don't know.
It's gonna be really interesting
to see how it plays out.
I know a lot of engineer types who say, oh, well, in the future there's not gonna be much prompting, we'll just run an optimizer and the prompt will be perfect.
Yeah. And it won't really matter.
But I also know people who work in this stuff who say it's very hard to imagine that we're gonna get that level of optimization where effectively the models become a commodity, because you can always optimize.
I'm gonna move us on to our next topic.
So really interesting news came out this week on a new type of model exfiltration attack, and I think, Vagner, you flagged this for us.
It was a super fun story, because normally we talk about prompt hacking and how we do stuff just from the inputs and outputs of the model.
But this is great, because I think this is the type of attack that you see from time to time, which is a side channel, right?
We're going to just monitor a TPU chip as it does its thing, and then from that, we're going to extract all the intel we need to reproduce your model.
Um, so pretty interesting.
And I guess, Vagner, you're the one who flagged it.
What did you find most interesting about this story?
When I read the story, I started thinking about the money and all the resources it takes to train a model and to deploy it to a TPU, for instance.
And then these researchers found a way to use the electromagnetic field to reverse engineer the layers of the model that are deployed there.
They do that by comparing against a dataset they have of over 5,000 layer architectures.
And so by trying layer by layer, they can replicate a model onto a different TPU with 99 percent accuracy.
So that caught my attention, and I said, whoa, that says a lot about how certain strategies may be at risk from this kind of attack.
Um, and again, I'm advocating open source, because if you have open source, that will be a way to be less susceptible to this kind of attack.
But yeah, that's what caught my attention.
And last month, I think I also saw an attack published on PCWorld that was something related, about exploiting HDMI cables to detect what people were seeing on their monitors.
So I think it's interesting how certain attacks go beyond our more immediate thoughts about how attacks may happen.
They explore these capabilities of hardware, the electromagnetic fields around them, these other properties and capabilities that are there but that sometimes we don't pay attention to.
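The matching step the attack relies on can be sketched very roughly. Everything here is illustrative: made-up signature vectors and layer names standing in for the researchers' database of 5,000-plus layer architectures, and a simple nearest-neighbor lookup standing in for their actual signal analysis.

```python
import math

# Toy sketch: compare an observed electromagnetic trace against a
# database of known layer signatures and pick the closest match,
# layer by layer. All numbers and names below are invented.

SIGNATURE_DB = {
    "conv3x3_64": [0.9, 0.2, 0.4],
    "dense_1024": [0.1, 0.8, 0.3],
    "maxpool_2x2": [0.2, 0.1, 0.9],
}

def match_layer(trace):
    """Return the known layer whose signature is nearest to the trace."""
    return min(SIGNATURE_DB, key=lambda name: math.dist(trace, SIGNATURE_DB[name]))

# Two observed traces, each matched independently to rebuild the stack.
observed = [[0.85, 0.25, 0.35], [0.15, 0.75, 0.35]]
recovered = [match_layer(t) for t in observed]
print(recovered)  # -> ['conv3x3_64', 'dense_1024']
```

Repeating this layer by layer is what lets the attacker reassemble a working architecture without ever touching the weights directly.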
Yeah, definitely.
I love this kind of collection of attacks.
I mean, this NC State University report reminds me of a really old one from DEF CON, a number of years ago, where it listens to the sound your keyboard makes in order to extract what you're typing, which I think is really fascinating.
I guess, Volkmar, working on AI infrastructure, one reaction is: this is cool, but is it a practical attack?
Is it a security surface that we need to be worried about?
Because who's going to stand outside the data center doing this monitoring?
I think there are two questions.
One is, what's the value of the model?
Is the model structure in itself valuable?
I don't think so. This is the one thing where everybody understands the math, and I think there's not a huge amount of gain right now in looking at the model structure.
I think the true value is the numbers, right?
The weights, ultimately, are the information that got turned into a model.
That's the expensive part.
The model structure itself is the cheap part.
I think that in computer science, over the last, I don't know, five decades, we figured out how to secure computer infrastructure.
You have the same problem with extracting databases and database content out of computers.
We have cryptography now on pretty much every link in the computer system, so that you can even run it in the cloud, and so there's a whole confidential compute movement, where the PCIe link from the host to the GPU is fully encrypted, so that you cannot intercept anything which travels over it.
So I think we have very standard defense mechanisms against brute force attacks on the physical hardware.
Um, I think where we are a little bit less far along today is how models get exchanged.
The model in itself, in particular if you look at it from a proprietary perspective, contains your business information, right?
Let's say you fine-tune it: you take your proprietary data and you stick it in the model.
You wouldn't give your database to your competitors, and so you wouldn't want to give your model, which knows everything in your database, to your competitor.
And so we are still in a world right now, I think, where we haven't really figured out what the end-to-end process of confidentiality around models is.
And so there are pieces missing in the infrastructure, just like how we evolved to where we are today.
Like, can I have fully encrypted models which only get decrypted, for example, inside of the GPU?
That's the weights; it may also be the code, if I want to make that confidential.
And then there's the stimulus, the inputs and the outputs: are they making it over the wire in an encrypted or unencrypted fashion?
So we have the mechanisms; the problem is that they are not pervasively deployed, or available in hardware, or used.
Right. And so over the next couple of years, I think we will see much more effort being put in to make sure that we actually protect the asset in a much more stringent way.
I mean, we have data at rest, everything's encrypted; data in flight, everything's encrypted; and then here comes the model, and it's like, good luck.
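The "weights encrypted at rest, decrypted only at load time" idea can be sketched as follows. This hand-rolled counter-mode cipher is a toy for illustration only; a real deployment would use a vetted AEAD scheme such as AES-GCM, hardware-backed key storage, and confidential-compute attestation, none of which the sketch attempts.

```python
import hashlib, hmac, secrets

# Toy sketch: authenticated encryption of model weights at rest.
# NOT production cryptography; illustrative only.

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Derive n bytes of keystream from key+nonce via hashed counters."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt_weights(key: bytes, weights: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(weights, _keystream(key, nonce, len(weights))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + tag + ct  # 16-byte nonce, 32-byte tag, ciphertext

def decrypt_weights(key: bytes, blob: bytes) -> bytes:
    nonce, tag, ct = blob[:16], blob[16:48], blob[48:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("weights tampered with, or wrong key")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))

key = secrets.token_bytes(32)
blob = encrypt_weights(key, b"\x01\x02\x03 fake model weights")
assert decrypt_weights(key, blob) == b"\x01\x02\x03 fake model weights"
```

The missing infrastructure pieces are exactly around this flow: where the key lives, and ensuring the decrypt step happens only inside trusted hardware rather than on a general-purpose host.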
Yeah, it's really interesting.
If I'm hearing you right, it feels a little bit like, as is true with many things in security, we have the techniques.
Now the question is, can we get uniform adoption in a way that actually offers security here?
And I think a whole ecosystem of software has been written and raced its way to market, and now we're like, oh my God, all that proprietary stuff, we're wide open.
And so I think we will actually start locking things down.
And from an enterprise perspective, what we do at IBM, a good chunk of what's next is protecting your data assets, protecting your model assets, making sure that you are actually compliant.
And I think that whole workflow of building these things has been an afterthought, because we just downloaded the internet and trained it into a model.
But I think we are now getting into a world where there are really expensive assets which you must protect, because everything which follows in the enterprise will effectively be deployed with AI.
And it actually ties to what we were talking about a little bit earlier, right?
If the trend is that we sort of run out of all the common data, there's also just more proprietary stuff that becomes a component of how we go about doing this, in a way that raises the risks here, right?
It actually raises the incentive.
Abraham, maybe I can ask you: there are these kind of interesting security questions about model exfiltration and all that.
And Granite, of course, is an open model.
But I'm kind of curious how you guys think about security on a release like 3.1.
Is there a separate team that does that analysis?
I'm just interested in hearing a little bit more about how you think about that in the context of an open source launch.
Yeah.
Um, so to answer your question, yeah, there's a dedicated team led by Ian Malloy focused on security.
They do a lot of work in terms of being able to better identify vulnerabilities in the model.
There's also a wide swath of safety and red teaming that we do, more so from the safety perspective, ensuring that our models don't harm.
Um, but one of the big things we've seen in terms of security, at least from our cybersecurity team, is that one of the ways models end up vulnerable is through the training data.
We may, for instance, train on a large corpus of a language that's not in our intended use case.
So if a user were to prompt our model in that particular language, given that it's not in the scope of our security framework, the model is a little bit more vulnerable to jailbreaks, or to giving a response that's not in scope of what the model should be used for.
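A minimal sketch of that kind of scope check might look like the following, where a naive stopword heuristic stands in for a real language-identification model, and the supported-language list is a made-up assumption for illustration, not Granite's actual policy.

```python
# Illustrative guardrail: reject prompts in languages outside the
# model's tested scope before they reach the model. The stopword
# heuristic and supported-language set are made-up stand-ins.

STOPWORDS = {
    "en": {"the", "is", "and", "of", "to"},
    "es": {"el", "es", "y", "de", "que"},
}
SUPPORTED = {"en"}  # hypothetical tested scope

def detect_language(prompt: str) -> str:
    """Guess the language by counting stopword overlaps (very naive)."""
    words = set(prompt.lower().split())
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))

def guard(prompt: str) -> str:
    if detect_language(prompt) not in SUPPORTED:
        return "REFUSED: language outside tested scope"
    return "PASS"

print(guard("What is the capital of France?"))  # PASS
print(guard("el perro es grande y de que"))     # REFUSED: language outside tested scope
```

In a real pipeline, the detection step would be a proper language-ID model, and refusals would route to a safety response rather than a bare string.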
From the infrastructure standpoint, I'll be totally transparent: this is kind of out of my purview.
In reading the article, it was just really interesting to see that, one, this was models on the edge; these TPUs are not typically used to run inference on larger-scale LLMs.
Also, what I found interesting was that this was a vulnerability specific to an infrastructure provider.
It wasn't necessarily an external attack that was successful.
So I think it just brought up the question of whether model infrastructure providers need to provide more robust guardrails around how potential bad actors inside the company could infiltrate their models.
And then, to the point that Volkmar made, a lot of a model's infrastructure and architecture, i.e. the layers and other aspects of the model, is usually provided as part of the open source community, so a lot of these things are already on hand for people to use.
In terms of whether this is a risk we need to dive into at full scope, I can't say yes or no.
But my take was that it opened up the question of how infrastructure providers are ensuring that bad actors inside the company aren't trying to infiltrate any of the models that they serve.
That's right.
Yeah. And I think it goes to a little bit of what Volkmar was talking about, which is that a number of these security issues are known security issues for the infrastructure, even before you get to AI infrastructure.
The insider threat problem is big in any case, and the question is how much of this folds into traditional security work.
I think this is one of the reasons I'm asking, Abraham: I'm really interested in how an organization decides to deal with safety and security on these models.
Who's responsible for which bits of this?
I think it's assigned in very different parts of the organization.
It's very interesting.
And it's new.
As much as AI has proliferated over the last couple of years, "Attention Is All You Need" only came out seven years ago.
Actually, we came out three and a half years ago.
So we're still trying to figure out a lot of things.
That's right.
I mean, one of the things I love to say on Mixture of Experts is "in the old days," and by that I mean, like, five years ago. Six months ago.
Yeah, that's right.
So I'm going to move us on to our final topic for the day, which is Jetson.
The background on all this is that NVIDIA announced a small, dare I say cute, little board supercomputer called Jetson for AI developers.
Unlike the eye-watering price of an H100, or whatever the price of the GB200 will be, this retails for a relatively cheap 250 bucks.
And it's this kind of handheld little board.
Um, and I guess, Volkmar, maybe I'll turn it to you.
Why is NVIDIA getting into this kind of hobbyist GPU business?
And do you think it actually matters for the overall AI market at all?
Or is this a little bit like Jensen wanted to do a fun thing and is releasing a little board, just because it's a fun end-of-year thing to do?
Yeah, so this is a continuation of something NVIDIA has invested in for many, many years, from before the ChatGPT craze.
There was a massive investment by NVIDIA into autonomous vehicles.
And coming from that industry, we've been tracking that board pretty much since it came out, I think 2015 or '16, ballpark-ish, and it has been continuously receiving updates from NVIDIA.
So this is really a board for low-power robotics.
There's a version which lets lots and lots of cameras be attached, so that you can actually put it on a robot, and it's cheap because it's made for scale.
It's not for running large language models; it's more for vision processing, planning, mapping.
So it has a bunch of processors on it, and it has, in effect, an NVIDIA GPU.
So I think the main benefit over all the other solutions on the market is that in many cases today you'll train your models on an NVIDIA card, so you're living in an ecosystem, and then you can just move that ecosystem into production, where you only run the inferencing part.
And it's powerful enough that you can actually do the camera processing, the video encoders, et cetera, on that one chip.
It's a pretty low-power solution, so you don't drain batteries.
You don't want to put an H100 on a robot, simply because your robot would drive five meters and be out of power.
So it's more of an embedded system-on-a-chip thing.
So I think the nice part is that it used to be around 500 bucks, and now it's down to 250 or so.
It's really affordable for hobbyists, but also, if you want to build something at scale and you can get a chip for 250 bucks, you effectively can build a robot now.
At least the electronics part, I mean, the robot's missing, right?
Yeah. Right.
You at least got the board.
The robot is up to you.
So the automotive part of that is a bit more involved: it has dual chips, it has many more cameras, and it's ADAS-compliant.
This one is kind of the baby chip of that.
Vagner, maybe I'll turn to you on this.
We've talked a lot about models getting smaller, but when we've talked about system-on-a-chip, about AI at the edge, it's often been in the context of a mobile phone, right?
Like Apple doing a new release.
One of the things I think is pretty fun about this is that it's marketed as a tool for hobbyists, for student groups that want to build robots or do their own inference for their own experiments.
That ecosystem I think is really interesting.
And I guess my question for you is: how far do you think that's going to go over time?
Um, maybe the last thing I'll throw in: I've been watching this project by George Hotz for some time, the tiny corp project, where he wants to basically offer everybody a petaflop.
And I think one of the questions is, does it become cheap enough that everybody ends up having a little GPU rig at home?
I'm kind of curious how you think about that, because it starts to look quite different from how we do AI nowadays.
Yeah, this accessibility in terms of value, I think that's the most important aspect, in my understanding, even if we think about developing countries.
For instance, in Brazil, 250 dollars is about the minimum wage, a monthly salary, so it's not that cheap in some developing countries.
But even so, if we consider that this is the cheapest right now, I think it opens a lot of possibilities in those countries, where people have a lot of creativity.
This kind of hardware may allow them to think about robots for agriculture, or other interesting uses where, before this specific hardware, the cost was a blocker.
Yeah, that's right.
Yeah, the kind of continual
democratization is interesting.
Yeah, definitely for international as well.
I mean, I think a really big component of this is who gets to tinker with some of these tools, and that seems really, really important.
I guess, Abraham, I don't know, are you an AI hobbyist in your free time?
I'm kind of curious if you would buy something like this and play around with these types of tools.
Uh, I mean, it depends on who's asking the question.
Yeah, in the old world, I definitely played a lot with this stuff.
I'm not as technically sound as some of the other team members here at IBM Research, but I find myself playing with agentic frameworks quite a bit.
When Baby AGI came out a couple years ago, I found myself really diving into it.
So, I don't know if I'd buy this per se, but I think AI development is being abstracted to a level where it's a lot of plug and play nowadays.
The skill set of a developer, at least for core LLM development, may still need to be sound, but for a hobbyist, that skill set can be a little bit lower.
And I think this is a pretty cool tool to help that side of the fence, in terms of creating more models that you can run on the edge.
Yeah, absolutely.
I think there's going to be
just a ton of activity there.
Well, great.
Well, thanks everybody for their time today.
Uh, Abraham, welcome to the show.
Uh, hoping to have you on for a future episode.
Uh, Vagner, Volkmar, always a pleasure to have you on the show as well.
Thanks for joining us.
If you enjoyed what you heard, you
can get us on Apple podcasts, Spotify
and podcast platforms everywhere.
Uh, and listeners, we'll see you next week.