
90% of Enterprise Data Unstructured

Key Points

  • The panel humorously debated how much enterprise data is unstructured, with guesses ranging from 40% to a tongue‑in‑cheek 200%, before revealing that roughly 90% of enterprise data is actually unstructured.
  • This episode marks the 50th installment of the “Mixture of Experts” podcast, featuring discussions on the upcoming Llama 4 release, highlights from Google Cloud Next, and recent Pew Research findings.
  • IBM Fellow Hillery Hunter introduced the newly launched IBM z mainframe, emphasizing its “zero downtime” design, which targets eight nines (99.999999%) of reliability, translating to only a few hundred milliseconds of downtime per year.
  • Hillery explained that mainframes like IBM z underpin the global economy by handling the vast majority of financial transaction processing, making them critical yet invisible infrastructure for everyday banking and market activities.


# 90% of Enterprise Data Unstructured

**Source:** [https://www.youtube.com/watch?v=90fUR1PQgt4](https://www.youtube.com/watch?v=90fUR1PQgt4)
**Duration:** 00:37:20

## Sections

- [00:00:00](https://www.youtube.com/watch?v=90fUR1PQgt4&t=0s) **Estimating Enterprise Unstructured Data** - Panelists humorously guess the share of unstructured data in enterprises, ultimately revealing it’s roughly 90%.
- [00:03:02](https://www.youtube.com/watch?v=90fUR1PQgt4&t=182s) **AI-Powered Real-Time Transaction Fraud** - The speaker explains how ultra‑fast, highly reliable AI is embedded into transaction systems to score billions of events per day, enabling instant fraud detection at the point of sale.
- [00:06:15](https://www.youtube.com/watch?v=90fUR1PQgt4&t=375s) **Bringing AI Inference Close to Transactions** - The speaker explains why large banks need to deploy fine‑tuned language models at the edge to achieve sub‑millisecond inference for mission‑critical tasks like fraud detection, avoiding cloud latency and security concerns.
- [00:09:18](https://www.youtube.com/watch?v=90fUR1PQgt4&t=558s) **Enterprise AI Expands to Core Use Cases** - The speaker celebrates recent breakthroughs in applying LLMs to enterprise workloads, overcoming latency hurdles, and previews upcoming Spyre AI enhancements for self‑healing, automated systems.
- [00:12:27](https://www.youtube.com/watch?v=90fUR1PQgt4&t=747s) **Open‑Source Giant Models Accelerate** - The speaker highlights the debut of huge open‑source language models—ranging from a 400‑billion‑parameter mixture‑of‑experts system to a 2‑trillion‑parameter model—arguing they pressure closed‑source labs and could broaden community support for mixture‑of‑experts architectures.
- [00:15:32](https://www.youtube.com/watch?v=90fUR1PQgt4&t=932s) **Beyond the Race: Llama 4 Strategy** - The speakers contend that the significance of Meta’s Llama 4 release lies not in a head‑to‑head leaderboard but in its illustration of shifting open‑source tactics and its massive industry influence, highlighted by over a billion model downloads.
- [00:18:40](https://www.youtube.com/watch?v=90fUR1PQgt4&t=1120s) **Evaluating Mega-Scale Open-Source Models** - The speakers debate whether massive models like the unreleased “Behemoth” are practical and open‑source viable versus being merely marketing hype, highlighting community innovation, cost‑performance trade‑offs, and real‑world deployment challenges.
- [00:21:44](https://www.youtube.com/watch?v=90fUR1PQgt4&t=1304s) **Scaling Down: Leveraging Giant LLMs** - The speaker predicts that upcoming massive Llama 4 models will mainly be used to generate and augment enterprise data, which can then be distilled into much smaller (1‑10 billion‑parameter) fine‑tuned models for laptop‑scale deployment.
- [00:24:59](https://www.youtube.com/watch?v=90fUR1PQgt4&t=1499s) **Google Cloud Next AI Breakthroughs** - The speaker recaps Google Cloud Next’s showcase of rapid enterprise growth, AI integration through TPUs and Gemini, the new Ironwood chips, and the opening of Google’s massive fiber network to customers.
- [00:28:00](https://www.youtube.com/watch?v=90fUR1PQgt4&t=1680s) **Google’s Peer Agent‑to‑Agent Protocol** - The speaker outlines Google’s new agent‑to‑agent standard that enables LLM agents to interact as equal peers, integrates with Anthropic’s MCP and IBM’s consulting workflow, and emphasizes Gemini 2.5 Pro’s benchmark leadership and focus on safety.
- [00:31:03](https://www.youtube.com/watch?v=90fUR1PQgt4&t=1863s) **Google’s Gemini 2.5 Advances** - The speaker lauds the Gemini 2.5 Pro release, highlights Google’s unique ability to train models on vast B2C data for cinematic video generation, and expresses excitement about the model’s impact on the field.
- [00:34:08](https://www.youtube.com/watch?v=90fUR1PQgt4&t=2048s) **Personalized Video AI and Public Perception** - The speaker envisions AI‑driven, celebrity‑filled personalized movies and cites a Pew Research report showing a sharp gap between experts who downplay AI’s job impact and the public who feel threatened by it.
- [00:37:14](https://www.youtube.com/watch?v=90fUR1PQgt4&t=2234s) **Upcoming Episode: Mixture of Experts** - The host signs off, promising that the next installment will cover the Mixture of Experts subject.

## Full Transcript
**Tim Hwang (0:00):** What percentage of enterprise data is unstructured data? Kate Soule is Director of Technical Product Management for Granite. Kate, welcome back to the show. What's your estimate?

**Kate Soule (0:09):** This feels like a trap. Without, you know, just a wild guess, I'm gonna say 40%.

**Tim (0:15):** Shobhit Varshney is Head of Data and AI for the Americas. Shobhit, tuning in live from Vegas. What do you think?

**Shobhit Varshney (0:21):** 200%. Have you seen the quality of structured data in companies?

**Tim (0:25):** All right, great. And last but not least, joining us for the very first time, is Hillery Hunter, IBM Fellow and CTO of IBM Infrastructure. You've got an advantage on this question, but I don't know if you wanna offer your guess.

**Hillery Hunter (0:35):** Yeah, I'll take the midpoint there. Not exactly the midpoint, but I'll go with 80%.

**Tim (0:40):** Okay, great. So the answer is 90%. We're gonna talk about that today, and all that and more, on the very 50th episode of Mixture of Experts. 50th episode, crazy. Welcome!

I'm Tim Hwang, and welcome to Mixture of Experts. Each week, MoE brings together a talented and just lovely group of researchers, product leaders, and more to discuss and debate the week's top headlines in artificial intelligence. As always, there's a ton to cover. We're gonna talk about the Llama 4 release; Shobhit's in Vegas, and he's gonna tell us all about Google Cloud Next; and there's some really interesting research coming out of Pew Research. But today, because Hillery is on the line with us, we want to take the opportunity to talk about IBM z, a new launch that just came out on, I believe, Tuesday, and it concerns mainframes. And so I guess, Hillery, do you wanna just start? For listeners who are less familiar with the sector, what is a mainframe anyway, and why is it important?

**Hillery (1:38):** Yeah. I think the first fun fact is that "z" stands for zero downtime, and mathematically, that's kind of an interesting conversation. We talk about the system now having eight nines of reliability, and the way that you count those nines, as you say it, is 99 point and then six more nines. So that's how you get there; it's a lot of nines of resiliency. But it means just a couple hundred milliseconds a year of downtime, on average. And so, you know, when I talk to family members, or meet someone socially, I kind of say we work on building the computers that you don't see, that you just sort of assume are there and never think about. And what that means is, really, this is where most of the world's financial transaction volume, everything from things in the market to your personal credit card transactions, goes through in the back end. And you hopefully never think about whether or not that computer's gonna work, or whether your credit card transaction goes through. These are systems that we all just assume are up all the time. And so it's really at the core of the global economy, to be honest. That's really not an exaggeration.

**Tim (2:38):** Yeah. What I love about this is that you work on arguably some of the highest of high-stakes computing. And I think one of the most interesting things about the launch is that AI is a big part of it in some ways. I know there's z17, which is the mainframe, and then there's "z" the software, which sounds like IBM pushing into the idea that these, you know, six-nines-of-reliability computers
really are gonna get, you know, sort of integrated into the overall AI revolution. Which, you know, we've talked about on the show before: AI is not always production-grade. It sometimes messes up; it's stochastic; it has all sorts of randomness. So I'm curious to hear a little bit more about what's getting launched on the software side, and how you get AI to work at a level of reliability that most software developers never even need to think about as they're vibe coding or whatever.

**Hillery (3:29):** Yeah, it's a pretty different space, but it's equally fascinating, I think, to that whole vibe-coding space that a lot of folks are interacting with now on a daily basis. From a technical perspective, getting things done in transactions means having millisecond-level AI, and that means super, super fast, tightly integrated, being able to handle billions of transactions a day, and being able to score things at line speed, right? So, again, an anecdotal example: if you're talking about fraud and analytics in the credit card transaction processing space, if I as a consumer am buying something online, it's okay; there's minutes to hours before the thing gets shipped out, so fraud can be handled offline. But if it's in a store, and somebody's trying to rip you off and buy an expensive phone or something like that at Best Buy, you wanna make sure that instantaneously, the moment the transaction goes through, it's detected as being fraudulent. And so there's actual real economic value and consumer value in being able to score every transaction in real time.

The interesting thing that we're now talking about being possible on this next generation of mainframe is multi-model AI. So there's a really small, fast, compact model running right there on the processor, dealing with this massive transaction throughput. Maybe occasionally it has low confidence in the scoring it provided, and it needs to be backed up by a somewhat more robust, complicated model. And so we're putting extra AI cards, called the Spyre card, into the system, to enhance not just that super-fast processing on the processor itself, but also fast processing one step slightly removed and adjacent, on a PCIe-attached set of cards. And so we've just multiplied the AI capacity and throughput of the system. And then, from the perspective of the total system experience on the software side, like you said: we now have something called Operations Unite, which is an AIOps-driven, AI-chat-driven interface to everything going on in the system. So observing and remediating issues all happens in a totally modern interface. It's pervasive once you put the AI capability in; it's not just about the workloads running in the system, but also how people use and operate and keep the whole thing stable and healthy.

**Tim (5:41):** Yeah, that's awesome. So, Shobhit, I'd love to bring you in. I know I launched this episode with a question about just how much unstructured data enterprises are sitting on, and I'm sure this is a problem that you have to deal with and talk about with customers day in, day out. I know that's a component of this launch, but I'm curious if you want to opine a little bit on how the world is evolving there, and how the z launch fits into some of those questions.
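The two-tier pattern Hillery describes above, a small on-processor scorer that escalates low-confidence transactions to a larger adjacent model, can be sketched in a few lines. Everything here is an illustrative placeholder (the stand-in model functions, the 0.9 threshold), not IBM's actual scoring system:

```python
# Illustrative two-tier fraud-scoring cascade: a fast, small model handles the
# bulk of traffic, and only low-confidence cases escalate to a larger model.
# Both "models" below are toy stand-in functions, not real scoring engines.

CONFIDENCE_THRESHOLD = 0.9  # escalate when the small model is unsure (assumed value)

def small_model_score(txn: dict) -> tuple[bool, float]:
    """Fast on-processor model: returns (is_fraud, confidence). Placeholder logic."""
    is_fraud = txn["amount"] > 5000
    # Pretend the model is confident on very small or very large amounts only.
    confidence = 0.99 if txn["amount"] < 1000 or txn["amount"] > 10000 else 0.6
    return is_fraud, confidence

def large_model_score(txn: dict) -> bool:
    """Slower, more robust model on an attached accelerator. Placeholder logic."""
    return txn["amount"] > 3000 and txn["country"] != txn["card_country"]

def score_transaction(txn: dict) -> bool:
    is_fraud, confidence = small_model_score(txn)
    if confidence >= CONFIDENCE_THRESHOLD:
        return is_fraud              # fast path: decided at line speed
    return large_model_score(txn)    # slow path: escalate the ambiguous case

print(score_transaction({"amount": 500, "country": "US", "card_country": "US"}))   # False (fast path)
print(score_transaction({"amount": 4000, "country": "FR", "card_country": "US"}))  # True (escalated)
```

The point of the design is that the expensive model's latency is only paid on the small fraction of transactions the cheap model cannot decide.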
**Shobhit (6:03):** I'm a big fan of the z Series. I grew up in a cloud-first, AI-first world, and I have so much respect for understanding the right balance between where mainframes should be playing versus where the clouds are, right? So, as an example: we're working with a very, very large bank, where we're leveraging cloud environments, with a lot of different GPUs and compute behind them, to train the models. But once you have fine-tuned the models on enterprise data, you wanna go bring them to where the transactions are happening. And these are sub-millisecond, right? Very, very quick. And you're doing billions and billions of these every hour. So you want to bring the AI inference as close as possible to where the transaction is happening.

In the first wave of doing unstructured content analysis, you would have some large language model that summarizes a call recording, or starts to do some knowledge search, things of that nature. Now, in the next wave, once we've proven out that this technology is working, you wanna do this in more mission-critical workflows. For example, when fraud detection happens, like Hillery was mentioning, there are a lot of patterns that we need to look for. It's not just that one transaction that happened; you also need to look at how that transaction was made. At the point of the transaction happening, in sub-milliseconds, the larger models have a lot of latency. You can obviously not afford to have that data go out to the cloud and come back: A, security issues, and B, the latency, and other things, right?

So we are in a world where we see a lot of our larger Fortune 100 companies move from experimenting with large frontier models behind API calls to fine-tuning smaller open models and bringing them close to the compute. So I think the z Series works incredibly well in this space. And we also have the brand permission with z. Clients are like: what, Hillery, 90% of all credit card transactions happen on z, and 90% of the Fortune 50 banks rely on you? And airlines, retailers. So you're on the mission-critical workflows. This is no longer "hey, let me ask the prompt a different way," right? You're not experimenting; you're doing this in more critical workflows.

**Hillery (8:03):** You know, I love that you went to latency. I think one related thing, with that whole question of data leaving the system, is the data security model, data sovereignty, all those other really hot topics. And so bringing AI to where that data is, where that mission-critical data is, where that valuable and sensitive consumer and personal information is, is a big part of this conversation. I think one other thing, in addition to latency and data protection, is the energy. So we've greatly increased the AI capability, and the overall capability of the system, but dropped this whole system, generation to generation, by 17% in power consumption. And the team has measured that it's about five times more efficient to do that AI in place, where the data is, than, to your point, calling out to some external system. So these days, when everybody's running out of power, looking to take out more data center space, all that other kind of stuff, being able to do AI so efficiently is, I think, a really exciting step forward.
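Stepping back to the reliability math Hillery opened with: the claim that eight nines of availability works out to a couple hundred milliseconds of downtime per year is easy to sanity-check. This is a sketch of the arithmetic only, not IBM's published methodology:

```python
# Downtime implied by an availability target, e.g. "eight nines" = 99.999999%.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # ~31.6 million seconds

def downtime_ms_per_year(nines: int) -> float:
    """Milliseconds of allowed downtime per year for a given count of nines."""
    availability = 1 - 10 ** (-nines)
    return (1 - availability) * SECONDS_PER_YEAR * 1000

print(round(downtime_ms_per_year(8)))  # eight nines -> ~316 ms of downtime/year
print(round(downtime_ms_per_year(5)))  # five nines  -> ~315,576 ms (~5.3 minutes)
```

So "a couple hundred milliseconds a year" is consistent with an eight-nines target.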
**Shobhit (8:56):** And Hillery, just about a month back, I was with one of the largest top-three credit card companies, and we were having this concern around fraud detection, and said we can obviously do a lot of LLM work to understand patterns, right? It's not just a spot in time. And even a month back, we struggled to bring LLM models into real-time transactions, because it's just sub-millisecond and so on. And I was just so proud that this week we've been able to go after those use cases that we couldn't, even a few weeks back. So we're coming to a point where clients understand, because they've proven it out inside their enterprises, that we can use LLMs, and we've trained them in a particular way, but latency was getting in the way of us doing this work. So from a lot of our clients: just huge kudos to your team for doing this right.

**Hillery (9:40):** I think you bring in enough AI and, to your point, the creativity just explodes. Every developer in this core-of-the-enterprise space is now saying: oh, that's for me now; that's not just something for people elsewhere, in different environments. It's now insurance claims processing, even medical image assessment. There's all kinds of amazing things going on on that core data, because AI is also for those people, and for that data, and for that context.

**Tim (10:01):** That's super exciting. So, Hillery, before we move on to the next topic: what comes next for you all?

**Hillery (10:06):** Yeah, so the capabilities with Spyre come out in the fourth quarter. There's a rolling set of announcements on the different software enhancements, and I think the way to think about it is that we're making these systems AI through and through, like I mentioned. Starting back even in z/OS 3.1, the last release, there was AI inside, things starting to look in that direction of self-healing, or sort of automation of management of the efficiency of the system. What we've stated about z/OS 3.2, which is gonna be coming out, is that there's even more integration of that smartness into the core and the heart of how the system operates, and then into how operations teams experience it, going all the way out even into our support staffing. So if you call IBM for help with something, now we are also using watsonx technology to help those agents who are helping you with your mainframe. That's a project that we started in our Technology Lifecycle Services organization with our storage products, and we've announced this week that we're also bringing it to mainframe support. So that whole experience, end to end: how the system runs, what you can do on it, what you understand about it, and then how somebody helps support you, is all gonna be AI-enabled. And I think that end-to-end, full-stack story is just really exciting. This is us living what we've been talking about with the power of AI.

**Tim (11:21):** This is awesome. Yeah, so we'd love to have you back on the show as things unfold here. I think it's a segment of AI that we haven't talked about as much, but I love it personally, just because it is this kind of very high-stakes thing. You really gotta get it right in these domains.
And so it's a kind of AI, almost engineering, that you don't really see in a whole lot of other places, which is really exciting. So I'm gonna move us on to our next topic. Meta has released Llama 4, a long-awaited release in the open-source space. There are three models that they've talked about, two of them actually released: the Scout model, the Maverick model, and the Behemoth model. And it follows a pattern that we've seen elsewhere in the open-source space, where people are launching both smaller models and bigger models to meet a variety of different applications. Kate, maybe I'll start with you. I don't know if you've had a chance to play with some of the models yet, but I'm curious about your early impressions, your vibe check on how this release went.

**Kate (12:17):** Yeah. You know, it's been a busy week, so I haven't had a chance to play with them directly, but I've certainly been reading up on them, and it's really exciting to see what Meta put out there. I mean, with the release of their largest model, which is, I believe, over 400 billion parameters, mixture of experts, and a hundred billion parameters, I think, for the Scout, they're really starting to take on larger and larger tasks and create some powerful models out in the open-source ecosystem. And with the announcement of their Behemoth model, which is 2 trillion parameters: that's big, right? That's pretty big, Tim. So they're saying that, already on earlier trained versions, checkpoints, it's cracking GPT-4.5 on tasks like science. So they're clearly putting themselves out there as a frontier model provider. And doing that in the open, I think, is only gonna continue to put more pressure on these closed labs to release some of their work out in the open as well, and more broadly help the community. So that's really interesting.

I think there's also a lot to be said about the mixture-of-experts architecture that's going on here. Obviously, DeepSeek made this famous when they released a big update to their model family back in December or so; not first released, but it's an architecture that's been used more broadly even before that. But I'm really hopeful that this release will help get broader community support behind the mixture-of-experts architecture, because there are just tons of really interesting things about it. It's very training-efficient, and inference-efficient, particularly if run at a low batch size. You only have to use the experts that you need to call at inference time, which, if you're just running one or two tasks, can be done really efficiently. You start to lose a little bit of that if you have to run these at much larger batch sizes, because you have to load all your experts into memory. Most people don't quite realize that about mixture of experts. But either way, I'm really excited to see just another powerhouse model get released; in this case, two powerhouse models, released out into the open.

**Tim (14:29):** Yeah, for sure. And can you go into that a little bit more for some of our listeners?
I mean, it's the namesake of the show, so I have to fight for it: has mixture of experts been a little bit uncool as of late? It sounds like what you're implying is that these models might make it a focus of the community again, in a way that it hasn't been in the past, and I'm kind of curious how that's developed.

**Kate (14:49):** Well, I mean, even just with the z system, right, we were talking about the focus on inference efficiency, running things quickly at inference time. And a lot of what enables that is the community building open-source software and platforms to host and run these models as quickly as possible. And because the most popular open-source models to date, including prior generations of Llama, have been dense-architecture models, a lot of the existing support for hosting and running these models, running them locally, or hosting and running them yourself on platforms like vLLM, is predominantly based on some of those more popular dense architectures. So there is going to need to be kind of a groundswell movement of the community continuing to build out support. I think we've seen a lot of that already with the release of Llama 4, and I'm just excited to get more open-source developers interested in mixture of experts as an architecture as a whole, and to continue building out tooling and ways that we can work with these models more broadly.

**Tim (15:46):** Sure. Shobhit, maybe I'll bring you in here a little bit.
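Kate's point about batch size can be made concrete with back-of-the-envelope arithmetic: a single token activates only its routed experts, but as a batch grows, it becomes overwhelmingly likely that every expert is hit by at least one token, so all expert weights must be resident. The sketch below assumes uniform top-k routing (an idealization; real routers are not uniform) and a 128-expert layer loosely modeled on public Llama 4 descriptions:

```python
# Back-of-the-envelope: expected fraction of a mixture-of-experts layer's
# experts touched by a batch, assuming each token picks top_k experts
# uniformly at random (a simplifying assumption, not a real router).

def expected_experts_used(num_experts: int, top_k: int, batch_tokens: int) -> float:
    """Expected fraction of experts activated by a batch, per MoE layer."""
    # Probability a given expert is untouched by every token in the batch:
    p_unused = (1 - top_k / num_experts) ** batch_tokens
    return 1 - p_unused

for batch in (1, 8, 64, 512):
    frac = expected_experts_used(num_experts=128, top_k=1, batch_tokens=batch)
    print(f"batch={batch:4d}: ~{frac:.0%} of experts active per layer")
```

At batch size 1 you compute with under 1% of the experts; by a few hundred tokens per batch, essentially all experts are active, which is why large-batch serving loses much of the MoE inference advantage.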
You know, I think there's a way this discussion often goes which I think is less interesting, where it's basically: okay, Meta did this release; now who's ahead in this race? But I think that's often the wrong way to think about it, particularly as the space gets more and more complex. How should we read this launch for what Meta's strategy is, and how it's trying to fill a niche in the market? Because rather than thinking, oh, DeepSeek is ahead, or Meta is ahead, I think we should just ask how the strategies are evolving.

**Shobhit (16:17):** Absolutely. Yeah.

**Tim:** I'm curious if you have some thoughts, what you read into this launch, basically.

**Shobhit (16:21):** So let's just start by acknowledging what a consequential impact Llama has had on industry. The Llama models, as of the 18th of March, have been downloaded a billion times. Let's just let that sink in: a billion times we've downloaded a model, made different versions of it, adapted it, things of that nature, right? So with a lot of the enterprises we work with, we are very focused on: how do I adapt a model to our enterprise-specific domain, our data, and the way we want the models to behave? That adaptation comes only when you're really, really open. There are certain frontier models that can be adapted with fine-tuning, but then you're sending your proprietary data to the cloud. That's a no-go. So usually open models, open-weight models, are fine in that space, where you can go and tune them. So our own Granite models, some models from Mistral and DeepSeek, and others, are also open-weight, open models.

But it takes quite a bit to create a good mechanism to assess the quality of an output. So for a lot of our clients, we have to go and build end-to-end LLM benchmarking mechanisms: how do you evaluate the output on your specific documents? The benchmark results that are public, those are a good starting point, a directional check to say, yeah, it's worth looking at, because Llama 4 did X better. But none of my clients jump up and down saying, oh my God, this is 0.2 points higher than the other one, right? People have other criteria that we use to judge which LLM we should be leveraging. It starts with IP: who can own the IP on that model. It starts with data gravity: the AI model follows the data gravity. There are actual commitments you've made to specific cloud vendors, right? There are questions around whether I can adapt this to my own environment. And then return on investment, the overall ROI of running these models. So you'll see a trend where, every six months, the next-size-smaller model gets smart enough to outcompete the previous one from six months back. We're seeing this constant trend of really good performance-to-cost ratios, right? I think that's the sweet spot, and Llama has done a really good job. I would anticipate that we'll continue this trajectory of a billion downloads, and we'll have different adapted versions of Llama available for our enterprises. That's the right frame to look at it, versus, oh my God, this just crushed the numbers on this particular task.
Then there are other models that will constantly innovate with new methodologies. I think DeepSeek did a phenomenal job with some of their papers. With our Granite models, we have some really nice tricks up our sleeves, and we give back to the community too. So I'm just super pumped about the community coming together, open source getting to the point where you can adapt it to the enterprise, and a very strong focus on intelligence divided by price, that kind of metric.

**Tim (19:12):** Hillery, maybe I'll bring you in, just to talk a little bit about this Behemoth model. I know it wasn't released, but it is shockingly large. And it's cool on one level: you're like, wow, okay, it's really big. I'm kind of curious, though, from your point of view, about the degree to which these are actually practical models that a lot of people will use in the wild, because it sort of feels like, given the kind of infra you need to pull off really serving and using a model at this scale, part of me wonders whether this is more of a marketing thing than a practical reality. But I'm curious about your take: is there room for open source at the mega, mega, mega scale, just because it almost limits the set of people who would actually, practically, end up using it?

**Hillery (19:54):** Yeah. I have a lot of similar thoughts to what Shobhit just shared. A couple of things, right? Within IBM Infrastructure, we're also handling creating the cloud infrastructure for watsonx, and deployment of all these infra services and things like that.
[20:07] So the other part of my brain is looking at how we bring more and more powerful accelerators of all kinds into that cloud environment, to do whatever it is that watsonx needs to do, right? So if our customers are gonna need those really big models, I'm not gonna be the one that says no, we won't provide the infrastructure for it, right? So we're advancing with NVIDIA and Intel and AMD, putting new and more GPUs out there to enable people to play around with models as large as they feel are gonna be useful.

[20:35] On the practical side, though, we see a lot of experimentation, or attempts to use these things maybe from a teaching perspective. But then, when it comes to scaling out deployments, almost all of our customers start to engage with us on "how can I customize smaller things," right? So I feel like you sort of have to know where things are at on the large side and what it might do for you. You may use that to inform yourself on what the solution might look like, or maybe to create additional tuning data or something like that — to get the characteristic you need out of something that's then gonna be affordable to scale.

[21:11] So, like Shobhit said, most of our customers — we work largely in the B2B space; as IBM, we're working with other large enterprises who have millions to hundreds of millions of clients. And when you want to engage with all of them and run at business scale — billions and hundreds of millions of things and people — the affordability very quickly kicks in, and people start looking at customization of smaller things for real scale-out of deployments.
[21:40] Well, if I can make a prediction based off of what you just said, Hillery — and kind of speaking to what you mentioned, Shobhit, about small LLMs increasingly being able to do more things — my prediction is that the Llama 4 models that were released (and they're very big; even the smallest one is around a hundred billion parameters) are going to be used most by the community to fine-tune some of the older, smaller Llama 3 models. If we look at what can run on a laptop, what you can easily train and customize, you're really talking one to ten billion parameters in size, and maybe a dense architecture, because there's already a lot of tuning support created for that kind of model. So I think some of the most immediate uses of these biggest models are going to continue that trend of making the smaller models even more performant — using the bigger models to teach, to generate data, to help augment existing enterprise data and create more of it, and then pack that down into smaller models like the older generations of Llama, or our generation of Granite, all playing in that single-digit-billion parameter size frame.

[22:51] I totally agree, Kate. And one other little factoid — I'm sure you've talked about this before — it's estimated that only about 1% of the data an enterprise needs a model to use is contained in publicly available models, right? So as you think about that, it has to be that an enterprise is gonna be customizing something.
[23:10] And then the question is: what is that something, and is that something affordable enough to scale?

[23:14] Yeah. And there's the size — both the size of the model and the context-window side, right? A 10-million-token context window — what a world we live in, right? I can just dump a bunch of data into it and talk against it. But it takes a lot to host these models. A lot of different vendors are offering inference infrastructure for the same exact model, and it is complex to host this and get it right. Each vendor offers different kinds of context windows, because not everybody can pull off the 10-million-token infrastructure, the way you fine-tune it, and so on, right? Even companies that do third-party analysis, like Artificial Analysis, took a few turns to get the models — to be provided inference infrastructure just right — to be able to match what Llama had claimed as the results in their papers. So it takes a few rounds to get this done, and I believe it speaks to the complexity of some of these larger models: send the same prompt to three or seven different vendors hosting this model and you get slightly different responses — you see quite a bit of difference between them. So I think we'll get to a point where derivatives of Llama 4 — synthetic data created out of Llama 4, and some of the new techniques they released — will make their way into smaller models. And those are the ones that'll scale across different companies. But I'm generally very, very excited about these big releases that model companies are doing.
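The vendor-to-vendor divergence Shobhit mentions — the same prompt producing noticeably different answers across hosts of the "same" open-weight model — could be measured with a sketch like the one below. The vendor names and canned responses are made up; a real harness would call each provider's inference API and compare the live outputs:

```python
# Illustrative sketch: quantify how much hosted versions of the same
# open-weight model diverge on one prompt. Vendors and responses are
# simulated; a real harness would query each provider's API.
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two responses (1.0 = identical word sets)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def divergence_report(responses: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise similarity for every pair of vendor responses."""
    return {(v1, v2): jaccard(responses[v1], responses[v2])
            for v1, v2 in combinations(sorted(responses), 2)}

# Simulated answers from three hypothetical hosts of the same model.
responses = {
    "vendor-a": "The contract renews on June 1 with a 5% uplift",
    "vendor-b": "The contract renews on June 1 with a 5% uplift",
    "vendor-c": "Renewal occurs in June at a higher rate",
}
report = divergence_report(responses)
```

Low pairwise scores flag vendor pairs whose serving stacks (quantization, context handling, sampling settings) are changing the model's behavior enough to matter.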
[24:42] They're still sticking to their open-weight models. There are still the restrictions that come with a Meta license that's not quite Apache or MIT, but overall our clients have loved the fact that we can now outcompete each other in the AI space, and all clients win when you have great AI labs working on this together.

[25:06] I'm gonna move us on to our next topic, which is Google Cloud Next. Shobhit, you're actually dialing in straight from Vegas, so I'll kick it over to you. You've been there all week — what are the big things we should know about coming out of this show?

[25:20] It's lovely to be with developers, people who are hacking through, and clients who are actually using it. Five hundred customer logos on screen — that's where Google Cloud is today. That's such a great testament, given where they were two, three years back, and they've done quite a bit to make sure that they're serving the enterprises; they have more and more data, cloud is growing, profitable, things of that nature. Look at how they're bringing AI across the entire platform, and how they're exposing some of their internal strengths. As a great example, they have amazing TPUs to train their own models for their own use cases like YouTube, and Gemini across mobile apps and whatnot, right? So they're bringing that TPU out to enterprises, and they're constantly innovating on it — the latest release, Ironwood, is amazing progress on their own chips. Then there's a lot of stuff that Google does internally to support their billions of users, things like their own wide-area network of fiber.
[26:18] It's millions of miles of fiber that they've now exposed to enterprise users. So they seem to be making a very concerted effort to make their secret sauce available to enterprises as well.

[26:33] Overall, they spent a lot of time on media creation, versus use cases like coding or data and things of that nature. On media creation, clearly they're the only cloud that can do this end to end, across all these different modalities, creating content. I was privileged to be part of the Sphere experience on Day Zero, where they showed us The Wizard of Oz and what they're doing to pull that off at such a mega scale, right? It's a great experience to see AI leveraging the best techniques to create such an immersive experience at Sphere scale. So, a lot in the media space. But not a lot of our enterprise clients jump up and down on the media topic. There's marketing, great; there's some media creation. But the bigger focuses for enterprises are: what do I do with the call center? What do I do in my code-development processes? My data is messy — things of that nature. And they made quite a few announcements in this space.

[27:30] They have been announcing newer and newer models for the last few weeks. It's just amazing to see how, ten days before your annual event, you're releasing your Gemini 2.5, right? People hold onto these big announcements, but in this AI race you can't wait ten days — you need to get Gemini 2.5 out before Llama 4 comes in. So it's good to see that progress is going really fast on performance, on intelligence per dollar.
[27:55] Gemini Flash has been doing really, really well. And their Gemini 2.5 Pro model — across the board on the benchmarks, on all the different things that matter, including Humanity's Last Exam — is absolutely number one. So, a huge focus on that.

[28:12] Shifting a little bit more towards the agent space: we had MCP from Anthropic, which allows an LLM, in a structured way with a standard protocol, to access backend systems and such. To complement that, Google has created its own Agent2Agent protocol, which allows one agent to talk to another agent — not as a tool, but as an equal citizen. It's a peer. So the two of them can pair up and talk to each other and say, "Hey, I found this error — what do you want me to do? Or should I go talk to a human if needed?" And this is asynchronous; it can handle long-running tasks, and the agents can talk back and forth. I'm generally very pumped when we get to a point where people start collecting around specific standards. Google had a lot of partners — fifty-plus — already working on agent-to-agent. Within IBM Consulting, we obviously have a really good agentic workflow; we have our own IBM Consulting Advantage, we already have MCP integrated into it, and now we're working on agent-to-agent within that space as well. So we're getting really, really excited about making sure this is a very open ecosystem where you can work sideways.

[29:18] Those were my highlights from the Google event. I'm just very pumped about the clients talking about the specifics of how they did it — not just a 30-second video, but a whole half-hour session.
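The peer-to-peer pattern Shobhit contrasts with tool calling — one agent raising an issue to another, waiting asynchronously, and deciding together whether to escalate to a human — can be sketched with two coroutines and message queues. This is an illustration of the general pattern only, not Google's A2A wire format: the message fields and agent names here are invented.

```python
# Illustrative sketch of agent-to-agent peer messaging: the reviewer
# reports a problem back to the coordinator instead of silently acting
# as a tool, and the coordinator decides how to respond (here, escalate
# to a human). NOT the real A2A protocol; all fields are invented.
import asyncio

async def reviewer_agent(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Peer agent: receives a task, raises an issue, awaits instructions."""
    task = await inbox.get()
    await outbox.put({"type": "issue", "detail": f"found an error in {task['doc']}"})
    instruction = await inbox.get()
    await outbox.put({"type": "done", "action": instruction["action"]})

async def coordinator_agent(inbox: asyncio.Queue, outbox: asyncio.Queue) -> list:
    """Peer agent: delegates a task and handles the issue raised back."""
    log = []
    await outbox.put({"type": "task", "doc": "quarterly-report"})
    issue = await inbox.get()          # peer reports a problem, not a result
    log.append(issue)
    await outbox.put({"type": "instruction", "action": "escalate-to-human"})
    log.append(await inbox.get())      # peer confirms the chosen action
    return log

async def run() -> list:
    a_to_b, b_to_a = asyncio.Queue(), asyncio.Queue()
    reviewer = asyncio.create_task(reviewer_agent(a_to_b, b_to_a))
    log = await coordinator_agent(b_to_a, a_to_b)
    await reviewer
    return log

log = asyncio.run(run())
```

The key difference from a tool call is visible in the coordinator: it doesn't just receive an answer, it receives a question back, and the exchange can continue over long-running, asynchronous tasks.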
[29:28] "Let's deep dive: here are the challenges, here's our journey of which models we used," and so on and so forth. So it's very good to work with the product teams and the customers at these events.

[29:36] That's great. Yeah. So, I guess, Hillery — an avalanche of announcements here from a number of different directions. I'm curious, as you look at Google Cloud and what they're announcing: trends, thoughts, hot takes from this year's Google Cloud Next?

[29:50] Yeah, one of the things that caught my eye that Shobhit didn't have on his list — so I can grab onto it and mention it; you missed one — is that they also talked about AI on premises and offering those capabilities. And I think that's also exciting to see, in the sense that, again, it affirms what we've been thinking here: that clients do need to be able to run AI in an air-gapped environment. We keep saying that AI is a platform conversation, and that AI and hybrid cloud are two sides of the same coin. Really, that's a statement going back to everything we were talking about at the beginning: there is data in really important places, and that data needs to be secured. Sometimes it needs to adhere to sovereignty concerns and other things like that. And so, bringing AI to the data — and the fact that one of their announcements this week affirmed that as something they also see as important — is a really good affirmation of what we're seeing in the enterprise space: you've gotta bring the AI to the data. AI is a decision about how flexibly you can deploy it in all the locations where you have data and customers. It's not just a decision about
only which model, and only which location it runs in.

[30:57] Any final takes? Kate, I don't know if you have any thoughts from this year's Google Cloud Next on any and all of this.

[31:03] I mean, it's just remarkable — every time Shobhit comes to a show and says, "here's what's happening," it feels like this voluminous list that I have trouble parsing but know I need to decompose. All of these major tech conferences going on — it's great. Obviously, from my perspective, I'm most interested in things like the Gemini 2.5 Pro release, which has been really impressive; honestly, I'm getting great vibe checks from that model. It's really exciting to see them take center stage and have a strong release there. More great models out there only improves what the field can accomplish, so from that perspective, I'm really excited to see them push the boundaries.

[31:43] Yeah, just one last parting thought: I think Google is really flexing its B2C learnings, right? The fact that they can train their models on so much content — and again, I'm not getting into where the content is coming from, or indemnification and stuff like that; I'm purely commenting on the fact that they can train on so much more real-world information from the B2C space. There's nobody else who has access to so much B2C data, right? So the video generation, for example — the videos they're creating are very, very cinematic, and it seems like they've really gone out and looked at all of the YouTube videos from really good creators.
[32:20] So the quality is really good, and it's translating into the voice experience. This is becoming more and more critical for clients — for vendors to get voice right. And I think they have an unfair advantage in this space: they can provide some very nice audio experiences as you're thinking things through. One small example: if I have some Google Docs, I can ask an agent to create a particular workflow, do some research, and then create a very long research paper. So now it's created a three-page paper on a particular topic — say, why your margins are dropping even though your revenues are going up — with competitive analysis and all this stuff. It creates a three-page paper, and I can click a button and create an audio podcast out of it, right? This is corporate enterprise stuff that's so difficult to consume, and now you're plugging a really nice audio layer on top of it, and I can listen to it on my drive to work, right? I think that unfair advantage on the audio and experience side starts to give them some advantages on the enterprise side as well — advantages some of their peers don't have.

[33:27] With these podcasts going up on YouTube, maybe, Kate, you'll get the digital twin of Shobhit that you've been wishing for.

[33:32] Exactly. As long as he gets some royalties from it.

[33:36] Yeah, that's right. Exactly. There are ad dollars there. So, yeah, I think the future of educational entertainment here is really funny and interesting to think about — like, convert all my emails of the day into a Netflix series I can just watch when I get home, you know?
I think we will start to enter this very strange world.

[33:53] But here's the kicker, man — and I'll absolutely close on this. I wanna live in a world where I can insert myself, Shobhit, inside a movie scene that I'm watching, right? If Iron Man comes to a bar and orders a drink, I wanna be the bartender, right? If you have all the celebrities on screen, I wanna be part of that — I could be the driver. I want to immerse myself as part of the video, and this was not possible until today. So if you look at how far we've come with video creation, I think we're at a point where we'll have super-personalized movies that'll even be cracking the jokes I crack on a daily basis.

[34:32] I'm gonna move us on to our final topic of the day — I'd be remiss not to mention this, even though we just have a few minutes left on the episode. I really encourage you, if you're listening to the show, to check out this super interesting report that came out of Pew Research. Essentially, it's a survey of American perceptions around AI and how people use AI in their everyday lives. I think we only have enough time for a few hot takes here, but one really interesting takeaway from the report was the degree to which experts in AI have views about AI that are really, really divergent from people who are just using or experiencing AI in their everyday lives — or who have only heard about the technology and never used it at all. And maybe, Kate, I'll kick it to you. One of the really interesting results was all these data points about
experts saying that jobs won't be impacted by AI, while people really feel that jobs will be impacted by AI — experts generally being a lot more positive on the technology than the general public is. Do you feel like this impacts the prospects for AI going forward? Just curious about your quick take in the minutes we have.

[35:39] Yeah, you know, I think there are a lot of interesting things in the Pew report — definitely not enough time to get fully into it right now. But I think it speaks to the optimism of the researchers involved, which is great, because we need people who are optimistic about the impact of technology and science on the world to be the ones inventing it and pushing it forward. But I also think it speaks, from what I saw, to some of the representation issues in technology — that we still have work to do so the people building this technology better reflect the world. If you look at how they broke down men's versus women's perceptions of the technology and its impact, men's views closely matched the AI experts' — and it will be no surprise to anyone that most of the work in the AI expert and research field is still predominantly being done by men. So I think it also reflects the needed diversity, different opinions, and broader perspectives that we still have room to grow and bring into AI research as a discipline.

[36:38] I think that's a great note to end on, and hopefully this was a good sell for you all to go check out the report. There's a lot of data there, and it's worth really parsing through. And I agree with you:
I think it really points out the need for greater efforts on diversity in the space. As per usual — I say this every episode now; it's almost a tradition, like saying "agent" in every single episode — we have had more things to cover than we have had time to cover today. But Shobhit, Kate, Hillery, thanks for ably guiding us through our 50th episode, and thanks for joining us. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. And we will see you next week on Mixture of Experts.