
Voice Rights, Transparency Index, Watsonx

Key Points

  • The show opens with Tim Hwang introducing three major AI topics: the Scarlett Johansson‑OpenAI “Sky Voice” controversy, Stanford’s new Foundation Model Transparency Index (FMTI), and IBM’s latest Watsonx announcements highlighting enterprise AI and open‑source trends.
  • Panelists Marina Danilevsky, Kate Soule, and Armand Ruiz discuss the ethics and legal implications of OpenAI’s use of a voice eerily similar to Johansson’s after she declined to license her voice, questioning consent, likeness rights, and the broader impact on AI product design.
  • The conversation shifts to Stanford’s Center for Research on Foundation Models releasing the updated FMTI, explaining how the index aims to evaluate transparency, accountability, and potential risks of large foundation models for researchers and regulators.
  • Finally, the team examines IBM’s aggressive push of Watsonx during the AI announcement season, exploring how the platform’s open‑source strategy and enterprise‑focused tools could shape the future of AI adoption in business environments.


# Voice Rights, Transparency Index, Watsonx

**Source:** [https://www.youtube.com/watch?v=F0FHMakREDM](https://www.youtube.com/watch?v=F0FHMakREDM)
**Duration:** 00:38:50

## Sections

- [00:00:00](https://www.youtube.com/watch?v=F0FHMakREDM&t=0s) **AI Ethics, Transparency, Enterprise Trends** - The episode preview introduces debates on the Scarlett Johansson‑OpenAI voice controversy, explains Stanford’s latest Foundation Model Transparency Index, and examines IBM’s watsonx announcements shaping the future of enterprise AI.
- [00:03:07](https://www.youtube.com/watch?v=F0FHMakREDM&t=187s) **Debating “Her” Legacy and Voice AI Ethics** - Panelists discuss why the film *Her* remains compelling while debating the ethical and regulatory challenges of modern voice‑cloning technology exemplified by the Scarlett Johansson controversy.
- [00:06:18](https://www.youtube.com/watch?v=F0FHMakREDM&t=378s) **Choosing Appropriate Voice for AI** - The speakers argue that, just as visualizations must transparently reflect data uncertainty, AI assistants should convey their nature through thoughtfully selected vocal styles, criticizing overly flirtatious human voices and emphasizing cultural priors that favor clearer, more robotic or transparent tones.
- [00:09:28](https://www.youtube.com/watch?v=F0FHMakREDM&t=568s) **Risks of AI as Personal Counselors** - The speakers caution that using fluent, voice‑cloned language models for informal therapy or companionship raises trust and ethical concerns, urging providers to embed caveats, responsibility, and safeguards against such misuse.
- [00:12:37](https://www.youtube.com/watch?v=F0FHMakREDM&t=757s) **Discussing the Foundation Model Transparency Index** - Tim introduces Stanford’s annual FMTI and asks Kate to explain the index’s structure and IBM’s involvement in assessing model transparency.
- [00:15:39](https://www.youtube.com/watch?v=F0FHMakREDM&t=939s) **Transparency Index for AI Governance** - The speaker questions the secretive nature of AI model development and proposes an index to incentivize openness, asking the panel whether initiatives like FMTI are effectively promoting industry transparency and how far such efforts can realistically go.
- [00:18:51](https://www.youtube.com/watch?v=F0FHMakREDM&t=1131s) **Transparency as AI Differentiator** - The speakers note that as benchmark scores plateau, firms are turning to model transparency, trustworthiness, and governance as new competitive advantages, a trend emphasized in recent IBM client discussions.
- [00:21:57](https://www.youtube.com/watch?v=F0FHMakREDM&t=1317s) **Commercial Pressures on FMTI Adoption** - The speaker expresses concern that growing industry reliance on the extensive FMTI index may lead buyers to push for simpler, narrower criteria, undermining its broad transparency goals.
- [00:25:04](https://www.youtube.com/watch?v=F0FHMakREDM&t=1504s) **IBM Think Week AI Announcements** - Armand provides a rapid overview of IBM's latest AI platform highlights from Think Week, covering open‑source Granite models, the InstructLab customization suite, and a new partnership with Mistral.
- [00:28:07](https://www.youtube.com/watch?v=F0FHMakREDM&t=1687s) **IBM Governance & Open‑Source Granite** - The speaker describes IBM’s Watsonx governance integration with AWS SageMaker’s MLOps platform for regulatory compliance and risk management, then announces the open‑source release of Granite code models in several sizes.
- [00:31:11](https://www.youtube.com/watch?v=F0FHMakREDM&t=1871s) **Open‑Source Limits for Massive AI Models** - The dialogue debates whether rising pre‑training expenses will eventually prevent companies from open‑sourcing next‑generation models, with one participant challenging the notion that open‑source has a ceiling.
- [00:34:21](https://www.youtube.com/watch?v=F0FHMakREDM&t=2061s) **Fine‑Tuning Takes the Spotlight** - Tim explains that the AI race is shifting from massive pre‑training dominance to fine‑tuning and alignment, turning previously low‑prestige work into the new high‑value expertise.
- [00:37:26](https://www.youtube.com/watch?v=F0FHMakREDM&t=2246s) **Open-Source LLM Community Enhancements** - Armand Ruiz explains how contributors swiftly extended LLaMA 3’s context window, an often‑overlooked but powerful open‑source innovation, while the hosts wrap up by inviting listeners to suggest future discussions on AI agents.

## Full Transcript

**Tim Hwang (0:08):** Hello and welcome to Mixture of Experts. I'm your host Tim Hwang. Each week, Mixture of Experts brings together a brilliant team of researchers, product experts, engineers, and more working at the cutting edge of artificial intelligence. We debate, distill, and discuss the biggest news of the week in AI, from product announcements and the hottest papers on arXiv to industry gossip and NVIDIA stock price. This week, three stories. First up, Scarlett Johansson versus OpenAI, the Sky Voice controversy. Who's right, who's wrong, and what does it tell us about where things are going in the design of AI products? Second, who's afraid of the FMTI? The Center for Research on Foundation Models at Stanford University has released the latest edition of their Foundation Model Transparency Index, or FMTI. What is it, and why does it matter? And then finally, last but not least, it's announcement season, uh, and announcement season continues with IBM Think, hot on the heels of OpenAI and Google. watsonx is seeing a bunch of major announcements. What does it tell us about the future of AI in enterprise, and more specifically about the future of open source in enterprise? So, the panelists: as always, I'm joined by an S-tier, uh, set of panelists for us today. First off, Marina Danilevsky, a senior research scientist. Welcome back to the show.

**Marina Danilevsky (1:25):** Happy to be here.

**Tim Hwang (1:27):** Kate Soule, Program Director, Generative AI Research. Thanks and welcome to the show.

**Kate Soule (1:31):** Great to be here, Tim.

**Tim Hwang (1:33):** And finally, Armand Ruiz, Vice President, Product Management on the AI Platform.

**Armand Ruiz (1:38):** Thank you so much. Hi, everybody.

**Tim Hwang (1:45):** I want to tackle our kind of first story, which was sort of the hot news of the week, uh, the ScarJo versus OpenAI controversy.
So hot on the heels of the GPT-4o announcements the other week, um, and basically Sam Altman simply tweeting "her," there had already been a lot of major speculation that essentially the Spike Jonze film from about a decade ago, Her, um, was somehow weirdly ending up being the template for OpenAI's, uh, product development. Uh, and all of this kind of took a major turn when Scarlett Johansson herself, uh, released a public statement saying that OpenAI had approached her to use her voice. Uh, and then when she had refused, had proceeded to release a, a similar one, a, a kind of stunningly similar one. In fact, so similar that people had been like, that sounds like Scarlett Johansson, uh, when OpenAI was, uh, demoing, uh, GPT-4o, um, uh, the other week. And so I think the main question, you know, I think we can get into the who's right, who's wrong here, but I think the kind of first question that I wanted the panel to opine on is: you know, the unbelievable thing for me is like, Her is like a movie that's like a decade old, and that like, Scarlett Johansson is still like the cultural template for like the kind of assistive technologies that people are working on today, to the point that like one of the leading companies in the space almost explicitly is like really still using that movie as kind of like a template for their product development. And I guess I'm kind of curious if any of you have kind of thoughts about like the persistence of like the vision in Her, um, and why it's still so kind of compelling today, or if actually, you know, you think it's actually kind of silly that people think that it is so compelling. Um, uh, I don't know if any of you have any thoughts. I mean, Kate, Armand, you're new to the show, but I don't know if you would want to jump in first with some thoughts on that.

**Armand Ruiz (3:29):** Uh, I can, I can start.
I mean, uh, look, from my perspective, and, um, my, most of my conversations are always with, uh, within an enterprise setup, but, uh, everything related to voice imitation, uh, which is a technology that has been progressing a lot in the last few years, uh, is, is a big concern. Um, and I think that's why we're, we're seeing this acceleration on, on regulations, because these examples are just freaking people out, honestly. Right? And, uh, we, we, we need to be careful, especially when companies like, uh, OpenAI, that they have so much, um, reach and hype around them. And, by the way, that demo was spectacular in every single sense. And it's a little bit sad that all we're talking about is, is this controversy, uh, on, on the resemblance of the voice with Scarlett Johansson. I think there was this opportunity to just pick another voice or make it, uh, less close to, to her, given like that they tried actually to, to, um, to get her voice officially in the, in the system and it didn't work out. I don't know. What do you think?

**Kate Soule (4:41):** Yeah. I mean, I, I think a lot of the draw to having a Scarlett Johansson-type voice is really, you know, an attempt to try and get trust and comfort with these systems. That, you know, that's where I think a lot of the initial ambitions lay. But if you think about it, like, these models are, are tools. They're not, they're not people, they're not humans, they're tools. And is this really the right tone? And even, even, I mean, there are huge issues on data rights to consider, but even that aside, is this the right tone and mode in which to actually communicate the value that these tools can offer?

**Tim Hwang (5:17):** Yeah, for sure. I think that's kind of one of the funny things.
I mean, to first respond to Armand, I think like, um, you know, everybody I know who is like more in the machine learning space saw the demo and they're like, low latency, it's crazy, right? And then everybody else who kind of saw the demo, who are less in the AI space, are like, it's her. And it's sort of interesting, like, what people pick up from demos depending on their level of, uh, familiarity with the technology. But I guess, Kate, we'd love to kind of go into the point you just made a little bit more. You know, I think there's a kind of question of like, should we be imitating Her in the first place? Like, I think Her is kind of such a fascinating movie, 'cause I, I watched it again recently, because I've been talking to a bunch of people being like, oh yeah, it's like a great product vision. And then you watch Her, and like the whole point of Her is like, this is a bad direction for technology to be going down. And so it's like very strange to me that like, you know, it's become a template, uh, in some ways. Um, and I guess the kind of pickup is sort of what you're saying, that like we, we actually might not want technology companies to really kind of imitate like a human companion, like that there's some ethical concerns that you have around that, or, um, or maybe your point is actually maybe in a different direction.

**Kate Soule (6:18):** Well, you know, there's a, a principle in data storytelling and data visualizations that, you know, what you visualize should reflect, uh, the data and how it was created. And so if you're uncertain and there's uncertainty, you should visualize error bars, for example, you know. And I think
a similar principle applies for large language models. Like, the mechanism and the mode of how you communicate the results that the model is saying, and the tone and intonation, everything that you're doing is providing a lot of information for the user, whether you realize it or not. And I don't think that human voices should be off the table, but, you know, very flirtatious female human voices for something that's meant to be a tool and an assistant, you know, is that really the right thing to do? It's mode and mechanism. And I think there's a right way and a wrong way. And it's, you know, it's sometimes hard to define exactly what correct and wrong is, but, you know, this one seems to lean a little bit too far to the wrong side.

**Tim Hwang (7:12):** Definitely. And it's sort of interesting, because I think like when you get to the realm of voice, you really are working on like people's kind of cultural priors, right? Like, you can imagine a voice which was like very like sci-fi robotic. It's like very arbitrary what voice we wanted to produce. And like, you know, I guess in that sense, maybe it communicates more that it is a computer that you're talking with versus a, versus a person. Um...

**Kate Soule (7:31):** And there's like ways to earn trust, right? In systems, and to make people feel more comfortable. But like, do you, you know, there's also real reasons to have some skepticism of what you're hearing from models. And there's, you know, proper ways to go about, you know, showing that models are not confident and that there are risks and things that should be evaluated objectively by humans. And if we're just being told something in a very trusting, loving voice, then, you know, are we really doing our due diligence here as model providers, and giving our customers the right, you know, putting them in the right mindset of how to use these models in a responsible way?
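
Kate's error-bar principle is concrete enough to sketch in a few lines. The following is a hypothetical illustration, not anything from the episode (the function name and sample numbers are invented): report a point estimate together with its uncertainty, instead of a bare, fluent-sounding number.

```python
import math

def summarize_with_uncertainty(samples: list[float], label: str) -> str:
    """Report a point estimate together with its 'error bars'
    rather than a bare number."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance and standard error of the mean
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    sem = math.sqrt(var / n)
    half = 1.96 * sem  # ~95% interval under a normal approximation
    return f"{label}: {mean:.2f} ± {half:.2f} (95% CI, n={n})"

print(summarize_with_uncertainty([0.71, 0.68, 0.74, 0.70, 0.72], "answer accuracy"))
```

The same idea carries over to a voice or chat interface: whatever the channel, the caveat travels with the answer instead of being left for the user to infer from tone.
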

**Tim Hwang (8:08):** Totally. So, Marina, I want to bring you in. I know you're a veteran of this show, but I think one of the reasons I was very excited to have you back on was, last time you were talking about InspectorRAGet, right? And I think the conversation that we had at that point was, well, how do we know that RAG is doing well? We need to build kind of like a dashboard experience for people to kind of monitor and understand whether or not they should trust the results coming out of a RAG process. And I guess, as someone who has like worked so deeply with that as a method, right? Like, the dashboard as the way you establish trust versus like the voice as the interface. Are you, how do you feel about voice? Are you like kind of suspicious about it? Like, I get it, I'm getting suspicious vibes from Kate, but I'm kind of curious about like how you kind of navigate this as we think about like all the different interfaces we can have in sort of assessing like model trust, essentially, which it really seems like we're talking about.

**Marina Danilevsky (8:54):** Sure. I will say I don't think that the, the dashboard is the only way, and it tends to be the kind of thing that is, again, more understandable to model developers. Um, normal folks don't understand it, and actually often I think it's a way to have even less trust, because if you have somebody who's not technical, they're going to look at it and be like, what am I supposed to do with all these numbers, all of this, all of that? Like, tell me sort of at the end of the day. So I really, really agree with what Kate is saying, which is that it's important in how you deliver the information at the end, finally, to the end user. You should give, in terms that are obvious to the person receiving it, whether you should trust it and how you should take it.
So like, one direction that's a little bit worrying is the amount of, uh, people using these language models, for example, as ad hoc psychologists or ad hoc friends or girlfriends or anything of that kind. So now we're going to make sure that now we have that in Scarlett Johansson's voice. That seems, again, maybe not the right direction to go.

**Tim Hwang (9:46):** Let's just pour some gasoline on this.

**Marina Danilevsky (9:49):** And certainly not in the enterprise setting. So voice is great, uh, but it should also be a way to communicate, just as in text: you know, to what extent should I be really trusting what you say, and can you give me the appropriate caveats? There's a responsibility here. Just as when you read something that reads extremely fluent, or you hear something and it's extremely fluent, it's got that affect and it sounds human, of course you're going to have a tendency to, to take it in a particular way. I think there's a lot of responsibility on the people providing these models to, to, to do that accordingly.

**Tim Hwang (10:21):** Yeah. I never really... oh, yeah, Armand, go ahead.

**Armand Ruiz (10:23):** Yeah, I just wanna add, uh, two quick things. One, I think, um, maybe it's a little bit controversial, but I'm gonna say it anyway.

**Tim Hwang (10:30):** Please do.

**Armand Ruiz (10:32):** Uh, here we are talking about it, right? So I think Sam Altman is like Elon Musk. Uh, they, they know very well, they are very smart about how to market their products. And, and they had the Google conference right after this, their, their event. And they always find ways to be on the headlines, one way or the other. So I think that they like a little bit of the controversy. I think maybe this one is getting a little bit out of control, but it's not the first one that they've faced. Um, and on the other hand, I think there is also, uh, about voice.
I think voice is, it's been the promise for AI for many years, with Siri, with Alexa, but it was not low latency, it was, it's very robotic. So that demo, and the Google demo, and there was a similar demo a few years ago, very researchy, from Google, that was showing already like a more natural voice. And at some point it's going to always be a problem: any voice they put is going to resemble someone else's voice. So this is a very difficult conversation. In this case it's because we're talking about a celebrity, and, and voice cloning from celebrities is going to be a problem. But we will always have these problems, that these voices will resemble, uh, someone else.

**Marina Danilevsky (11:45):** Actually, I wanted to respond to something that you said. Um, I don't like Elon Musk and Sam Altman speaking for all of us that are working in AI. They like controversy. They, they're kind of, they're very, you know, bro kind of guys. Okay, great. But this assumption that, you know, all publicity is good publicity, and as long as you're talking about me, that's great: it's not reflective, I think, of a lot of us that are here. And also the idea of, well, why wouldn't Scarlett Johansson agree to be the voice? She should be honored that she was asked. She was in a sci-fi movie about this, so clearly this is the same thing. That level of assumption, and that level of, well, it should be an honor to participate in anything that I do, that leaves a very bad taste in a lot of our mouths. And I just want to say, a lot of us are not pleased, and do not... that doesn't represent us.

**Tim Hwang (12:37):** So we'll move on, actually, because we have three topics to get to. So I'm going to bring up the second topic, and Kate, I'll bring you in to kind of lead us on this, but just to kind of quickly tee us up. Um, so this is a big week.
Um, there's a group called the Center for Research on Foundation Models, uh, at Stanford University. Um, uh, Percy Liang and a number of his collaborators there have been working for some time on something they call the Foundation Model Transparency Index, or FMTI for short. Um, and it effectively is kind of this annual index they're doing of like leading foundation models, evaluating effectively their commitment to transparency. And I guess, um, Kate, I figured, you know, just for our listeners, it's worth it to kind of talk a little bit about what it is in the first place. Um, and then I know you were actually working in a, in a pretty deep way on this just recently, so we'd love to kind of hear about sort of your involvement and sort of IBM's involvement in the FMTI.

**Kate Soule (13:27):** Yeah, absolutely. So Stanford's report and the transparency index is, uh, a compilation of a hundred different questions that they basically ask model providers, to understand, uh, how transparent and open they are across the model life cycle. So they look at everything from upstream: how is the data curated, what rights do you have to the data, are you transparent about what data you use? To the model itself as the second main category: so, have you evaluated your model for different risks? Do you describe those risks? Do you provide mitigations for those risks? And then also to the downstream uses: like, are you clear what the usage policies are, do you talk about how you would enforce those usage policies, where are your models being used, and, and those types of applications? So what it does really well, and what I really appreciate about it, is it's not trying to evaluate and say, this is an unbiased model, or, if you score well, that means your models are safe.
What it's doing is trying to look at how open model providers are about their own technology. Are people actually sharing what they've built, sharing the degree to which they've tested different safety aspects? Um, and are they sharing those with their own customers or not? And you know, that's something that I'm really, really passionate about, and the entire team here at IBM that trains Granite models has strongly felt we need to show up very strongly on. So, uh, our Granite models were ranked in this report. We're really excited; we came in fourth overall. Uh, and especially on the upstream, like, all of the data collection, uh, and all the work that we do on the curation and transparency around what data goes into our models, we were one of the top-scoring model providers. So the, the Granite models showed up very well in that report, and we're really excited, excited by those results.

**Tim Hwang (15:17):** Yeah, congratulations. I know it was like very competitive, actually, like, the number of companies and sort of models that they were covering was like very vast.

**Kate Soule (15:24):** I mean, they cover the top 14 or so model providers. This is the second time they did the report. The first time was back in October, and they looked at the top eight, uh, or so. And, um, it's, it's a really, really exciting area.

**Tim Hwang (15:39):** Yeah, for sure. So, I think there's a bunch of interesting questions I want to kind of talk to the panel about, really kind of about sort of like governance in the universe of AI, because I think there's sort of two very interesting things going on that I see in FMTI, right? Like, I think one of them is, uh, you know, even a few years ago, and it still kind of is this way, right?
Like, I think, like, a lot of the process of, you know, pre-training, fine-tuning models is, like, shrouded in mystery, where people are like, oh, well, you know, I heard they have, like, this thing that they do in the recipe that really gets these great results. And so, like, a lot of the way AI development has proceeded in the past has been very secretive. It's been the realm of, like, trade secrets. Um, and so I think like one of the interesting things here is like, can we create an index that kind of creates sort of like a race to the top, right? Like, avoid a world in which everybody's incredibly closed, uh, about their model. And yeah, you know, I guess I'm, I'm curious, you know, uh, Marina, like, uh, as a researcher in the space, you know, do you feel like it's working, right? Like, the strategy, like, do you think FMTI is like helping to improve or like encourage companies to be more transparent in the space? Um, and, and if so, I'm kind of curious about like how far you think it will go. Because presumably at some point all companies will be like, well, we can't definitely tell you about that, right? And so I think we're kind of playing with this line about like, what do companies owe to the public when they release these models? And, um, yeah, I was just kind of curious, as someone who's kind of like a researcher in the trenches thinking about this, um, how you sort of see this type of effort.

**Marina Danilevsky (17:01):** Sure. So I'll start again with agreeing with Kate that this at least encourages people, because they see, oh, other companies are saying stuff, so it's maybe okay, or good for PR or for adoption reasons, for me to also say stuff.
First, everybody had to get to a certain point. When nobody could figure out how to get the models to a certain point of quality, no one was going to say anything, just in case they came up with the right secret sauce: I'm not going to share. As people, I think, start to get the technology to be a little bit more, uh, evolved, a little bit more mature, okay, now there are reasons, including economic ones, of why you'd want to share, because that'll be the kind of thing that your clients or customers will, you know, pick you over somebody else because of aspects of this. So this kind of public, uh, pressure of, well, they did this, so I'm going to do this, is actually really good. Um, there's going to be a limit to how far it's going to go. Of course, nobody's going to share, uh, like, customer data, or also anything that might get them bad PR, or anything that might get them potentially in trouble, uh, legal waters or anything of the kind. Sure, right, yeah, yeah, people won't share. But overall, it's a good trend. It speaks to the, um, evolving maturity of the field to me. So that's, that's, that's how I see sort of that back and forth going.

**Kate Soule (18:14):** I mean, you can see it in the scores, too. Like, the scores from October to, uh, this, this year's latest report in May have gone up across everyone who was evaluated. Like, I think there's this like safety in numbers, where also, given the regulations are still evolving and everyone, you know, a lot of case law is still evolving, people are kind of testing the waters. But as more and more, uh, results are shared and, and people are more and more transparent, it gives confidence for, for more people to do the same.
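
The scoring scheme Kate describes earlier, indicators grouped into upstream, model, and downstream domains, can be sketched roughly as follows. This is a hypothetical illustration only: the indicator names and values are invented, and the real index uses a hundred indicators with its own scoring rules.

```python
# Hypothetical sketch of an FMTI-style scorecard. Indicators are binary
# (1 = disclosed, 0 = not), grouped into the three domains discussed in
# the episode: upstream data, the model itself, and downstream use.
INDICATORS = {
    "upstream":   {"data_sources_disclosed": 1, "data_licenses_disclosed": 1},
    "model":      {"risks_evaluated": 1, "mitigations_described": 0},
    "downstream": {"usage_policy_published": 1, "enforcement_described": 0},
}

def transparency_scores(indicators: dict) -> dict:
    """Fraction of satisfied indicators per domain, plus an overall score."""
    scores = {domain: sum(qs.values()) / len(qs) for domain, qs in indicators.items()}
    satisfied = sum(sum(qs.values()) for qs in indicators.values())
    total = sum(len(qs) for qs in indicators.values())
    scores["overall"] = satisfied / total
    return scores

print(transparency_scores(INDICATORS))
```

Collapsing everything into the single `overall` number is convenient for buyers, but it discards exactly the per-domain detail the index is meant to surface, which is the narrowing pressure the panel worries about.
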

**Tim Hwang (18:39):** It does sort of feel like, like, things like transparency, right, or like things like, you know, Chatbot Arena, they're in some ways kind of like of a piece, which is that like early on everybody was sort of like competing against these like benchmarks. And essentially, like, as the benchmarks have become like more and more saturated, now it seems like everybody's trying to differentiate in different ways. And like one of the differentiations is like, is your model transparent or not, right? Which is almost like a factor that sits, you know, somewhat on top of the model, but also like outside of it as well.

**Marina Danilevsky (19:05):** I was just going to say, if you actually look at the places where the scores are still low, um, I think, uh, off the top of my head, I know it's about like evaluation of model trustworthiness. How do you do it? And then also downstream applications. These are things that we're not yet very good at, do not understand yet very well. So those are the cards that people are still kind of holding a little closer to their chest, in case, again, this turns into a differentiator. So again, it just makes the point: lots of the scores are improved. It's very interesting to see the ones that haven't, because it again gives a sense of, you know, what is the confidence and the maturity of, uh, of the technology.

**Armand Ruiz (19:34):** Yeah, for sure. I'll add, uh, this: at the conference, IBM Think, this week, I maybe talked to 50-plus customers. Um, governance, transparency, trust: uh, it was in every single discussion. Um, and over the last year, Granite and the work that we're doing at IBM, uh, is really a differentiator. Our customers really appreciate that.
Um, and in fact, if you follow me on LinkedIn, I'm extremely open, and I published a few times the research paper from Granite, which I recommend everyone to go check, because it explains very well what went into training the model: the data collection process, the data preprocessing. We've been extremely open. You won't find a paper with such openness on what went into training the model. Um, so that is actually becoming more and more important, because companies, they, they take these models as a base model, and then they mix it up with their own enterprise data. So you need to get a very good base model that you can trust, uh, if you are going to mix it up with your own data to get, uh, outcomes.

**Tim Hwang (20:41):** Yeah, Armand, you're actually like, you were, uh, you're anticipating my question, because I think one of the things, uh, I have, I think, as a, you know, point of, point of critique as well: okay, there's a bunch of academics at Stanford, right? Like, is this actually impacting how business is behaving? It sounds like your answer is yes. Like, actually, it turns out that, like, companies are there looking at the, you know, the FMTI, being like, well, this actually is relevant to my purchasing decision, which is really, really pretty interesting.

**Kate Soule (21:07):** Well, just to build on your point a little bit, Armand, it's like, you know, it's not like you're just taking these models and then adding a layer on top. Like, when a model provides a response, you can't pinpoint back that the model is using your data or its pre-training data. You know, it all basically, you know, goes into a blender and comes out mixed together the other end.
So just because you're going through applications with your own data, using RAG patterns or even fine-tuning and other things, it doesn't mean you have control over all of the history you're inheriting, and the baggage and the skeletons in the closet you could potentially be inheriting when using some of these models.

**[21:43] Tim Hwang:** Yeah, totally. So Kate, final question. I'm curious if you have any thoughts on the trend: where do things like the FMTI go in the future? And I guess I want to relay a fear, and maybe you can allay my fear, but I don't know if you agree. One of the nice things about the FMTI, having been in a company that received an FMTI request, is that they have not taken any shortcuts. They're basically saying: how do we know whether or not you're transparent? It depends on how well you do against these hundred indicators. Which is a massive project to take on, responding to the FMTI. And I guess I'm curious, and maybe a little bit worried, about the commercial pressures on indices like the FMTI. What I mean by that is: if you're a B2B buyer, an enterprise on the market looking for a model to use for your internal operations, a hundred indicators is a lot. There's a reason people go to Wirecutter and say, oh, I'll buy the same fridge that everybody else has, I'll buy the same wire management everybody else does.
Do you worry at all that, as these indices become more and more used by industry, by businesses making purchasing decisions, we'll also see pressure to narrow? People ultimately just want "transparent or not," a rubber stamp. And if you do agree with that, I'm curious if you have thoughts on how we keep that aperture open, because it's important for us to keep the kind of transparency it sounds like Percy and team are really chasing after here. I have a couple of thoughts there, but I'm curious how you'd navigate that.

**[23:17] Kate Soule:** I mean, I think particularly when regulations start to come into effect, there's going to be tremendous pressure to be able to put a rubber stamp on something and say it's compliant or not compliant. So there will certainly be pressure along those lines. But it's similar to the risk you bring up with gamification: people are going to start optimizing for a couple of key things, and how do we make sure we continue to push forward and keep innovating? I really think it comes down to making sure that how we define transparency and safety in models keeps pace with how fast this technology is growing. If you look at how we viewed large language models a year ago, what was considered state of the art and what was considered safe versus not safe a year ago compared to today is an entirely different story. And it's going to need to continue to evolve.
And we need researchers like those at Stanford helping us articulate what those risks are and coming up with more nuanced measures as some of these metrics and indices become saturated. Once everyone's sharing all this information, maybe we can shift our focus onto the new emerging things that we need to continue to keep in mind.

**[24:29] Tim Hwang:** Yeah. And I think that will be the new game. The initial task was getting the companies to do this at all, and now the challenge is, well, we don't want you to game it. So the criteria may also become a kind of game, where we won't tell you what the criteria are for certain types of indicators, in order to retain the benefit of the signal.

**[24:46] Kate Soule:** But I think that's to our benefit as a field, right? If we don't have some sort of incentive to keep innovating, then it's going to become stagnant. So I certainly welcome it.

**[25:00] Tim Hwang:** So I'll move us on to our last topic here. And Armand, I'm going to give you center stage, because it sounds like you had a really busy week presenting all of this. This was IBM's Think week. It continues the season of announcements; everybody's announcing AI stuff right now. So as an open question, Armand, do you want to quickly give a thumbnail sketch of everything? I know in particular you were very excited about the announcements around watsonx. If you just want to give a thumbnail sketch to our listeners about what was announced, because I read the blog post and thought, this is going to be a lot to cover in fifteen minutes.
But I think you as an expert would be able to put us on track about what people should be paying attention to.

**[25:41] Armand Ruiz:** Yeah, there is so much. I'm going to be a little bit selfish and talk about the area of watsonx that I cover, which is the AI platform part. I'll start with the Granite models, which are now open source, and I'll let Kate elaborate on that. We're really excited to jump into the open source movement. Then we have something called InstructLab that helps customize models; I'll let Kate explain that a lot better, since she's been driving a lot of it from the research angle. Then we announced a very exciting partnership with Mistral as well. We already had the Mistral open source models in our platform and offer those to our customers; now we also have the Mistral commercial models, including Mistral Large and Mistral Small. We love Mistral, and now we're going to be able to offer that to customers in the cloud and on-prem as well. Specifically in Europe, that's going to be a very big hit. And we released a lot of features, so many features. One I'd like to highlight is chat with your documents, the classic RAG use case. We introduced a user interface that makes it very easy to add documents or point to a vector database; you can have thousands of documents there, and in just a few clicks you can create your own chat interface to talk to the documents, one that will point directly to the references and citations. Then you can export that as an application or as an endpoint that you can integrate with your own applications.
So there are a lot of tools like that that will make the development of solutions very flexible. There are two more I'd like to highlight. One is a toolkit we're releasing in tech preview for application developers, to make it extremely easy to develop gen AI applications. We're all the time talking about LLMs, but LLMs don't make applications and solutions; LLMs are just one component. Coming back to the RAG use case, you need an embeddings model, you need a vector database, you need a way to chunk the data. You need a lot of different things. So we believe we're creating one of the best toolkits for developers, to make the development of those use cases extremely simple, with a lot of templates and access to tools. And the last one is on governance. On governance, we also released a lot of stuff. I'll highlight what we're doing with AWS: we have a very good partnership with AWS, and SageMaker is a very popular tool for enterprises to build and deploy machine learning models. Now you can govern all those models directly with watsonx.governance. That means you can have your full central MLOps panel, and then you have those components of regulatory compliance and risk management to make sure those models perform. Those are just a few; there are so many new assistants and other features. But maybe let me hand it over to Kate, so you can explain the open source angle on Granite and InstructLab.

**[28:51] Tim Hwang:** Yeah. Before you jump in, Kate: do you really mean it? It does feel like the big message is that you're all in on open source.

**[29:00] Kate Soule:** Yeah, it was really exciting to be there.
It was hosted in my hometown this year, in Boston, which was really fun. And the message across every single presentation was: IBM is all in on open source. So it was really, really exciting to be there and to be part of the announcement for the Granite code models that we released. We open sourced eight state-of-the-art Granite code models, two variants for four different model sizes, from 3B and 8B up to 34 billion parameters. And especially with the 8 billion parameter model, we're really seeing state-of-the-art performance and an ability to outperform anything else that's come out there. We're really thrilled to be able to create this as a starting point that the rest of the community can build on. And, as Armand mentioned, a lot of our intent behind releasing the InstructLab open source project, which I think you covered in an earlier episode as well, is giving the open source community the tools to work on models together: allowing people to collaborate and contribute to models, and ultimately build a better model that benefits from the world working together. And I think it gets to your earlier point, Marina: there are a lot of big personalities trying to define how AI works and how the world works. That's one version of the world, but that's not how IBM sees it. It's really only through an open source ecosystem, where we bring the best the community has to offer and everyone works together, that I think we'll really be able to unlock the future potential here.

**[30:34] Tim Hwang:** So I'll reveal a little bit of my own bias here: huge open source head.
I struggled for many years running just Linux; my childhood was running free and open source software locally. And one of the things I'd love the three of you to respond to is this interesting question I saw popping up on social media this week, which is: how sustainable is open source? One of the debates, speaking of big personalities that want to define AI, there's a bunch of very loud VCs arguing about this. But the root of the debate, I think, is an interesting one: you look at the pre-training costs of state-of-the-art models, and as we scale bigger and bigger, it just gets more and more expensive to accumulate the computing clusters you need to do the pre-training runs. And one of the arguments being made right now is: is there a point coming where these next-generation models become so expensive that it would be very difficult to imagine any company that originates them being willing to open source them going forward? So essentially, the argument is that there's a point at which the upside from open sourcing is outweighed by the raw pre-training costs of these models. The conclusion, as these big personalities on Twitter put it, is: dot, dot, dot, open source has a ceiling. Do you all buy that argument? If not, why not?

**[32:10] Kate Soule:** I don't buy that argument.
I mean, I think there is incentive to continue to drive model performance, spend more, and create bigger and bigger models. But I think we're seeing diminishing returns in terms of the use cases and the value. You can take care of a ton of low-hanging fruit, so to speak, with much smaller models, and those are the ones you're actually deploying and using day to day. That's where I think the economic value is going to be, and that's where open source is really well positioned. I also think we've been learning a lot, this past year, these past couple of months, about how to unlock value and improve performance in models. Most of the cost is spent on pre-training, right? As you say, burning thousands of GPUs for months to create a base model. But then there's a step afterwards called alignment, and that's where the open source community has really been leading the innovation. They take models like the Llama series, which is incredibly popular, take the base model, and iterate on the alignment step. And that is far less costly, far less compute intensive. So we're able to drive these step changes without having to resort to burning compute hours for eons and driving up crazy costs. So I don't think that's a valid argument, in my mind.

**[33:32] Tim Hwang:** Yeah, for sure. Armand and Marina, do you want to jump in at all, or do you largely agree?

**[33:36] Armand Ruiz:** Yeah, no, I fully agree. We're all in on open source at IBM, and I see the passion of the community, and that's really, really hard to compete with. Companies are even using open source to attract talent.
Researchers want to see their work contributed back to the community, not closed source behind an API, with their work represented as nothing more than a commercial API. So I think there's that angle as well: the power of the community is proving to attract the best minds on the planet to make progress on AI.

**[34:21] Tim Hwang:** Yeah, it's fascinating to believe that the era of big scale is already over, that the competition has already shifted. It has turned out that scale was not all you needed; actually, you needed a lot more than scale in some ways. And Kate, what's also really interesting in what you say is this: in any company using AI, there's a hierarchy of prestige, right? Who's doing the important work? Who's the rock star? And for a very long time, the pre-training people were the rock stars: oh man, they're really using these F1 computers to create these beings of pure linear algebra. But what you're saying is that actually the future is not that. In fact, what was traditionally almost low prestige in the machine learning space, just doing the fine-tuning at the end to make it a nice chatbot, is actually where the action is going to be. Do you buy that? In the future, people are going to say, oh my God, that person's a god of fine-tuning, this person is so amazing at alignment, they're the ones, and this commodity stuff is pre-training?

**[35:20] Kate Soule:** I mean, I think the community is quickly getting there, if we're not there already.
It is insane, the amount of innovation happening at that part of the process, and there's just so much untapped potential given, relatively speaking, how cost effective it is. So I think that's where we're going to continue to be incentivized, and where rock stars, as you say, will be made, because you're going to do what pre-training had to spend millions and billions of dollars to do at a fraction of that.

**[35:52] Tim Hwang:** Should I ask the question, then: why keep scaling? Is it an insane thing for the industry to be doing?

**[35:57] Kate Soule:** Okay, well, the hidden curse of why alignment's doing so well is that you need big models to make good small models. So there is this paradox: okay, small models are where we're incentivized, but at some point, if you don't have a big model, you can't make a good small model. We're also seeing a lot of great larger open source models come out, though. There's a bit of a play there that still has to evolve; the market still needs to feel out how that's going to fully play out for model providers.

**[36:27] Tim Hwang:** So this is great. Any final thoughts, Armand and Marina?

**[36:33] Marina Danilevsky:** I think these things go in waves. We had a wave of scale, scale, scale in a way you just could never do before, and that was amazing. So it's very natural that we say, all right, we've hit, maybe not a plateau, but a slight slowing down in the S-curve; let's see what else we can do. It's going to come back again, and meanwhile it's a very reasonable thing to keep exploring what we can do with the hardware, with the acceleration, with everything else.
Because it's going to come up again for one reason or another. So it's very good and natural that you go from scaling to, all right, how do you get small from big? And it's probably going to turn again: okay, now what can we turn those small things into, once again something big? This is normal.

**[37:12] Tim Hwang:** The pendulum swinging, basically.

**[37:16] Marina Danilevsky:** We're just in that part of the swing. So it's not that scaling's not valuable; it's just a question of where people are innovating most rapidly, and the pendulum of focus will swing there.

**[37:26] Armand Ruiz:** I'll add something people don't talk about that much. For example, when Llama 3 came out, I think its context window was, what, 32,000 or 16,000? It wasn't super large. And days after the release, the community was already contributing a version with a technique to increase the context window. Those are small details that only practitioners notice, and they don't get into the headlines, but that is really the power of open source: those contributions, that innovation, are going to be really, really hard to stop.

**[38:06] Tim Hwang:** Yeah, that's a great note to end on. Well, that's all the time we have for today. Marina, thanks for coming back on the show.

**[38:12] Marina Danilevsky:** Pleasure.

**[38:13] Tim Hwang:** And Kate, Armand, it's been awesome having you on the show for the first time, and I hope to have you on again sometime. Thanks so much.

**[38:20] Kate Soule:** Thanks for the great discussion.

**[38:21] Tim Hwang:** Thanks for joining Mixture of Experts. And for the first time, a quick call-out to all you listeners.
We're thinking about doing a segment in the next few weeks that will focus specifically on agents and what's happening there. We're always looking for interesting stories and people to talk to, so if you've seen any cool papers, companies, or people working in the space, please drop a line in the comments and we'd love to pick it up in a future episode. See you next time.
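The RAG components Armand enumerates at 27:46 (an embeddings model, a vector database, a way to chunk the data) can be sketched in a few lines of Python. This is a toy illustration only, not watsonx code: the bag-of-words "embedding" stands in for a real embedding model, an in-memory list stands in for a vector database, and all the function names (`chunk`, `embed`, `retrieve`) are invented for this sketch.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hash each word into an L2-normalized count vector.
    A real pipeline would call an actual embedding model here."""
    vec = [0.0] * dim
    for word, count in Counter(w.strip(".,?!").lower() for w in text.split()).items():
        vec[hash(word) % dim] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def retrieve(query: str, index: list[tuple[list[float], str]], k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are most cosine-similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: -sum(a * b for a, b in zip(q, item[0])))
    return [text for _, text in ranked[:k]]

# Index two tiny "documents"; a real system would persist these in a vector DB.
docs = [
    "Granite models are trained on openly documented data.",
    "watsonx.governance tracks regulatory compliance for deployed models.",
]
index = [(embed(c), c) for doc in docs for c in chunk(doc)]

# The retrieved chunks would be stuffed into the LLM prompt; the LLM call
# itself is the one component deliberately omitted here.
context = retrieve("How are Granite models trained?", index)
```

As the discussion notes, the LLM is just one component: everything above it (chunking, embedding, retrieval) is what a toolkit like the one described would package up behind templates.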