Learning Library


Balancing AI Memory and Privacy

Key Points

  • The panel debated whether AI assistants should retain all personal data, concluding that users need granular control over what is remembered and an “incognito” mode for privacy.
  • Google Gemini’s new memory feature for premium users demonstrates how persistent personal context can personalize interactions, while Microsoft’s head of AI, Mustafa Suleyman, predicts near‑infinite model memory soon.
  • Experts argued that expanded memory is more than extra context; it makes AI responses more relevant, boosts user adoption, and unlocks novel creative capabilities in generative systems.
  • Vyoma Gajjar highlighted that personalized memory creates a seamless, repeatable experience, whereas Vagner Santana emphasized the necessity of selective retention and privacy safeguards.
  • Shobhit Varshney humorously noted that an AI that “remembers everything like my wife does” would need an incognito mode, underscoring the balance between convenience and confidentiality.

Sections

- [00:00:00](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=0s) **Debating AI Memory Preferences** - Experts discuss whether AI assistants should retain all user data, advocating selective recall, purposeful memory, and privacy controls, while the host previews upcoming AI news.
- [00:03:05](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=185s) **AI Memory, Privacy, and Business** - The speaker debates the privacy implications and commercial motives of AI agents that retain extensive personal data, contrasting it with existing internet tracking while expressing expectations for personal memory retrieval through such systems.
- [00:06:11](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=371s) **Balancing AI Memory and Forgetting** - The speaker argues that AI should preserve valuable long-term data while letting users delete or hide fleeting preferences, citing concerns about perpetual recommendation loops and praising incognito modes and temporary chats as ways to keep interactions short-lived.
- [00:09:15](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=555s) **Balancing Model Fine-Tuning, Privacy, and Fairness** - A speaker examines the challenges of fine-tuning AI models while addressing privacy concerns, ensuring fair treatment in applications such as credit scoring, and reconciling business value with the protection of historically marginalized communities.
- [00:12:17](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=737s) **Microsoft Partnership Security Deep Dive** - The speaker celebrates a 30-year Microsoft alliance and recent awards while detailing new native access-control safeguards for Copilot and SharePoint, along with extensive governance and traceability integrations involving third-party AI tools.
- [00:15:26](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=926s) **Multi-Stakeholder AI Safety & Metrics** - The speakers discuss how clients now demand built-in security guardrails, customizable success metrics, and cross-functional oversight (finance, legal, tech) to ensure AI products are safe, compliant, and trustworthy.
- [00:18:33](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=1113s) **Layered AI Security in B2B** - The speaker explains a three-level security framework (infrastructure, data, and application) used by enterprises, illustrated with a Fortune 50 CPG's custom water-bottle design campaign that mandates model red-team controls to filter outputs.
- [00:21:39](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=1299s) **Azure Foundry Highlights and AI Roadmap** - Shobhit outlines recent maturity gains, custom copilots, Azure Foundry's unified AI toolset, industry-specific models, security features, and upcoming AWS re:Invent plans as key announcements before leaving Ignite.
- [00:24:54](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=1494s) **From Benchmarks to Expert Tests** - The speakers critique the overuse of AI benchmarks and suggest the next trend will be ultra-hard, novel challenges that only world-class human experts can tackle, exposing the limits of current models' reasoning and their reliance on word prediction.
- [00:28:02](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=1682s) **Calling for Domain-Specific AI Benchmarks** - A speaker argues that enterprises need harder, domain-specific benchmarks to evaluate AI models, likening the process to hiring and performance reviews of human employees.
- [00:31:07](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=1867s) **Consistency Benchmarks for Mathematical AI** - The participants argue that current agentic frameworks lack repeatable accuracy, calling for evolving benchmarks, such as those measuring consistent mathematical reasoning and exposing a model's step-by-step "cognitive residue", to better evaluate and guide future AI research.
- [00:34:14](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=2054s) **AI Reasoning, Model Hiring, AlphaFold 3 Debate** - The speakers discuss the growing importance of AI reasoning abilities, liken model deployment to recruiting staff, and then shift to AlphaFold 3's breakthrough in protein prediction and the controversy over its restrictive licensing.
- [00:37:20](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=2240s) **Open AI: Access vs Risks** - A participant discusses how releasing AI models to public servers democratizes research by alleviating computational barriers, while acknowledging commercial interests and potential misuse concerns.
- [00:40:26](https://www.youtube.com/watch?v=Z-6d1gnSOAI&t=2426s) **Multi-Layer Governance for Generative AI** - The speaker predicts a comprehensive, multi-tiered oversight framework, covering prompts, models, agents, and watermarked outputs such as AlphaFold's, that will be administered by dedicated boards or platforms to enforce ethics, regulations, and performance metrics as AI systems become more sensitive and proprietary.

Full Transcript

Source: https://www.youtube.com/watch?v=Z-6d1gnSOAI
Duration: 00:43:00
0:00 Should your AI assistant remember everything about you? Vagner Santana is a Staff Research Scientist, Master Inventor, and Responsible Tech team member. Vagner, welcome to the show. What do you think?

0:09 No, absolutely not. I think we should be able to tell our agents what to remember and what not to remember.

0:17 All right, a little bit of both ways then. Vyoma Gajjar is an AI Technical Solution Architect. Vyoma, welcome back to the show. What do you think?

0:24 Thank you. And only what matters, only purposeful things.

0:27 And Shobhit Varshney, who I'm declaring is the MVP of MoE expert guests, Senior Partner Consulting on AI for US, Canada, and Latin America. Shobhit, what's your hot take?

0:37 It should remember everything, just like my wife does, but I do need an incognito mode that I don't get with my wife.

0:43 Okay, got it. All right. Well, lots to talk about there. All that and more on today's Mixture of Experts.

0:54 I'm Tim Hwang, and welcome to Mixture of Experts. Each week, MoE aims to be the single best place to get caught up on the week's news in artificial intelligence and sort out what it means to you. Today is jam-packed. We're going to talk about Microsoft's recent announcements at its Ignite conference, a new math benchmark, AlphaFold 3. But first, I want to talk a little bit about memory.

1:15 So Google Gemini is touting a new memories feature, available to premium subscribers, where the model basically can recall facts about yourself. So you can say, I prefer apples over oranges, or you can say, I'm really interested in these types of topics, and the model is allegedly able to recall and use this as context in the future. At the same time, there was an interview with Mustafa Suleyman, who, formerly of Inflection AI, is now the head of Microsoft AI, and he basically claimed that they are on the brink of unlocking, quote, near-infinite memory for models coming soon.

1:53 Vyoma, maybe we'll start with you. Why does memory matter? Is it just more context for these models? Or is this kind of a game changer in some ways?

2:01 I do feel it's going to be a game changer going forward. It's not just context; it's also making it much more relevant to the users that are using it, which helps people adopt these technologies much better, because now if you have an AI model which kind of knows exactly what you want and how you want things, it makes me go back to it over and over again. The other thing that I feel is, if we keep looking and experimenting with AI models with more memory, we get to a point where we come up with much more creative nuances of generative AI, which we've not seen as much yet.

2:39 Yeah. And I think the user experience of this is going to be so interesting. I mean, we even saw it in the first question: Vagner was like, well, it should remember what it should and shouldn't remember what it shouldn't, which is very much almost like, I want to choose, right? And then, Shobhit, you were like, it should just remember everything, but I want the ability to opt out when I need to, right? And those are two very different ways of looking at it. Um, I guess Vagner, do you want to start?
3:01 I'm curious about why you answered the way you did, and what you're trying to preserve when you give that kind of response.

3:07 I was thinking about situations in which, imagine that we have agents that know almost everything about you, near-infinite memory, and the consequences of that. Like, imagine advertisement or other things that could be done with that kind of information, willingly or not, or without considering your privacy or your desires, so to say. So if there's, let's say, a business model behind that, using this near-infinite memory to offer things to you, then something that supposedly was there to improve the user experience, thinking about things that are good for you, may become a different thing.

3:56 It's a little scary. I mean, I don't know, Shobhit, if this is where you were going, but there's one counterargument, which is: this is like the internet. You're describing the internet. People are already tracking you all the time. Why should models be any different? I don't know, Shobhit, if that was the direction you were thinking about going down, but I think that's one question here.

4:11 There's a different take if you're looking at my personal day-to-day world. Like, if I need to go remember what I did in Mexico six months back, where I stayed and stuff, I just expect to go into Gmail and ask that question to Gemini and get a response, right? So I do expect that there's somebody who's augmenting my long-term memory. We are really good at short-term memory. I need somebody to maintain that long term.

4:32 I have been very consistent in my responses on this podcast about enterprise focus, right? So for me, when we start to look at enterprise: I'm working with a very large healthcare client right now where we're trying to build these virtual assistants that'll have infinite memory, because they're essentially picking up where you left off. Every time a conversation starts, it should be a hot, warm start. It should not be a cold start where we're asking for information we should already know. So, picking up from that and moving toward one-on-one personalization, that's the promise we've had for a decade. But ultimately, now we have the right way of doing it. So if I do load all that context from a back-end system, passed on to a large language model, in the memory itself, it should be able to look at what we've had conversations around and then fine-tune the conversation that they're having today. So it has to be more contextual. So for me, memory has a huge impact when you're looking at enterprise: the one-on-one personal relationship you can build, versus a very generic bot where you introduce yourself from scratch every time.

5:35 Yeah, it'll be sort of interesting. I mean, Shobhit, you're almost arguing, and it is one of the questions I had, how competition around this feature is going to emerge over time. And you're almost saying, well, maybe if it's a personal chatbot assistant, it'll have to be a lot more discriminating on these things.
5:49 But in the enterprise setting, people want access to everything. You know, I don't know if that's what you're saying.

5:54 I just wanted to add a little bit of nuance to Shobhit's point: there is an unexplored territory in the whole generative AI world, which I feel is called forgetfulness. So let's say a lot of models keep remembering things about you. Imagine if it keeps building on some irrelevant data. As a human myself, I'd like to forget some data about myself, or I'd like to change myself. I might like Korean food today, but tomorrow I just might not like it. So how do you make these systems not forget the long-lived data, but use it more efficiently, and make sure that we're using the relevant parts, such that the AI systems don't become bogged down with irrelevant information and they're not biased going forward?

6:39 Yeah, I can imagine, actually, there's going to be this period where, you know, we're all very excited about memory, we're going to have infinite memory features, but then it's going to be that funny thing where you browse one thing on Amazon, and it just recommends that forever. My friend bought a toilet seat recently, and it was like, customers like you also enjoyed... and it was just more recommendations, not really realizing that that's kind of an incidental transaction versus an ongoing one.

7:03 That's what I meant by saying incognito, right? With ChatGPT today, I do temporary chats quite a bit. I don't want it to keep memory about things that I'm asking it to do every time, right? So temporary chat with ChatGPT, incognito with Chrome and Safari and things of that nature, those are meant for those kinds of use cases, right? This is a one-off thing that I want to do. I don't want you to remember this. I don't have that option with my wife. I just want to have infinite memory.

7:30 That's right. Uh, a question for you, because I'm sure some listeners will have this question, which is: is memory just kind of like RAG, ultimately? Like, is it ultimately just a document of facts that the model is retrieving from? So what is the difference here, if there is any difference? Is memory more of a marketing phrase, or are there actually technically different techniques going on here?

7:52 Yeah, so it's a very nuanced world now that we are in this whole multi-modal world. So when we speak about memory, it might not just be text that you're giving out, or prompts that you're giving in about yourself, or things you want to ask it about something else. Let's say, with clients that I work with myself, I give it a picture: hey, this is a picture. Can you help me evaluate what is in it? Can you develop this picture for me, or can you edit it for me? So it's not just one piece of information that you're providing, but different facets to it as well, about something that you like or something that you dislike. So I don't feel that eventually it is RAG, because unlike humans, it will add all this information as structured data into its particular memory, but there might be different models that it also evaluates it against. So RAG can be one use case for that particular infinite-memory ingestion, but there can be many other things that can be done on top of that.

9:00 Got it. Yeah, that's a really useful clarification.
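To make the memory-versus-RAG distinction above concrete, here is a minimal sketch in Python. All names (`MemoryStore`, `build_prompt`) are hypothetical illustrations, not any vendor's actual API: persistent user facts are injected on every turn, retrieval results arrive per query, and an incognito flag skips the memory layer entirely, along the lines of the temporary chats discussed above.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Persistent, structured facts about one user (hypothetical sketch)."""
    facts: dict = field(default_factory=dict)

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value

    def forget(self, key: str) -> None:
        # Selective retention: the user decides what gets dropped.
        self.facts.pop(key, None)

    def as_context(self) -> str:
        return "\n".join(f"- {k}: {v}" for k, v in self.facts.items())


def build_prompt(memory: MemoryStore, retrieved_docs: list, question: str,
                 incognito: bool = False) -> str:
    """Memory is injected on every turn; RAG results are per-query.
    Incognito mode skips the memory layer entirely."""
    parts = []
    if not incognito and memory.facts:
        parts.append("Known about this user:\n" + memory.as_context())
    if retrieved_docs:
        parts.append("Retrieved context:\n" + "\n".join(retrieved_docs))
    parts.append("Question: " + question)
    return "\n\n".join(parts)


memory = MemoryStore()
memory.remember("fruit_preference", "prefers apples over oranges")
print(build_prompt(memory, [], "Plan my snacks."))           # memory injected
print(build_prompt(memory, [], "One-off question.", incognito=True))  # no memory
```

The design choice the panel is circling is exactly the `forget` and `incognito` surface: whether the user, rather than the provider, controls what persists from turn to turn.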
9:03 Uh, Vagner, I think I'll give you the last word on this one. You know, I don't know if you buy all this, right? It seems like, out of the panelists, you might be the most privacy-sensitive. But do you think the future is kind of like optimal forgetting? I guess you could even fine-tune a model so that it forgets the way I forget. It's almost a really interesting way of potentially thinking about it, but I'm curious whether or not you think there is this nice distinction between enterprise and personal, and how these features will evolve if we are sensitive to the stuff I think you're raising, which is the privacy concerns.

9:34 Yeah, and I think not only for privacy concerns, but also thinking about fairness, and thinking about certain business cases as well. In terms of fairness, think about, let's say, credit score: how far back should we consider data in order to arrive at a fair credit score, right? My financial life in the last five years is different from my life, like, 15 years ago, right? So that is an important distinction to make, and how these things can be done to increase biases, right? And unfairness in certain algorithms, in certain systems, in certain businesses. So I think that my concern is more in that direction: how to define this interesting point of how to keep a context that is useful for the user and actually brings value in terms of business context, but also protect the user, and also people in communities that have historically been marginalized or penalized because of characteristics that some unfair algorithms may discriminate on when making certain decisions. I think that's the hard sweet spot to find, but it's something that we need to have in mind.

10:54 Totally, yeah. I mean, you're talking about a classic problem, right? Like, I commit a crime when I'm a young kid, and then that just follows me around for the rest of my life. And how do we want to manage that? It's genuinely a pretty hard problem. And I actually wonder if this will almost become a kind of competitive thing in the market as we go forwards. Right now, everybody's like, memory's a new thing. So, you know, Google says we have memory. Microsoft wants to come out and say we have infinite memory. But I actually wonder if, after these features become more commonplace, the reverse will be the case. It'll be like: this product forgets in the right way, and that's why you should use it. It will be very, very interesting to see.

11:34 So our next topic for the day is Microsoft Ignite. This is the Microsoft conference for IT professionals and developers that happened just this week.
11:42 There was a large number of announcements, but I was particularly interested that the company made a really big emphasis on safety. So among other things, the company announced an event called Zero Day Quest, which will be an in-person security event, and they announced a fairly large amount of money, $4 million in bounties, to expose vulnerabilities in AI. And Shobhit, I wanted to turn to you first, because I know you were there. I think you just got back, and you were covering it on the ground. Curious, from your eyes, what did you see there? What are the big trends?

12:10 So, Microsoft is a really big partner for IBM, and I'm wearing the IBM Microsoft shirt right now.

12:16 Yes. Yeah. We got the logo.

12:17 We got a couple of big Partner of the Year awards for all the work that we do with Microsoft. We've had a more than 30-year partnership with Microsoft. It's been scaling tremendously, both on the consulting side, as you would expect, but also on the IBM technology side. Within Ignite, this is one of the things that they had to address very clearly around security. What happens with the copilots? Is there any possibility of leaking data anywhere? How do you do access control and things of that nature? I believe, out of all the companies that we work with, all the ecosystem partners, understanding the ecosystem of SharePoint access, the email that was sent to somebody but not to the others, that kind of graph of access control is something that's very unique to Microsoft. So they're doubling down on that. They are making sure that if I do create an agent to go to a particular SharePoint, the access control automatically kicks in, right? So they've just made it natively embedded across everything. They also spent a lot of time, and I spent the next two days with them on technical deep dives, on specific capabilities around governance: all the partnerships that they've done, showing what's happening across the pipeline, with Weights & Biases or with Credo, with Arize, and all the other third-party tools, which gives you the full gamut of what's happening, all the traceability, all the evaluations, things of that nature. So I think they've addressed both the transparency and evaluation frameworks, governance, as well as the security controls in place very, very well. I was genuinely pretty happy coming out of the conference. I've done some hands-on work with them. They've addressed this phenomenally well.
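The access-control graph described here reduces to one rule: an agent answering on a user's behalf should only retrieve documents that the user could already open themselves. A minimal sketch of that pre-retrieval check, with hypothetical data structures rather than Microsoft's actual APIs:

```python
# Hypothetical sketch: enforce the caller's permissions before retrieval,
# so an agent can never surface a document its user couldn't open directly.

ACL = {  # document id -> principals allowed to read it
    "sharepoint://finance/q3-forecast.xlsx": {"alice", "finance-team"},
    "sharepoint://hr/offer-letter.docx": {"hr-team"},
}

GROUPS = {"alice": {"finance-team"}}  # user -> groups they belong to


def principals_for(user: str) -> set:
    """A user acts as themselves plus every group they belong to."""
    return {user} | GROUPS.get(user, set())


def retrieve_for_user(user: str, candidate_doc_ids: list) -> list:
    """Filter the retriever's candidates down to what the caller may read."""
    allowed = principals_for(user)
    return [d for d in candidate_doc_ids if ACL.get(d, set()) & allowed]


docs = list(ACL)
print(retrieve_for_user("alice", docs))  # finance doc only
print(retrieve_for_user("bob", docs))    # empty: no grants anywhere
```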
13:54 Yeah, for sure. And Vyoma, maybe I'll turn to you, because I think one way of reading all these announcements is very clear, right? Microsoft wants to be the safest place, in some ways, to design AI products. And I think that's a really interesting kind of competitive edge in the market, right? You could come to the market and say, we've got the biggest, baddest models. And I think Microsoft obviously wants that as well, right? But I think it's also now saying, well, one unique thing about working with us is safety and security. And you work with a lot of customers. How are you seeing customers trade off against these things? Because I'm sure, on one hand, they want the shiniest tool, right? But on the other hand, they're worried about the security of the deployments they want to work on. So I'm curious how people are balancing that, and whether or not you think this bid is really where the market is going.

14:37 That's a great question. So for the past year and a half, everyone experimented with gen AI. And they've done a lot of POCs, et cetera. But now the rubber is hitting the road. Now things are going into production. Once they go into production, then come the different issues of privacy concerns, security concerns, et cetera. And I have always seen Microsoft as a leader, an innovator, and now a steward of responsible AI as well. So the way that they are augmenting it into everyday products, such as Microsoft 365 and their different OS systems as well, that kind of showcases and instills that faith in users who use those products on a day-to-day basis. All of us, I feel, at some point use Word or Excel, et cetera. And I feel that kind of is the right way to go to get everyone talking about it.

15:27 And then, when you ask me about how clients are looking into it: each client wants security measures on top of it, if not any success metric that is available out of the box. They want to bring their custom metrics in. They also want to create their own metrics based on the kind of information that is coming out, before that particular product, or chatbot in our case, or RAG system, goes into full-fledged production. They want all security guidelines adhered to, because it's no longer just the AI tech team sitting in a boardroom making the decisions. Now there's a finance team sitting there, a legal team sitting there, and then the entire tech team too. So there are so many different minds at play here that all of them will feel much more secure if there are guardrails defined around them.

16:19 Yeah, that's great. And Vagner, maybe I'll turn to you next, because there's this one question I've been pondering a lot in this space: when we say safety for AI products, that's very broad, right? It's everything from, is your model going to leak the data that you've put into the system, to, can hackers get access, right? Can they manipulate what the system does? To even the bias questions I think you raised in the last segment. And so safety is kind of a shifting category, where who's responsible for it and what you're working on is always moving over time. And it does feel like here, at least, there's a lot more emphasis on what you might call technical safety, kind of like cybersecurity in some sense. Do you think that, ultimately, these teams will also be responsible for the types of bias questions that you raised earlier, or is that going to live elsewhere in the enterprise?

17:09 It's interesting that we see different approaches to this. There are some companies that put everything on developers' shoulders. Like, oh, you're responsible for taking care of the safety.
17:26 There's a lot of discussion right now on defining what exactly safety is nowadays, because when we look back at, let's say, aviation or other systems, it had a different flavor. Now, with the stochasticity of these models, it's really hard to define, and with synthetic data as well. So we are touching really unknown territory in terms of even how to define what safety is in this world we're living in. So I don't know how to answer your question.

18:03 No one knows, is the answer.

18:04 I brought more questions, because it's something that our group here is touching on: how to think about safety in these new terms, right? It's hard even to define it, and to define the boundaries of it as well. Who's responsible for that? Well, we know that when we create a technology and when we deploy it, we have this entanglement with the technology, right? We are actually responsible for the technology that we're delivering. But when we look at downstream implications, that becomes much more complicated, especially in B2B settings.

18:41 With enterprises, we look at it at three different levels. There's the security of the infrastructure itself: the network, the hardware, access, things of that nature. The next level from there is security of the data: who has access to what, controls, things of that nature, any breaches. The third level on top is the security of the application itself; that includes the actual AI, the model, and the LLM app, and things of that nature, right? So across the three different levels, there's a varying degree of how much the network security team is involved, or the classic security teams in the companies. And as you go up to the application layer, you start to think more about the responsible use of how you want people to use this application.

19:17 Let me pick an example. One of the big Fortune 50 companies that we're working with in the CPG space just launched a massive campaign around their water bottles, where you can go on the website, create an image with Adobe Firefly and others, and have it printed on that bottle. And that's something that's very unique. It's a unique design that you have built, and it gets shipped to you directly. So we are running that platform for them end to end. In that platform, we need to make sure that the model itself that we're using has its own red-teaming and can reject things, like if you want to create an image that's not appropriate. One level up from there is the actual cloud vendor, right? If you're building this on the Azures, IBMs, Googles of the world, each one of those cloud vendors has its own filtering processes, some policies you can set for what goes in and out. One layer out is this platform that we have built for them. That platform can have some rules specific to that company, and across any third-party tool or any third-party cloud they use, we'll filter it out there. And ultimately, you have the application level. So for example, if you have a lot of points with that particular company because of interactions, you may have unlocked some more premium images that you can create. But on the platform side, we may need to say: creating an image of an astronaut is okay, but an astronaut on that bottle holding a competitor's drink, that's not okay. Or wearing clothes that are not appropriate is not okay. So all that has to be filtered in. And we're getting thousands of these every day as people start to use this. These things go viral very quickly. So we need to have enough filtering mechanisms for the safe use of that particular product. But on the infrastructure side, it's more around cybersecurity. So across the three layers, as you go further up, it's a blend of security and responsible use, I think, and organizationally, they'll start to get tied into one organization.

21:06 Yeah, that's really interesting, because Vagner's response corresponds to what I've experienced: right now, it's a little bit of a free-for-all, right? Everybody knows they want these models to be safe, but there are maybe ten different organizations inside a company, for instance, that are tasked with different aspects of the problem. And it ultimately has this effect of making the security posture often very incoherent. And I think it's interesting to hear your prediction that, as things get a little bit more mature, these will start to coalesce into maybe a single org, or maybe a single person will be responsible, managing teams that are looking at this from a number of different angles.
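The layered model sketched in this exchange composes naturally as a chain of vetoes: the model's own red-team refusals, the cloud vendor's content policies, the platform's company-specific rules, and the application's business rules. A toy illustration, with entirely hypothetical rules standing in for the real filters:

```python
# Hypothetical sketch of layered guardrails: every layer can veto a request,
# from generic safety at the model level down to brand rules at the app level.

def model_level(prompt: str) -> bool:           # model's own red-team refusals
    return "inappropriate" not in prompt

def cloud_level(prompt: str) -> bool:           # cloud vendor content policy
    return "violence" not in prompt

def platform_level(prompt: str) -> bool:        # company-specific platform rules
    return "competitor's drink" not in prompt

def app_level(prompt: str, user_points: int) -> bool:  # business rules
    premium = "premium" in prompt
    return user_points >= 1000 if premium else True

def allow(prompt: str, user_points: int) -> bool:
    """A request passes only if every layer approves it."""
    return all([
        model_level(prompt),
        cloud_level(prompt),
        platform_level(prompt),
        app_level(prompt, user_points),
    ])

print(allow("astronaut on a water bottle", user_points=0))             # True
print(allow("astronaut holding a competitor's drink", user_points=0))  # False
print(allow("premium galaxy art on a water bottle", user_points=1500)) # True
```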
21:40 So Shobhit, maybe I'll ask you this before we leave Ignite: any announcements that you're excited about? Things to look forward to?

21:46 So I think it really reflects the maturity that we are seeing. There were a bunch of gaps with real-world deployments, and we might mention a couple, but we have scaled these out with Microsoft, and we have all kinds of offerings around their copilots. We do a ton of custom copilots for clients at scale across each industry. We do a lot of Azure transformations, their OneLake and Fabric on the data side, and stuff. Across each one of them, there was a lot of incremental progress they've made. One of the really cool things that stood out for me personally was their Azure Foundry. They've done a good job of bringing all of their AI tools under one umbrella. So it's just one studio: they've got governance, they've got models. There's a lot of talk about industry-specific models, how to make it easy for you to fine-tune with your own data: Azure Foundry. A lot of talk about security and stuff. And there are a few different features, like side-by-side comparison of LLMs on the same topic. Google has been doing this for a while now. We need to have a set of learnings from each other as well. So we're seeing all the different vendors catch up to the kinds of things that are needed to put LLMs into production. In ten days, we'll be at AWS re:Invent. Under NDA, we've seen quite a bit of really cool things that they're bringing out in the next year. It's really exciting to see all the different vendors catch up with each other and one-up each other. And the great thing is, they're all in the service of enterprises.
23:06 So: data, their Fabric, what they're doing with the data landscape on the Azure AI Foundry side, they've done quite a bit. They did have a lot of things around hardware. They are ensuring that the whole stack works very well, both with their own proprietary hardware, plus all the partnerships that they've built. We do a lot of work with companies like NVIDIA and Azure together. So there are a lot of clients where, at the infrastructure level, there's a lot of good synergy between a lot of our vendors working together. It was a great, great event, and I'm just very, very excited coming out of it. Especially after the keynote, when you start to go hands-on and you work with the product leads, the research teams and stuff: they've done a really good job of piecing everything together.

23:52 A few episodes ago, we were like, we're just done with the summer announcement season, we've got a little bit of a break. And it feels like basically the gas is being revved up again as we get into the end of the year with these final few conferences.

24:09 Our third segment of today is going to focus on a new benchmark that's on the scene called FrontierMath. We love benchmarks here at Mixture of Experts; it's one of the things that we cover almost as ferociously as we cover new product features. And this one's particularly interesting because it was released by a research group, Epoch AI. What's interesting about FrontierMath is that, in contrast to a lot of benchmarks that you may be familiar with, this benchmark specifically contains unpublished, expert-level mathematical problems that specialists spend days solving. So in contrast to, like, an MMLU or other benchmarks you might be familiar with, where you yourself as a human could go through and do the test yourself, this is specifically designed to be the ultra-hard benchmark on math.

24:54 And Vagner, maybe we'll start with you on this particular topic. You know, we've talked a lot about how benchmarks are getting increasingly gamed in the AI space. When a new model comes out and they're like, look, we beat all the benchmarks, I think everybody kind of just collectively rolls their eyes now and says, I'll just load it up, test it out myself, and see whether or not I think it's good. But this one's really interesting, and I'm curious if you think this is indicating a kind of new meta in AI benchmarking, where the new trend is now going to be the benchmark that's so hard that you need to be a world-class human expert to even deal with it, and what you think that means for the space.

25:31 I think that you already mentioned a really key aspect: that the problems are novel and unpublished. So I think that if you think about the challenge for mathematicians, it will be like, okay, you think that that thing reasons? Let me show you what a real hard problem is. So I think that it was a really interesting approach. And I think that, at the end, it shows us that there's not much reasoning, right?
25:59 It is word prediction. And this is the technology that we are talking about, trying to put things on the ground and discuss how the technology actually works. And I think that this benchmark is trying to expose more of the capabilities, and also the limitations, of this technology. I think that's an interesting aspect. And the interesting thing in the report is that it said only 2 percent of the problems from the benchmark were solved, right? And I think the interesting question is: what happened in that 2 percent? That's an interesting discussion as well. If these are novel and unpublished problems, how did this technology actually solve 2 percent of problems that were unseen to it? That is the most interesting aspect for me. But yeah, I think it's an interesting approach, to show novel things that are actually not part of the training data, and see how this technology tackles that.

27:00 And connecting to what we always talk about here, the enterprise business case, right? What will this technology do when it's touching, for the first time, some enterprise data that it had never seen before? So I think that's an interesting discussion to bring up: we sometimes talk a lot about the training, the data used for training, and what would happen if, let's say, gen AI, or LLMs specifically, interact with new data. And I think this is an interesting example of that.

27:34 Yeah, for sure. It's really interesting. I mean, there's a fascinating point here about how, because these benchmarks are unpublished, you start to see the real edges of AI capability. And so the apparent success of models against a bunch of these benchmarks may just be because we've been lazy: no one wants to spend the money and time creating entirely novel test sets. And so you end up having a lot of repeating from training data being the reason why there's a lot of success. I don't know, I guess maybe Vyoma, Shobhit, I don't know if either one of you wants to take this one. So this is very fun from a research standpoint, right? Like, what does that 2 percent mean, and what is this model doing, and how does it succeed here? I guess on the commercial side, do we feel that enterprises are saying, we need better benchmarks? Like, we need harder benchmarks, because it's clear that the benchmarks we have aren't giving us enough signal into whether or not these models can do stuff that's more than just search and retrieval, effectively.

28:29 So, this is my quick two cents on this. When we are recruiting a human for a particular job in an enterprise, we have some good expectations of what we expect them to do out of the box, right? From day one, they've had training in psychology or accounting and things of that nature, right? I think we need to have some level of entry-level domain expertise that we judge each model by. So in that sense, the corollary will be: we need to have some benchmarks that are domain-specific.
29:00 And then within that, as the AI model starts to do better, just like we do performance reviews of our own team members on specific topics: a hard call came in about a tax issue, were you able to solve it or not, right? So we then start to differentiate and say this person is a subject matter expert, not just in the domain, but in the expertise in this industry, in our specific company, in our specific tools and documents, things of that nature, right? So I think there has to be some level of gradation of what kind of benchmarks we go through, and then you give them scores accordingly. And that should be the way you charge for these models, right? So if I'm hiring somebody who has a generic accounting degree, I may pay X dollars for them, but as they start becoming an expert, and as they go through different tests, we know that they're doing a better job.

29:46 There's also a continuous evaluation piece to it. I'm not sure how many people realize this, but as a physician, my wife has to take an exam every X number of years to recertify that she knows pulmonary, she knows critical care well, and so on and so forth, right? We do need to have some sort of continuous benchmarking over time; the kinds of problems that we need to solve, or what we are seeing, will change. So we do need a starter set of benchmarks for enterprise, and you keep evolving them based on the kinds of questions we are really getting.

30:18 There'll be very few people we need in our organization who can humanly solve the math benchmarks that they just created. PhDs in this area were not able to deliver that kind of accuracy themselves. So it is working at stumping both humans as well as AI. You really need to have done a PhD in that particular domain to be able to answer that question in one shot. So there's this whole one-shot getting to the answer, versus using tools and agents to get to the answer. There has to be a mix of all of that.

30:47 And there's a sense of consistency of the response. If you look at Claude's desktop use, right? In the benchmarks they released, they had a benchmark at the very end about tool usage: the consistency of the response. I ask you to go book me a flight to London. I ask you that ten times in a row. Today, the models getting it right once out of ten is very, very high. Getting it right ten times in a row is very, very low. It's embarrassingly low how bad these agentic frameworks are at getting the same thing correct in a row, right? So I think there's a consistency benchmark, there are some levels of benchmarks that are needed, and the benchmarks need to evolve based on the kind of usage we are seeing.

31:29 Vyoma, would you add something to that?

31:30 Yeah, I was just going to add, exactly to what Shobhit said, if we pivot into it: unlike the generic benchmarks that we have, the one that FrontierMath focuses on is the mathematical reasoning behind it. So, given what Shobhit said about the same answer not being consistent, imagine FrontierMath starts spitting out something like a cognitive residue. So every time it spits out an answer, it gives you exactly how it did it.
31:56 That's one of the ways, and that has been pitched in new research that has been coming up. And then the agents and the models start becoming more and more intelligent, and start understanding their own patterns: okay, whatever I said before is now based on these parameters, and then we keep adding to this. So I feel there's a whole avenue that is going to open up in the form of research in AI, mathematical AI. The moment I read FrontierMath, I'm like, oh, did they solve the NP-hard problem yet? That's not something I see in the near future. But one of the things that I feel is, it will at least help you understand some of the computations which mathematicians or statisticians do to solve a particular problem, and then at least rule that out. Imagine the amount of time and energy that is being put into creating validation test sets or test data sets. So all of that being done by some other technology, to expedite the process, would be a good avenue as well.

32:59 Yeah, for sure. And I think one of the questions that's kind of presented by FrontierMath is, to Shobhit's point, these models are really bad at consistency. But you know where consistency is really important? Math. When you add two numbers together, you're always supposed to get the same number every time. And so it almost begs the question of, like, is the technology good for math at all? And I think these benchmarks are helping us think through the problem.

33:24 I feel that there's an adjacency that comes with being good at math. Your reasoning skills, the way you think through a problem and break it down in your head: I think that helps LLMs do a better job at reasoning elsewhere as well. We see this with code as well. When you add code to the training data, you see them do a better job even on the text side. So consistently, if you look at models that have added more code, even for text-only LLM kinds of responses and stuff, they do a better job at reasoning and understanding. So if I give you a problem where I need to figure out the real root cause of what the customer was complaining about, you do need to figure out: so far I've seen that there are three things they talked about; the first two kind of got resolved; the third one did not; but this is what they really were complaining about, right? So there's the fact that a model can reason and think through a problem. And the word reasoning is not very well defined in our industry yet. You'll see a lot of exuberance around the word reasoning, while some others will just dunk on it, saying, no, this is just smartly regurgitating. But across that spectrum, I think they get better when they get better at math as well.

34:26 Shobhit, I do really love the idea of thinking about this almost as recruiting, in a human sense. In the future, you're going to be like, I just need to staff three entry-level models and maybe one senior model to run the team. It will start to feel like that as these evals really become the way we see whether or not we want to work with these models at all.
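The flight-booking example above is easy to quantify: if an agent succeeds at a task with probability p per attempt, the chance of ten consecutive successes is p raised to the tenth power, which collapses quickly even for seemingly reliable agents. A quick sketch of the arithmetic behind that consistency gap:

```python
# Consistency vs. single-shot success: an agent that succeeds 90% of the
# time completes the same task ten times in a row barely a third of the time.

def consistency(p: float, k: int) -> float:
    """Probability of k consecutive successes, given per-attempt success p."""
    return p ** k

for p in (0.99, 0.95, 0.90, 0.80):
    print(f"p={p:.2f}  10-in-a-row={consistency(p, 10):.3f}")
# p=0.99 -> 0.904, p=0.95 -> 0.599, p=0.90 -> 0.349, p=0.80 -> 0.107
```

This is why a benchmark that only scores single attempts can look flattering while repeated-trial (all-correct-in-a-row) scoring exposes exactly the gap the panel describes.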
34:50 I'm going to move us on to our final topic of the day, which is AlphaFold 3. AlphaFold 3 is the technology that drove the Nobel Prize-winning work in using AI to predict protein structure and its interactions with other molecules. And it's potentially a major technology for using AI to advance scientific research, pharmaceutical development, and otherwise. And if you haven't been following the twists and turns of this story, it's been very interesting. So DeepMind originally released its paper and said, look, if you're a researcher, you can get access to the model, but it's going to be on our servers, and it's going to be under very specific licensing constraints, essentially. There was this big outcry in the research community saying, well, if you do that, we can't reproduce the research, and it's kind of offensive from a research standpoint. And after a lot of pressure, DeepMind relented, right? And so the big move of the week is that DeepMind decided to take this major, groundbreaking technology and open-source it to the world.

35:48 And so, Vyoma, maybe I'll start with you. One way of looking at the story is: this is super valuable stuff, right? AlphaFold 3 is core technology that you could imagine building an enormous new business on. And DeepMind, I guess, apparently was sort of bullied into releasing this model open source. So maybe I'll just present the question: why would a company like DeepMind want to give up this incredibly valuable trade secret? What is pressuring them to do that? What does that tell us about the space?

36:20 Yeah. It's not pressuring, I feel, but it does kind of pivot people to use AI in their industry. That's one of the key things that I've been saying, when everyone says, oh, there were researchers who said that AlphaFold shouldn't do this. But there is another caveat to this as well: researchers wanted to get hands-on with this. Why is open-source technology so prevalent in this world? Why does everyone like it? Because it opens new avenues; it helps create more IP, because I'm pretty sure, when you have a strong technology like this, there will be so many different creative aspects which can be added to it. Imagine synthetic data generation for pharmaceutical companies, et cetera. So that is one thing as well. But I feel, when AlphaFold did this on their own servers, or deployed it on their own servers, it helps reduce the computational resource need as well. There might be researchers, or universities, or students who might not have the computational power to work on it. So I feel it is a blessing in disguise, because now it's on their own servers. I know it's a little bit of an advanced topic around IP, et cetera, but it does help everyone. It helps the future of the industry in this case, because everyone will get a chance to build something, and there are no stopping criteria such as not enough resources or computational needs.
37:51 Yeah, I think that's a really good point. And I guess, Vagner, you know, DeepMind had a number of reasons why they didn't want to release this to the public. One of them was: well, we want to balance the ability to open up new research, as you were saying, with our ability to pursue this commercially. I know some people were also commenting that this technology could potentially be used for some bad purposes. Once you start getting AI into bio, you start to worry about, well, if a bad actor gets access to this stuff, what could they possibly do with it? Do you think those kinds of risks are overblown here? Or do you worry that this is one trade-off as we get more and more powerful models into the open source?

38:28 Well, it's interesting that it is related to another prize, right? A prize-winning project. And I think that we're going to see more and more AI projects, AI technologists, and their creators being awarded these prizes. And I think that reproducibility and transparency here play a key role, right? Because people want to know, okay, why is this so valuable? But to your point, I think it's also a challenge, because, right, it's open source, and then what can be done with that technology? This is really hard to define. Well, in one respect, they only opened it for researchers. So I think that is a better way to deal with it: you're making it transparent for other researchers, but it's not open to everyone. I think this is an interesting approach, because, to your point, yeah, I agree that other uses of this technology could bring high risks.

39:37 That's right. Yeah, I don't know, it kind of feels like everybody's struggling for a good answer to this question, right? Which is: we know we have all these benefits of open source, and we want to preserve all these benefits of open source. And it feels like companies have been like, well, we'll give you a license that you need to sign, and maybe that's one way we'll deal with it. I know the other one has been, oh, well, we'll fine-tune these models so they'll be necessarily safe. But, you know, the history of jailbreaks is that these models get jailbroken. And Vyoma, maybe I'll give you the last word here. It's fascinating to think that we may already be moving from one era of open source and AI to another, right? Like, maybe 2025 is this inflection point. Do you have predictions? Where does this all go for 2025 in open source?

40:15 First thing, like, DeepMind doing this. But then, I think we spoke about this in a previous podcast: a watermarking technology called SynthID, by Google, as well. So imagine all the information that has been coming out of AlphaFold, and that starts getting watermarked. So I feel Google's thinking leaps and bounds ahead. They have already created a structure which is going to help them stop these processes in which IP is being misused or sensitive information is leaking out. So I feel that is happening.
40:45 And one of the great caveats that I see happening in the future is: how do you govern each aspect of a generative AI or machine learning production flow? Not only a prompt, not only a model, but even your agent. Each piece of information coming out of an agent being monitored by a particular governance model or a governance structure. I feel there's going to be a multifaceted governance structure, or a platform, coming up, which will have not only rules, regulations, ethics, responsible AI, but success metrics for each different task. And I feel someone or some place will come up, like a company or a board, which will have the final say on how these processes need to be governed, with these different structures and these different policies.

41:37 The actual evaluation of each step, the agent governance around how an agent is picking up tools and stuff, that is available today. When we are rolling out production agent flows for clients, we do have enough measures in place. I think the extension that you're recommending is that these will make their way into AI regulation as well. Today, AI regulations are looking at the overall application and the use of AI. And yes, you're thinking that AI regulations will become much more precise, going further down into the how.

42:09 The mechanics and stuff, right?

42:10 Exactly. So the questions that we follow, and the ones that we feel sure about, are like: is your model going to be biased against a particular group, et cetera? Very, very generic. They are not specific to the task or specific to the problem that we're trying to solve. Exactly what you said: that's something that I envision happening in the future.

42:29 Well, as always, a lot to look forward to in 2025. I don't think we will be short of any stories. And even the close of the year is going to be a little bit crazy with all these conferences. So I think that's all the time we have for today. Thanks for joining us, Vagner, Vyoma, Shobhit. As always, it's great having you all on the show, and we hope to have you back sometime. And for all you listeners, if you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. And we'll see you next week for another episode of Mixture of Experts.