Learning Library

← Back to Library

AI Safety, GPT-5 Secrets, and Robot Olympics

Key Points

  • The hosts caution that developers should not rely on model providers for safety, security, or accuracy and argue that these models are unsuitable for serious “naked” deployments.
  • In today’s “Mixture of Experts” episode, Tim Hwang is joined by senior researchers Marina Danilevsky, Nathalie Baracaldo, and AI research engineer Sandi Besen to discuss AI welfare, new reasoning model findings, the hidden system prompt in GPT‑5, and an MIT NANDA initiative report on AI pilots.
  • Tech news highlights include IBM and the USTA’s AI‑powered “Match Chat” chatbot for the US Open, Meta’s plan to split its AI division into four units with a dedicated “Superintelligence” team, and the inaugural Robot Olympics in China featuring humanoid robots competing in sports.
  • The segment emphasizes the rapid, sometimes chaotic, evolution of AI applications across entertainment, corporate restructuring, and competitive robotics, underscoring the need for critical scrutiny and responsible deployment.

Sections

Full Transcript

# AI Safety, GPT-5 Secrets, and Robot Olympics **Source:** [https://www.youtube.com/watch?v=UKVSCFrWrpA](https://www.youtube.com/watch?v=UKVSCFrWrpA) **Duration:** 00:44:49 ## Summary - The hosts caution that developers should not rely on model providers for safety, security, or accuracy and argue that these models are unsuitable for serious “naked” deployments. - In today’s “Mixture of Experts” episode, Tim Hwang is joined by senior researchers Marina Danilevsky, Nathalie Baracaldo, and AI research engineer Sandi Besen to discuss AI welfare, new reasoning model findings, the hidden system prompt in GPT‑5, and an MIT NANDA initiative report on AI pilots. - Tech news highlights include IBM and the USTA’s AI‑powered “Match Chat” chatbot for the US Open, Meta’s plan to split its AI division into four units with a dedicated “Superintelligence” team, and the inaugural Robot Olympics in China featuring humanoid robots competing in sports. - The segment emphasizes the rapid, sometimes chaotic, evolution of AI applications across entertainment, corporate restructuring, and competitive robotics, underscoring the need for critical scrutiny and responsible deployment. ## Sections - [00:00:00](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=0s) **Cautious AI Deployment & Updates** - The hosts warn against relying on model providers for safety and accuracy, preview discussions on AI welfare, a hidden GPT‑5 system prompt, an MIT NANDA AI pilot report, and deliver the week’s AI headlines. - [00:03:05](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=185s) **MIT Report Highlights AI Pilot Failure** - In a podcast segment, hosts dissect MIT’s NANDA study that finds 95% of generative‑AI pilots miss expectations, debating whether the figure is exaggerated and probing the report’s methodology. - [00:06:09](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=369s) **Executive Hype vs AI Reality** - The speaker highlights a gap between C‑suite expectations fueled by marketing demos and the modest, focused AI deployments—typically backend optimizations—that truly succeed, emphasizing the need to realign hype with realistic use cases and address the resulting learning gap. - [00:09:23](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=563s) **Rethinking ROI for AI Adoption** - The speakers discuss how traditional ROI metrics miss the subtle, incremental benefits of AI tools, suggesting internal adoption and change‑management factors as more appropriate measures of impact. - [00:12:34](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=754s) **Investigating System Prompt Exposure** - The speaker discusses how system prompts are embedded in AI frameworks, recounts attempts to extract GPT‑5’s internal prompt and scratchpad, and stresses the need for developers to understand model alignment. - [00:15:46](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=946s) **Transparent System Prompts and Trust** - The speaker contrasts distrust of AI providers with a demand for openness, noting that system prompts are expected, and highlights IBM’s forthcoming Mellea tool, which will give developers explicit, controllable visibility into how such prompts influence model responses. - [00:18:48](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=1128s) **User Responsibility for Model Guardrails** - The speaker warns that developers must not rely on model providers for safety, accuracy, or security, and should implement their own guardrails, fine‑tuning, and hybrid systems before deploying AI models. - [00:21:51](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=1311s) **Debating Openness and Trust in AI** - The speakers debate whether AI models should become fully transparent, compare current reliance on opaque systems to the shift from DIY computer repair to services like Apple’s Genius Bar, and then segue into a discussion of a new “chains of thought” research paper. - [00:24:54](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=1494s) **Long Chain‑of‑Thought Tradeoffs** - The speaker observes that certain fine‑tuned models only arrive at correct answers after many reasoning steps (around the 17th thought), prompting a discussion on the balance between longer token‑heavy deliberation and faster, more efficient responses when deploying such models. - [00:28:05](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=1685s) **Rethinking Chain-of-Thought Proxy** - The speaker critiques using chain‑of‑thought reasoning as a proxy for human and model cognition, emphasizing its shortcomings and the tendency toward overthinking. - [00:31:15](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=1875s) **Challenges of Chain-of-Thought Prompting** - The speaker reflects on why current models struggle to produce coherent chain‑of‑thought reasoning, attributing it to gaps in training data and noting safety trade‑offs where explicit reasoning can lead to less safe answers. - [00:34:26](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=2066s) **AI Welfare Justification for Feature Cutoff** - The panel examines Anthropic’s decision to terminate conversations to protect uncertain AI moral status, arguing that “welfare” is a misleading label for routine model output monitoring. - [00:37:36](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=2256s) **Platform Liability for Toxic Content** - The discussion examines how platforms justify banning abusive or self‑harmful AI interactions by invoking user protection and legal liability, arguing that providers can still be held responsible even when no external victim is apparent. - [00:40:40](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=2440s) **Anthropic's Insurance, Sentience, and Ethics** - The speaker examines Anthropic’s liability/insurance strategy and its economic‑welfare initiative while debating AI sentience—drawing parallels to animal rights—and questioning whether treating AI as potentially sentient could constitute harm or torture. - [00:43:49](https://www.youtube.com/watch?v=UKVSCFrWrpA&t=2629s) **Broadening Perspectives on Emerging Tech** - The speaker urges listeners to move beyond insider viewpoints, understand the historical and social reasons behind technology's creation, and seek information from multiple sources to develop a well‑rounded education. ## Full Transcript
0:00Look as a responsible user of these models, 0:03you should never expect to rely 0:06on being able to see everything that you are provided for. 0:09You should not put the safety and security 0:11of your application in the hands of the model provider. 0:13You should not necessarily even put the accuracy in the hands of the model provider 0:16These models should never be deployed as a serious application. Naked. 0:21All that and more on today's Mixture of Experts. 0:28I'm Tim Hwang and welcome to Mixture of Experts. 0:32Each week, MoE brings together 0:33a panel of some of the most brilliant minds in technology to banter, analyze, 0:37and argue our way through the exciting news each week in Artificial Intelligence. 0:41Today I'm joined by a great crew. 0:42We've got Marina Danilevsky, Senior Research Scientist, Nathalie 0:45Baracaldo, Senior Research Scientist and Master Inventor, 0:49and Sandi Besen joining us for the very first time, AI Research Engineer. 0:52Welcome to all three of you. 0:54We have a packed episode today. As always. 0:55We're going to cover AI welfare, new findings on reasoning models 0:59GPT-5 hidden System prompt. 1:01And this new report coming out of the MIT 1:04NANDA initiative on AI pilots. 1:06But first, we're going to do the quick headlines as per usual. 1:09And we've got Aili here. over to you. 1:15Hey everyone, I'm Aili McConnon. 1:18I'm a Tech News Writer for IBM Think. 1:20Before we dive into our main episode today, I'm here with a few AI headlines. 1:24You might have missed this busy week. 1:25First up, where 1:27should you head this week to catch a bagel, 1:30a slice or a honey deuce? 1:32The US Open, of course. 1:33The American Grand Slam tennis tournament kicked off this week 1:36with the mixed doubles championships this year as well. 1:40IBM and the USTA have rolled out 1:42various new AI powered features for fans, including Match Chat. 1:46So Match Chat, an interactive chatbot 1:48that answers your questions such as who converted 1:51more break points that last set, or where 1:53can I find the nearest honey deuce? Which, 1:55in case you were wondering, 1:57is the US Open signature cocktail. 1:59It has raspberry lemonade, vodka 2:01and melon pieces shaped like tennis balls. Meanwhile, 2:04Meta is restructuring yet again. 2:09Mark Zuckerberg told Meta employees 2:11he plans to split its AI division into four smaller units. 2:15He's going to break out "Superintelligence" 2:17into its own division with a dedicated team. 2:19"Superintelligence", by the way, 2:21is a new, pretty loosely defined term 2:23meaning AI that's smarter than humans. 2:26Not to be confused with AGI 2:28or artificial general intelligence, 2:30which generally means AI that's as smart as humans. 2:34And last but not least. 2:36Robot Olympics. 2:37That's correct. 2:38This week, robots from 16 countries 2:41competed in running, kickboxing, soccer 2:44and more at the first humanoid games hosted in China. 2:47And while robots have certainly improved productivity 2:50in a variety of industries, from manufacturing to agriculture, 2:54these robot athletes are still perhaps lagging behind. 2:57Not to mention running into each other 2:59and some human fans at this week's Humanoid games. 3:02Do you want to dive deeper into some of these topics? 3:05Subscribe to our Think newsletter. 3:07It's linked in the show notes. 3:08And now back to our main episode. 3:13So for our first segment, we're going to talk a little bit about this report 3:17that was just published out of MIT's NANDA initiative. 3:21And so what they did was they conducted 3:23150 interviews, 3:25350 employee survey, 3:27analyzed 300 public AI deployments. 3:30And there's a lot of kind of interesting findings that came out of the report. 3:34the headline that everybody's been sharing around 3:37has been this claim that 95% of generative 3:41AI pilots are falling short, 3:44basically not really anywhere near 3:46the expectations of the people implementing them. 3:48And this is coming out of some of the biggest decision makers 3:51at large enterprises, CFOs and what have you. 3:55And so I guess maybe the first question and I'll throw it to you, 3:58Sandi, because you're joining us for the very first time, is. 4:00Are these results surprising to you? 4:02Is like 95% shocking. 4:03I know some people were like, this is the end of AI, 4:06but I don't know if you thought this number was overblown 4:09or really not that surprising. 4:10I certainly don't think it's the end of AI. 4:14Uh, no. That's good. That's good news. 4:16No, I would really like to read the full report 4:19and especially see how they conducted the study. 4:22Um, how their survey was constructed, 4:25um, who they were interviewing specifically, 4:27they mentioned interviewing many employees and leaders. 4:31But there's also different expectations from an employee 4:33and a leader on on how things actually went, 4:36um, how they're identifying the use cases, 4:39how they are implementing them and whether they're true. 4:43And like the skills gap, essentially, of 4:46who is implementing them and do they have the capabilities to implement it correctly. 4:49And lastly, like how are they even measuring ROI? 4:53So there's like so much unknown in this space. 4:57Um, the number 95%, in a way, doesn't shock me. 5:05But it seems too high for what I think 5:08the capabilities of this technology is. 5:10So clearly there is a misalignment somewhere. 5:12And without reading the full study, I don't know where. 5:15Um, but enterprises seem to need help. 5:18And I think that's a, I think a point really well taken. 5:20I mean, I think one of my reactions on kind of reading the headlines was, 5:24well, look, I mean, you know, what do we even mean 5:2795% of AI pilots, right? Like, 5:30what is an AI pilot? Who's responding to these surveys? 5:33And is your point, like, how are you measuring ROI here? I guess, 5:37Marina, turning over to you. I mean, is a number 5:39like this even useful as a way of thinking through this space? 5:41I think there's maybe one point of view, which is 5:43this is just way too simplifying of the situation. 5:46A number like this is useful to get people to want to read your report, 5:49but that's why we're talking about it. 5:51So I guess we're was talking about it. 5:53Good job guys. Well done. Yeah. 5:54Good job, MIT NANDA initiative. 5:56I will say that I'm not maybe 5:59necessarily too surprised that it would be pretty high 6:01because there continues to be a misalignment of expectations 6:04of like, you guys were both saying, what is it 6:06that AI pilots supposed to accomplish? 6:09in particular, there seems to be a misalignment between, uh, 6:12leaders and maybe C-suite executives and what they have been 6:16maybe seeing through some marketing, some really specific demos, anything of that sort. 6:20And then what ends up actually happening, which will fall short of that. 6:23There are lots of things that AI is useful for, 6:26and even the coverage of the report 6:28says that the successful deployments are ones that are focused, 6:32that are scoped, that are actually addressing 6:35a proper pain point, not step one. 6:37We're using AI. Step two what for? 6:39And those end up being successful 6:41if, albeit sometimes less sexy as use cases. 6:44It's going to be backend optimizations and things of that nature. 6:47Um, so again, this is a 6:50maybe a comment on the misalignment of expectations 6:53and how those should be changed around 6:55so that we can actually use the technology for what it's good for. 6:58Yeah, that's that's kind of interesting. 7:00And I really do love that interpretation of the results, 7:02which is this is almost a signal of like executive hype 7:06more than it is necessarily an indication 7:08of like the effectiveness or usefulness of AI. 7:11It's basically like, well, we were sold this thing that's going to change everything 7:13And hey, it's not changing everything. So what's up? 7:16Um, Nathalie, 7:18I think one of the interesting aspects of this report, uh, 7:21was in effect, there was kind of a conversation about this learning gap, right? 7:25Which I think is like, at least again, from anecdotally 7:28from like working with companies who have been doing 7:31AI pilots really does seem to be a really big question. Right? 7:34Like both the learning gap in terms of how you use these tools. 7:37But then to Marina's point, like, 7:39do the executives even understand what it is that they're trying to solve? 7:42Also seems like a really big problem. 7:44I guess the question for you is like, 7:45do you feel like one of the big problems of technology in the enterprise? 7:48It seems to be like it's actually like a knowledge problem 7:51or an expectations problem more than anything else. 7:54Is that one way to read these numbers? 7:56Well, it's, uh, I tried to read that report. 7:58It's not public, so it's difficult to fully provide 8:02a good assessment of what's going on with those results. 8:05That said, one thing that did caught my eye 8:09as I was reading the article was that, uh, 8:13basically they were talking about how we have processes 8:17and injecting the AI into the processes. 8:19It's going to be a slow. 8:21Uh, we all know that it takes time to modify how people work. 8:26So sometimes I believe that as the process is more complex, 8:30we would have to inject those tools, uh, carefully. 8:33And that may be the reason 8:36for which we're seeing that number. 8:38But this is pure speculation. 8:40Uh, on the other hand, 8:42anecdotally, what I can tell you 8:44is that a lot of people right now are using AI 8:47for all sorts of tiny optimizations throughout the day, 8:51so it's very difficult to actually see 8:54how it all improves the productivity. 8:57I will give you an example. 8:59A lot of people right now are utilizing it to, 9:02for example, convert from one format to the other. 9:05It turns out that the models are getting very, very good at that. 9:08And then converting and automating all these sorts of little things. 9:12It's going to basically improve how fast we work 9:16and how we do things that are very repeatable 9:19and that we humans are not necessarily great at. 9:23So looking forward to read the full report. 9:26But that was my take. 9:27Obviously it's going to take a long time to add to very complex processes, 9:31but I do see how people are starting to adapt 9:34to our AI systems and models to their daily lives. Yeah, 9:39this is a little bit of a perverse outcome, Sandi, 9:41because I feel like now I agree with you, 9:43but almost like one argument is like, is ROI 9:46the wrong way of thinking about these technologies right now? 9:50In the sense that, like most of the uses really 9:53are going to be kind of small optimizations 9:56that people are using in their daily lives, 9:58which really have an improvement but are really difficult to measure. 10:01And so almost it 10:02kind of sounds like from the point of view of the decision maker, 10:04they're like, why isn't the needle swinging radically in one direction 10:08when the answer is, well, it's just because 10:10like it's it's creating improvements, but ones that you can't really see. 10:13Well, one way you could measure ROI for impact, 10:16not necessarily in terms of revenue, 10:19but correlated to revenue, is internal adoption of tools. Right. 10:22So are these, 10:24um, POCs that are rolling out at these companies, 10:28are they being mandated or are they a tool that has been put out there? 10:32But we have Becky from accounting who's been there for 45 years 10:36that really doesn't want to use that tool, right. 10:39Um, and so there's an aspect of the change management 10:42that ties into the impact of what 10:45however you are measuring ROI. 10:48That's actually a great metric for pilots, 10:50by the way, is like identify a cohort of the most tech adverse, 10:54tech resistant cohort within the company, 10:56and if they adopt, then it's actually a win. 10:59That should be all of our new benchmarks is like, you know, 11:02segment the laggards and then get them to adopt it. 11:05Yeah. Convince the the the toughest customer basically. 11:12I'm going to turn us 11:13to kind of our next topic today. Um, 11:16kind of an interesting post by Simon Willison 11:19who we've referenced on the show before. Um, 11:21he is a AI researcher, 11:23commentator, blogger, produces great stuff. 11:25I think you should definitely check out his blog. 11:27And he had this sort of intriguing post that I think Sandi 11:30you flagged for us, which was that, you know, in doing some digging, 11:33he sort of identified that GPT-5 11:36not only has the system prompt you can edit, 11:38but apparently kind of a shadow system prompt 11:40that's operating in the background. 11:42And, you know, this is maybe less sinister than it sounds. 11:45He kind of uncovered what was in this system prompt, 11:48and it was largely kind of a setting around the verbosity of the model. Right. 11:52It's like, okay, if you're talking too much, 11:54we want you to set it to like a three on the verbosity meter, 11:57I think was the number. 11:58Um, and so I think there's both this, which is kind of interesting, 12:02but I think the question I wanted this panel to kind of engage with 12:04was this really interesting comment 12:06that Simon had at the very end of his blog post, where he said, 12:09this feels weird. 12:10If I am using a model through an API, 12:13I want to know everything that's going through the model. 12:16Um, and so having these hidden system prompts is like, 12:19makes me feel weird because I don't really have a full control over the process. 12:23Um, and I think that's just so interesting about kind of like 12:25the ethics of how you structure these services 12:27and like what level of granularity users should have access to 12:31when they're using, say, an AI through an API. 12:34And so, like I said, do you have any opinions on that? 12:36Like, I don't know if like the discovery of a system prompt disturbs 12:38you at all or if it's just, you know, this is just like 12:41normal business and we don't need to be worried about it. 12:43I think it was to be expected. 12:46Uh, right. 12:47Um, if we look at just the way at a lower level, 12:50the way that I frameworks operate 12:53is that often when the developer is providing 12:55instruction to the agent, it's 12:58actually a piece of instructions that's getting inserted 13:00into a larger template of a system prompt 13:02that the framework developer has provided for the developer. 13:06So this concept or paradigm is not new. 13:09Um, but I think it was interesting to try to get GPT-5 13:14to unveil kind of its internal system prompt. 13:17And I actually tried my own experiment for a little bit, 13:20and I was not successful. 13:22But I'm going to keep trying until I get it out. 13:24Yeah. 13:25It just kept telling me that it's not allowed to reveal 13:29its internal chain of thought or internal scratchpad, 13:32but from that I learned that it also has an internal scratchpad. 13:35So maybe it's not doing 13:37a super excellent job of concealing all of its internal stuff 13:40by telling me what it will not reveal. 13:43Um, but it's not a new paradigm or new concept. 13:47However, I agree that as someone who's building these systems out, 13:52it's important for me to know 13:54what is behind the scenes and how the model is being aligned 13:58and told to behave. 14:00Because if I'm providing, 14:02you know, OpenAI has this new concept 14:05of prompt hierarchies, right? 14:08Where the system prompt comes first, 14:10and then we have the developer 14:12instructions or developer prompts, 14:14and then we have the user and conversation history and context. 14:18And they're kind of stacking the priorities, which makes a lot of sense. 14:22But if I, as the developer, 14:24am giving a contradictory prompt 14:28to the potential system prompt of the model itself, 14:32um, am I providing? 14:34Am I confusing the model? Um, 14:36am I going to get more variability in behavior 14:39or not the behavior that I want. Um, 14:41and so I think that as a developer, it's 14:44very important for me to have as much transparency as possible. 14:48But again, their private company, they can do what they like. Nathalie. 14:51Why why have a hidden prompt at all? 14:55Like, shouldn't OpenAI just kind of publish this prompt? 14:58Like, is it there's no need for us to be secret about this, right? 15:01Particularly in a world where you assume that all these people will be very dedicated 15:04to, like, pulling it out of the system and being successful in doing so 15:07Shouldn't every company just sort of publish a system prompt? 15:10Tim, you are stepping into one very interesting topic, 15:15because in the security field, in the security field, 15:18we actually cannot fully agree on this. 15:21There are benchmarks specifically defined to test 15:25whether you can extract the system prompts. 15:29There are a lot of papers coming up with attacks 15:33to make sure you can extract the system prompt in the 15:37in the kind of idea of the system 15:40may contain something that may be kind of hidden from users 15:44and may affect the users. 15:46And it's like this kind of threat model 15:49where you don't fully trust the provider. 15:51So there is that on one side. 15:54On the other side, there's this other kind of a type of people 15:59that want everything to be transparent. 16:02And when you have transparency, the good thing is that, 16:05well, basically you can inspect it. 16:07You can see some companies are very transparent on 16:09what are their system prompts. 16:12I personally was not surprised to see 16:15that they did have a system prompt. 16:17I don't think it's anything too surprising 16:20like modulating the size of the reply. 16:23It's something intuitive 16:25that we all knew 16:27would would be added at some point 16:29in the infrastructure of the model serving platform. 16:33So from that perspective, I don't feel particularly surprised. 16:37We internally here at IBM 16:40are actually building something called Mellea. 16:42And Mellea is going to be in a transparent way, 16:46allowing us to actually have more transparency 16:52into how this system prompts our fed, 16:55how we control the different types of replies 16:58that we are adding to the user, 17:01and having actually the developer of the application 17:04have control of how the replies are going 17:07to be really modulated, modulated and so forth. 17:12And we actually did have a release, 17:15I believe, last week on Mellea. 17:18So if anyone wants to take a look at it, uh, 17:20it's, uh, Mellea like a fungus. 17:23So take a look at that. 17:26But I'm kind of deviating from the main topic, 17:29which is you should get your get your plug in. 17:31I mean, it's good to have people check it out. So. Yeah. 17:34So so check it out. 17:36He's a very cool technology that we're developing and it's transparent, 17:40basically transparency. 17:42One of the most interesting parts of this, 17:44I think, is it's a real question about like, 17:47what should the user of an API 17:49have the ability to customize, 17:50and what is the kind of model provider responsible for 17:54or has rights to, like, not show you? 17:57Um, and I think in particular, it's sort of interesting 18:00because the exposed bit of the system prompt 18:02is specifically about verbosity. 18:04Um, and I guess Marina, 18:06like, uh, that interface I think is really interesting, right? 18:10Where almost what OpenAI is saying is, 18:13look, when it comes to how verbose the model is, 18:15that's something that we get to control. 18:18That's what we get to make a decision on. 18:19And we don't really want you messing with it. 18:21Um, and I guess we're really kind of debating, 18:23like what the parameters of that are. Right. 18:26Like, I guess there's a model, which is, hey, we want 18:28this to be as customizable as possible for you as the user, 18:32but it's clearly not the decision that OpenAI has made. 18:34They say we actually want to control 18:36the sort of voice of the model in some sense. 18:38Well, because the model needs to perform okay out of the box 18:41and then you can go ahead and mess with it. 18:42I mean, they do let you change it and if it gets even Simon wrote, 18:47how do I basically change the verbosity? 18:48And the model said, oh, you can tell me to be concise, 18:51be more detailed, all the rest of it. 18:53The reality is they're going to have to find some ways, whether it's through 18:56fine tuning, prompting anything of that kind to make the model 19:00still pass the check and still do pretty well on benchmarks. 19:04But look, as a responsible user of these models, 19:07you should never expect to rely 19:10on being able to see everything that you are provided for. 19:13You should not put the safety and security 19:15of your application in the hands of the model provider. 19:17You should not necessarily even put the accuracy in the hands of the model provider. 19:20These models should never be deployed 19:22as a serious application. Naked. 19:24Put some clothes on some guardrails. 19:27Put some programmatic intent behind it 19:30Make it be a hybrid system that actually can get checks 19:34and get guidance and everything of that sort. 19:36Because what is in GPT-5 right now? 19:38All right. What's going to be in GPT-5.1 19:41or 6 or anything of the sort? 19:43It's fun to try to pull these prompts out, 19:46but I'm not sure that it's, uh, 19:48makes any sense at all to say, oh, 19:50the company has that responsibility. 19:52You have your own responsibility to make sure that what the way that you use 19:56it is going to be secure and controlled and as done by you. 20:00Yeah. And I think that's maybe like 20:03I mean, going back to the earlier conversation about like, 20:05well, let's buy the results for a while. 20:07Why are 95% of CEOs 20:09like, ah, these pilots are not working? 20:11I am curious how much of this is like, kind of like the dream 20:13that the model provider would do it all for them. 20:16Um, which is maybe like not actually the case 20:19and wouldn't even be a realistic expectation, right? 20:21I guess Marina, you're smiling. I don't know if you've got that freelancer 20:23you'd want to still still can't find it. 20:25You still can't find that free lunch guys. 20:27Yeah, exactly. 20:29We are we are moving into a world, though, 20:31where the model provider is trying to provide more services 20:34now, not necessarily security or any of those things 20:37that the enterprise should always enact by themselves. Right. 20:42Specific to them. 20:43But we have, even with GPT-5, 20:46more orchestration happening on the provider side. 20:49there's this like ongoing conversation about, 20:52you know, how much now does the AI framework actually control, 20:56and how much does the provider itself, 20:58the provider control. 21:00Um, and so there's this I think we'll see that play out over time. 21:04But we are seeing a shift in that space. 21:07Yeah. That's right. 21:08And I think it goes to 21:10just how flat I guess ultimately the 21:13the model provider thinks of itself as right. 21:15Like whether or not this will be a situation where it's like 21:17all we do is we provide. 21:19Well, I guess is this going to sound absurd already? Right? 21:22It's basically like all we do is provide intelligence. 21:24You customize and do all the rest. 21:27I think part of the problem is that, like, that concept itself is very lumpy, 21:30and it assumes all sorts of things. 21:32Um, and so I think like this navigation of this line 21:35of like, who controls what and who's responsible for what is. 21:38Uh, is there a tricky one? 21:39It's going to take some time to really work out. 21:40Has any good business vertical? 21:42Integration is often a goal 21:44where you're going to be able to control 21:46more of the more of the ecosystem, 21:48more up and down of what kind of thing you can provide. 21:51So of course things are going to move in that direction. 21:53Yeah. That's right. And Marina is that I don't know 21:55if that's a vote in favor of like 21:56ultimately this is all going to kind of have to go open 21:59because people will really want to know 22:01everything that goes into the model end to end. 22:05Uh, I don't know. 22:06I think just like saying, do you know, all of the bits 22:10that go into any software that you 100% use, 22:12or do you still have some some other things around it? 22:15You're going to start somewhere pretty far from it. 22:17Yeah, we're going to settle somewhere. 22:19I think we're still a ways from it. 22:20Um, yeah, I still think a little bit about I remember when, uh, Apple, 22:23uh, first launched their Genius Bar right way, 22:26way back in the day, and everybody was like, oh, it's so funny that computers. 22:29Now you need to be a genius to go fix your computer. 22:32Whereas back in the day, 22:33you'd like pop the tower open and just, like, make some changes yourself. 22:36And, uh, we're kind of like, in that world, right? 22:38Which is basically like, well, 22:40okay, what's going to be like under the hood that you just don't think about 22:42and you kind of don't care because you sort of trust 22:44that they're mostly going to get it right. 22:49Well, great. 22:50I'm going to move us on to our next topic. 22:53Um, this is kind of a fun paper that kind of hit 22:55my inbox and was kind of in my group chats. 22:58Um, and I figured it'd be a fun one to think about, 23:00because we haven't talked about chains of thought in a little while. 23:03Um, and so it's a paper that was entitled 23:05Large Reasoning Models Are Not Thinking Straight. 23:08Subtitle on the Unreliability of Thinking trajectories. 23:12Um, and it's a fairly straightforward paper, 23:14but I think one of the most interesting findings 23:16kind of coming out of it is, 23:18you know, they're looking at the problem of thinking and overthinking, right? 23:22Where sort of it appears that the model is like engaging 23:24in all sorts of chains of thought that aren't very productive 23:27or they, like, prematurely disengage 23:30from promising chains of thought, which has been a known problem. 23:33And I think the main contribution of this paper, in my opinion, 23:36is they say, okay, well, how do the models really respond when we give them 23:40like hints or outright 23:42like solutions to the problems that they're trying to solve? 23:45And they find in many cases the model just kind of like marches on, 23:47just sort of like ignores it. 23:49Um, and so they kind of beg the question 23:51which I continually try to wrestle with is basically like, 23:54what's actually driving reasoning here? 23:57If it turns out like the actual solution doesn't 23:59get or assist or change this kind of chain. Um, 24:02and so very open ended question. But, 24:04um, I guess maybe Sandi, I'll throw it to you, 24:07uh, have any thoughts on this paper? 24:08And I guess to that final question 24:10is just like, what? What is going on here? 24:12Like, why do these models just ignore apparent solutions? 24:15First of all, when I read it, 24:17I looked at the models that they tested, right. 24:20So the three models that they tested 24:22were, I think Llama 70b, 24:24um, Qwen 7b 24:27and "deep skull" or DeepScale. 24:29I don't know how to pronounce it. 24:30Uh R1 5B right. 24:33And I noticed that they were all distilled 24:35in some way from DeepSeek's R1. Right. 24:37And so although the paper had 24:40was very thorough in some ways, 24:42I did notice that they were testing variations of a model 24:46that had all been distilled from the same kind of base model. 24:49Um, and so I don't know whether there's any commonalities 24:52there as to the results that they saw. 24:54Um, now, they might have all been fine tuned differently 24:58or, you know, had different methods 25:01to kind of changing the models. 25:03Um, but that was one thing I wanted to flag. 25:07One thing I didn't notice that was really interesting was, um, 25:12you know, the cases that they picked out 25:15were specifically two of them, 25:17um, that the model performed 25:19successfully as and actually took the recommendation 25:22of the correct answer when they injected it 25:25into the kind of chain of thought of the model itself. 25:28And at that point, it was always around 25:30kind of the 17th thought that they injected 25:32the correct answer in, 25:34um, and that led them to believe, which I kind of agree, 25:39especially maybe from the same base model, um, 25:43that this specific model needed to think for quite a long time 25:48before it settled on an answer. 25:50And that could have been because of a lot of things. 25:52And I'll let, Marina and Nathalie kind of described those more. 25:55But, um, 25:57but that was an interesting kind of revelation to me. 26:01Is, okay, now, if I know that if I'm using potentially a model 26:04that's been distilled from DeepSeek-R1, 26:07it thinks for a really long time before it makes a decision. 26:10And so I have to decide when I build with something like that, 26:14do I want a process that kind of thinks for that long, 26:16maybe waste that many tokens? Um, 26:19or do I want to build with something 26:21that gets to the answer a little bit sooner? 26:23Yeah, sure. Um, 26:24Marina, any comments? Thoughts? Sure. 26:26So, um, I think I've mentioned 26:28before that I definitely have a stance. 26:30That chain of thought is not the reasoning part. 26:33It's sort of a post hoc approximation, 26:35maybe a way to help reorganize the parameters a little bit. 26:38Getting ready for the final. Um, 26:40I will say, Sandi, I don't think this is, um, 26:42limited only to the particular model they distilled from. 26:45I think this is something that is going to be true 26:47in general of the way they use chain of thought. Um, 26:50allow me to go on a tangent 26:51for a minute in the direction of people. 26:54So the really excellent podcast "If Books Could Kill" um, 26:58recently had an episode on Malcolm Gladwell's Blink, 27:01and I haven't thought about that book 27:03since I read it way back when it came out. 27:05Yes, it is old school, 27:07but so he talks about a whole bunch of different ways that people make decisions. 27:11Uh, you know, system one thinking, system two thinking. 27:14And one of them was this description 27:16of an experiment of, uh, ranking jams 27:19where there was a set of experts in the jam industry, 27:21which there are experts in the jam industry and a set of students, 27:24and everybody was asked to rank jams two different ways that the students were asked. 27:28First, give me just a ranking one, two, 27:29three, four, five and just rank them in order. 27:31And then it was repeated again and said, oh, write down all of the reasoning 27:34and then rank the jams based on the reasoning, what it was the first way 27:38the students really had rankings that aligned closely with the experts. 27:42And then when it was the second way, a lot of the expert 27:45ranking was now not aligned with the students. 27:47students had talked themselves out of it. 27:49Don't know necessarily what this means, but it does seem that 27:52unlike, for example, in a math problem 27:54where the way that we use language is probably a good proxy 27:56for how we're thinking through the problem, 27:58in many other cases, language, even for humans, 28:01when we explain our reasoning, 28:03that's not what happened in our brains either. 28:05This is a proxy that human beings are able to use. 28:08Why are we doing chain of thought 28:09as the proxy for what language models are being able to use? 28:12Well, we hope that we can read it, but let's be honest, even with ourselves, 28:16we aren't actually able to say that these words 28:19are the reason that I made the decision that I made. 28:22So it's also certainly not true for large language models. 28:25So it's just something to consider 28:26that if we are continuing to try to change the way that we think 28:29that people behave, people will behave that way. 28:31And again, no wonder 28:32we are happier trying to figure out what's going on with math problems, 28:35because that proxy is really, really close. 28:37Apparently when you're ranking jam 28:39or when you're being asked extremely random questions at LLM, not so much. 28:43And in fact, by trying to reason, you get yourself away 28:46from whatever you had initially thought in your head 28:50with, whether you're a person or whether you're a machine. 28:53This paper is in a set of interesting ones recently, 28:56and they cite some as well about overthinking. 28:58Overthinking that, again, just really makes us think 29:01critically about the notion of chain of thought 29:04and how we're using it and what we're using it for. 29:05And so I appreciate that. 29:07I appreciate this topic coming up again. 29:09Yeah for sure. And I guess in that sense, maybe like the 29:12the kind of interesting quote unquote result that I started with 29:15is maybe not that interesting 29:17by analogy to sort of like humans. Right? 29:20Which is, I is, I guess if I'm, like, explaining a story 29:22and I'm going down this line of thinking and then you're like, 29:25Tim, have you considered the answer? 29:27You know, I might very well just be like, well, yeah, yeah. 29:29But like, I'm just keep doing what I'm doing, right? 29:31And I guess maybe the idea that these models sort of like 29:34ignore is not that surprising, particularly 29:36if you don't believe that it actually has anything to do with, 29:38like, the actual reasoning process, I suppose. Right. 29:43Um, I guess, and then I would love to take on this paper is like, 29:48should we get away from the terminology train of thought? 29:50It's a little bit too late, I guess, but it's kind of like 29:53in the long tradition of AI terms that are really misleading. 29:57Where we're sort of landing is that this too may be quite misleading. 30:00This is in the tradition of the term hallucination. 30:03Yes. The term like an irrelevance. 30:06AI in general, 30:08I mean, I don't know, I used to fight with people on this. 30:10I've given up. People are going to call it what they're going to call it. 30:13But no, it's not accurate. 30:14And so we may have something to it 30:16probably does have something to do with what's going on. 30:18It's just not the whole story. 30:19And it is a proxy that is going to approximate 30:22an extremely complex space, better at sometimes than at others. 30:25So look, we also only can communicate with language. 30:28We got to call it something I don't know what to call it. 30:30And it's maybe no point. 30:32Um, but I think again, just continuing 30:35to keep in mind that this is an evolving field. 30:38This is an evolving understanding that we have ourselves of what's going on 30:41and not think that the next thing is a silver bullet. 30:43First, everybody thought prompt engineering was going to solve it now. 30:46All right. If we can just chain of 30:47thought our way into it, then we're going to solve that. 30:49Either it's not going to happen 30:51Continue to see these as incremental interesting meanderings through the field. 30:55Nothing's going to be a silver bullet. 30:57Yeah for sure. And that's good wisdom. Um, 30:59Nathalie hot takes. 31:00What did you think of the paper? 31:02I actually really like that we're revisiting this paper 31:06because, uh, if you remember, Apple also had published a paper 31:10around these sort of things like, hey, chain of 31:13thought is not really working as we expect. 31:15a lot of people trying to understand whether chain of thought 31:19actually gets us closer 31:21to understanding how the model is behaving internally. 31:25To me personally, it's not surprising 31:28that we are not seeing 31:30that kind of perfect chain 31:32of thought and analysis from the model perspective. 31:36On the positive side, 31:38and when I read it, I, I have this thing. 31:41I have to confess something. 31:43When something doesn't work, I get really excited 31:46because it means that we can make it work. 31:48So we are going to keep working 31:52on making this chain of thought happen, 31:55and I think it has to do with training data. 31:58I don't think it's magic. 32:00It's just probably the model did not see the type 32:02of training data for this particular analysis, 32:05and that I feel it's going to, 32:09um, it can be improved 32:10basically depending on on the type of problem that we're going to solve. 32:15And also it kind of, uh, hidden to to myself. 32:19Uh, since I work in safety, 32:21one of the things that we have noticed 32:23is that when you include the chain of thoughts, 32:26sometimes you do get less safe, less safer. 32:30Uh, replies in the final answer. 32:33So there's something that basically modifies 32:36and starts brainstorming a bunch of different. 32:39I don't want to use brainstorming because then Marina 32:41is not going to like my terminology. 32:44You're doing it again, but it's kind of 32:46the model is exploring, let's call it explore. 32:48And I think brainstorming was too close to humans, 32:51but, uh, kind of exploring different solutions. 32:54Uh, we do see a lot of push, for example, right now, 32:57to have chain of thought in the hidden space, 33:00not only in the token space, 33:02which I think it's going to be very, very interesting 33:05to see how it evolves. 33:07But overall, 33:09I just feel excited that, uh, it's not working 33:11because it means that we are going to make it work. 33:14And, uh, at some point. 33:15And there's so many people, right. 33:18People working on this, um, hidden space. 33:21Uh, kind of trying to get different sorts of solutions 33:24out there, analyzing it and have that aha moment 33:27that we all like, and we all hope that does exist 33:31when you have that chain of thought into consideration. 33:35I, I'm a big fan of that optimism. 33:37Um, and I think we'll definitely revisit this. 33:39I've been kind of thinking, like, we should make a point 33:41of revisiting the chain of thought literature every so often. 33:45Um, just because I think it's like this just super 33:47interesting narrative that's running alongside, 33:49um, certainly all the, like, 33:50commercial stuff that we talk about on a week to week basis. 33:53So, um, this is this is great. 33:58For our last topic today, 34:00a very fun and kind of strange 34:03blog post, uh, came out of Anthropic. Um, 34:05and it was so interesting. I want to make sure that we brought it into 34:08the discussion for this week's episode. Um, 34:10it's a blog post that begins quite reasonably. Um, 34:13Anthropic came out basically saying, look. 34:16In certain kinds of cases, um, 34:18particularly in these kind of distressing 34:20or toxic or kind of abusive conversations, um, 34:24you know, Claude's going to just shut down. 34:26going to allow the tool to make a determination and 34:29just cut off the conversation. 34:31Um, uh, if it if it 34:32if it feels appropriate, we can go into what that means. 34:35Um, and then it kind of goes into this kind of weird 34:38second act of the blog post where anthropic says. 34:41And the reason we've decided to do this is because, quote, 34:44we have high uncertainty about the potential moral status of Claude and other llms. 34:50And so as a kind of first step 34:52in potentially protecting AI welfare, 34:55this is why we've kind of implemented this change, right? 34:57We don't we don't want to put the AI, 35:00I guess, under the emotional pressure of these conversations. 35:04Um, and so I was like, huh. 35:06All right. That's like a very interesting rationale 35:08I haven't heard before for a product decision. 35:10Nathalie, maybe I'll start with you thoughts on this. 35:12Like, it's kind of weird that we're here, right? 35:14That a major company is sort of justifying 35:17product decisions based on AI welfare. 35:19For our listeners, what should they take from this? 35:22I think the term welfare is misleading here. 35:26Uh, if you think about it all the time, what we have been doing 35:29has been inspecting the model, 35:32whether it's activation, whether it's the output itself, whether it's the logits 35:37at the end of the of the tokens and so forth. 35:40All these things are inspecting the model, the welfare. 35:45If you think about it, it's just the output of the model. 35:48Again, it's just a different name 35:50to express the same thing that we have been kind of using. 35:53So the model does know when things are not going 35:56the right direction in a lot of cases. 35:59And so I do find misleading the term 36:02uh, because it kind of makes 36:05you think about the model as a person, which is not 36:09it is not to be clear, 36:10this is just next talk and prediction and a bunch of math 36:14behind the scenes that allow us to really get to to that final reply. 36:19We're inspecting the model and the fact that it gets shut down. 36:22It may be very good in some cases. 36:24So I think from my perspective, 36:27the fact that it was framing that way 36:29kind of reduce the impact 36:31of what they are trying to do at the end of the day, 36:33which is stop conversations 36:35that are really going to be harmful for people. 36:38So I would have been happier 36:41if we had a situation 36:43where they just call it, go inspect the model state 36:46and we decide to shut down, as opposed 36:48to try to give it this kind of, um, 36:51personal like situation for the model itself. 36:54It's exploring, not brainstorming. 36:56Yes, exactly. 36:57Exactly. 36:58And and one thing that I found really interesting 37:01is that, for example, the CEO of Microsoft 37:04replied with a really interesting blog post 37:07that says, like, hey, let's try to beat 37:10AI from here for humans 37:13and not to make AI a human. 37:15Which I thought was a really interesting 37:17take on this, this particular, 37:20uh, news that you brought to the team. 37:22This question of harm, I think, is really where this is at. 37:25Um, and Marina it almost kind of feels like they've been, like, 37:28trapped in a framework of their own devising, 37:31uh, in the sense that, like, you know, 37:33I used to work on a bunch of trust and safety issues. 37:36The reason you normally want to prevent someone from having a, like, an abusive 37:40or toxic discussion on your platform is that, you know, 37:42it's directed at someone who is being harmed. 37:46And so typically the justification is, well, we're going to 37:48we're going to ban you from the platform because we're trying to protect our users. 37:52And here's kind of a weird case where it sort of 37:54seems like there is no user on the other side. 37:57And so there's almost a claim of like, 37:59you know, I should be allowed to be as crazy and horrible 38:02as I want to in AI because who's being harmed? 38:04It's just me. 38:05And I guess there's almost one way of reading it, 38:08which is Anthropic, trying to come up with some kind of justification 38:10for stopping this behavior. 38:12But like, there's no one to point to in terms of harm. 38:16Do you buy that interpretation at all? No, 38:18because this is a way to do a CYA 38:21so that if something actually does happen 38:24and you're getting sued and it's all Anthropic, you 38:27you made the platform available where a person could cause themselves harm. 38:30There are laws that we have about that in society. 38:33still could be held liable. 38:34So this notion of liability is something that we've struggled with 38:37since the advent of the internet and even earlier. 38:40If to what extent is the provider liable 38:42for how people are making use of their product? 38:44If you're disseminating hate speech and you know other people 38:48well, you don't have to read it, but you can still have a problem 38:50You can still be banned off the platform. 38:52So now there's still very much something going on with that. 38:55I will agree with Nathalie that I don't love the framing of this 38:58as an AI welfare problem. 39:00This continues to be a human welfare problem, 39:02and also to have a notion of people either 39:06continuing to over humanize the AI 39:09when they read this kind of framing and continue 39:11to believe that they are talking with someone real 39:13when they interact with AI. 39:15And furthermore, a work like this allowing potentially bad actors 39:20to figure out ways to shut down conversation 39:23when it is maybe not just about harm, 39:25but maybe something that is politically unlikely 39:28or does not fit the notion of the narrative 39:30of your government or anything of that kind. 39:32So I would be really pretty concerned 39:33about being able to do that kind of thing 39:36under the narrative and the guise of AI welfare. 39:39So these are all things to consider. 39:42Um, funny blog posts from Anthropic. 39:44Look, this is from the people who bought you Claude's 39:46vending machine that orders tungsten cubes. 39:48Like you got to give them something. 39:50They shouldn't be surprised. 39:51Not surprised. 39:53I'm glad they're having fun with it. 39:54But when it comes to, you know, other 39:57how other people report on this kind of thing, 39:59when we get outside of the valley, 40:01outside of a few engineers who are having fun and trying things, 40:04I think you have to have a broader and more responsible 40:07look on what this kind of framing means, 40:10and whether this is really the best way that we should be disseminating, 40:14uh, what happened with, you know, a few engineers? 40:16Sandi. So a lot to unpack here. Um, 40:18I don't know if you would agree with Marina, uh, 40:22and Nathalie about this, uh, 40:24on, like, AI welfare being kind of like a sideshow. 40:28I actually do want to take a contrarian. Take that. 40:30Like, actually, we should take, AI, welfare seriously. 40:32I'm curious how you think about it. 40:34You know, I, I agree, so I'm not going to take the contrarian take, 40:37but I have some friends who have very different opinions than myself. 40:40Right. And some of them who are at Anthropic. 40:43And so I agree, I think it's a very tasteful way 40:46to allow Anthropic to essentially cover it's, 40:50you know, and have insurance and liability. Um, 40:54but some of my, you know, there's an overall 40:58take at Anthropic anyways, like they launched their, um, 41:01I think it's economic figures program in June 41:04where they're taking a look at just the economic 41:06welfare and social welfare 41:08and what the future looks like. 41:11And there is quite a large like at any company 41:16band of extremism 41:18in terms of who believes that AI will be sentient one day. Right. 41:22And so, um, they might just be like, hedging 41:26their bets, like insurance to where 41:28if they do believe that one day 41:30it shall become sentient, at least they're laying the foreground to like, hey, 41:34I was nice to you. 41:35You don't come from me. 41:40Well, maybe I'll throw out. 41:42I'm just curious about turning the crank one more time on on 41:45the kind of like pro argument on this side. 41:47One argument that I've heard, which I kind of like is, 41:50well, you know, we're not certain about the sentience 41:53of all sorts of living things, right? 41:55Like we believe in animal rights that may have varying kinds of sentience. 41:59And so, like, is it all that crazy to, like, 42:02take it seriously and they add context. 42:05Um, Marina, what do you think? 42:07It's certainly one way to to go about it. 42:10Uh, you could try to also say, what are you trying to do this for? 42:15Are you trying to actually cause harm? 42:17And, uh, what you think you're doing is torturing the 42:20AI, or what you think you're doing is you're trying to test it 42:22so that you can make sure that it does not cause some kind of harm. 42:26Intent probably has a long way to go here, but I don't know. 42:31I'm on the camp of people, as you know, 42:33that thinks we're still pretty far away from sentience. 42:36Uh, as far as that goes. 42:37And, I mean, maybe one day we will all welcome our AI overlords. 42:41We got some time. 42:43Um, and, you know, it's 42:46almost like the difference between black hat and white hat hacking. 42:48Maybe we can have the right intent on it, 42:50and then they won't get mad at us when they take over. 42:53I don't know, guys. 42:55Yeah, but it's just too far away. 42:57Like we're not even talking about anything. 42:58We got other problems, and you see it, right? 43:01You're just like, this is. It's just. 43:02It's just token prediction. 43:04Yeah. Well, great. 43:06Um, any final thoughts on on this one? 43:09Uh, I guess I don't know when 43:10maybe this is a good question to end on, I guess. 43:12You know, uh, I think, Marina, to your point. 43:15There's a lot of reporting around these kinds of stories. 43:19Um, a lot of ability to get confused on what's going on 43:23and a lot of hype around these kinds of stories. 43:26Um, do you have any advice for listeners 43:28who may hear these kinds of claims in the future? 43:31Like, see it all with a grain of salt. 43:33You know, how would you want people to receive these kinds of stories? 43:36Look to history. 43:38Um, there's very often nothing new under the sun. 43:40And very interesting technological innovations 43:43go through a period of excitement 43:45and, uh, in different directions 43:47before they settle into something that is useful. 43:49The perception that people who are in the thick of 43:51it have is not the perception that other people outside of the thick of 43:55it should have. 43:57And, uh, try to realize that 43:58as much as it's interesting what's going on right now. 44:01I don't think that from a social perspective, it's necessarily new. 44:04And I think I've spoken here before 44:07of the need for people to have a broad education, 44:10a wide understanding of what goes on besides just the technology. 44:13Also, why did this kind of technology get created by people 44:18at the time that it has in the society that we have, 44:21and just try to take it from that perspective 44:23as much as possible, and then try to get your news 44:25from multiple sources about it. 44:27Always a good one. 44:29Yeah, well, that's good advice. 44:31And that is all the time that we have for today. 44:34Uh, Nathalie, Marina, Sandi, thanks for joining me. And Sandi, 44:37hopefully we'll have you on in the show in the future. 44:39Thanks for having me. 44:40And thanks to all you listeners. 44:41If you enjoyed what you heard, you can get us on Apple Podcasts, 44:44Spotify and podcast platforms everywhere, 44:46and we'll catch you next week on a Mixture of Experts.