Claude 4.0 Release Sparks Future Speculation

Key Points

  • The release cadence has slowed: Claude 3 → 3.5 took ~3 months, 3.5 → 4 took a year, and the panel predicts Claude 5 could arrive in a few months to a year.
  • Bryan Casey stepped in as interim host for a double‑episode of the “Mixture of Experts” podcast, featuring panelists Chris Hay, Marina Danilevsky, and Shobhit Varshney.
  • Claude 4.0 (including Sonnet and Opus models) was the week’s headline, highlighted by a humorous anecdote where Claude chose Radiohead as its “favorite band.”
  • Anthropic’s market advantage has historically rested on two pillars: a community that enjoys Claude’s personable interaction style and strong adoption of Claude for coding tasks.
  • The Claude 4 release underscored the coding focus, signaling that Anthropic is doubling down on developer‑oriented capabilities alongside its conversational appeal.


**Source:** [https://www.youtube.com/watch?v=Blq_pf6yd6U](https://www.youtube.com/watch?v=Blq_pf6yd6U)
**Duration:** 00:27:35

## Sections

- [00:00:00](https://www.youtube.com/watch?v=Blq_pf6yd6U&t=0s) **Predicting the Next Claude Release** - A panel of experts debates how long until Claude 5.0 arrives, reflecting on the recent Claude 4.0 launch and the industry's slowing cadence of major model upgrades.
- [00:03:03](https://www.youtube.com/watch?v=Blq_pf6yd6U&t=183s) **Anthropic's Coding‑Centric Claude Release** - The participants talk about Anthropic's latest Claude model that heavily emphasizes coding, riff on a missed Radiohead joke, and share their preference for Claude 4 Sonnet over earlier versions.
- [00:06:04](https://www.youtube.com/watch?v=Blq_pf6yd6U&t=364s) **Why Coding Suits LLMs** - The speaker argues that software development is a natural fit for large language models due to code's structured nature, easy verification through compilation and execution, and well‑defined interaction patterns, contrasting it with less reliable tasks like summarizing call‑center recordings.
- [00:09:11](https://www.youtube.com/watch?v=Blq_pf6yd6U&t=551s) **From IDE Assistants to Autonomous Agents** - The speaker explains how AI coding tools have shifted from simple IDE companions to sophisticated background agents that independently execute programming tasks and periodically update the user.
- [00:12:28](https://www.youtube.com/watch?v=Blq_pf6yd6U&t=748s) **OpenAI's Mac vs PC Narrative** - The speaker links OpenAI's acquisition of Windsurf and hiring of Apple‑style talent to a potential shift toward a more closed, vertically integrated ecosystem, evoking a "Mac versus PC" dynamic in the AI space.
- [00:15:32](https://www.youtube.com/watch?v=Blq_pf6yd6U&t=932s) **AI Safety, Alignment, and Monitoring** - The speakers examine Anthropic's safety research—including constitutional classifiers and AI‑enforced content protections—while debating if AI should primarily guard against harmful outputs, raising concerns about privacy and "Big Brother" surveillance.
- [00:18:37](https://www.youtube.com/watch?v=Blq_pf6yd6U&t=1117s) **AI Employee Whistleblower Concerns** - The speaker discusses how enterprises should treat AI systems as employees, including onboarding, verification, supervision, and the potential for AI whistleblowing and external reporting on alignment and safety issues.
- [00:21:40](https://www.youtube.com/watch?v=Blq_pf6yd6U&t=1300s) **Safety Controls in Full‑Stack LLMs** - The speaker emphasizes the need for tighter safety metrics, observability, and human‑like behavior controls as companies transition from using external LLM providers (e.g., Anthropic, Llama) to building their own full‑stack language model solutions, noting industry shifts like the adoption of the MCP protocol and the retreat from consumer‑facing chatbots.
- [00:24:44](https://www.youtube.com/watch?v=Blq_pf6yd6U&t=1484s) **Anthropic's Agentic Stack Strategy** - The speakers discuss how Anthropic is shifting toward an agentic stack emphasizing long‑term planning, memory, and latent‑space reasoning to target enterprise applications.

## Full Transcript
0:00Claude 3.0 came out a little over a year ago. 0:02It was probably a year and change at this point. 0:04The distance between 3 and 3.5 was about three months, and then it 0:08took a year to get from 3.5 to 4. 0:10And that's something that we've been seeing across the industry is 0:12that like the slowing down of some of like the big next generations. 0:16And so, as a hot take question, how long until we get to Claude 5.0? 0:20Shobhit, I'll start with you. A few months. 0:23Okay. 0:23Uh, Marina, 0:26uh, a year, maybe a year 0:27and change. 0:27All right, Chris. Why haven't I got it already? Like you can, 0:30Claude 4 came out yesterday. 0:31I want my Claude 5 now. 0:33That's the right answer. 0:39Hello and welcome to Mixture of Experts. 0:41Uh, Tim has been given a much deserved break from podcast hosting, and 0:45so you're all stuck with me today. 0:46I'm Bryan Casey, your interim host for, uh, MOE, 0:49and today we are doing, this week actually we're doing a double episode 0:53of MOE because an almost impossible number of things happened in, in 0:56the market. This week we're gonna be talking mostly, um, about the Claude 1:004.0 release, but we might touch on some of the other kind of fun stories 1:03that are, uh, a bit related to this. 1:05We got a great panel today. 1:06We're joined by Chris Hay, CTO of customer transformation, Marina 1:10Danilevsky, Senior Research Scientist, and Shobhit Varshney, Head of 1:15Data and AI, uh, IBM Consulting Americas. 1:18So this one, the big news obviously we're talking about is Claude, um, 1:204.0 came out, both Sonnet and Opus. 1:23Um, and a kind of funny story maybe to start things out, to 1:27kind of frame the release is, um, 1:30a few months back, you know, I was having a conversation with one of the 1:33previous generations of Claude, um, and one of the questions I was asking 1:36is like, what's your favorite band? 1:38Um, and it was like, I don't listen to music, 1:41I'm an AI model.
1:42Um, like, but if you were somebody who listened to music, what would 1:46your favorite band probably be? 1:48And Claude was, 1:49"Probably Radiohead would be my favorite band." 1:51Uh, and I was like, I thought that was kind of like really on the nose, 1:56and the reason why I'm bringing this up is that like for a long time it felt like 2:01the kind of wedge that Anthropic had in the market was these two main pillars. 2:05Um, one, there's a community of people who kind of liked the model and the app almost 2:09like for this purely vibes based sort of thing, like the personality of Claude. 2:13People just like liked it more and liked talking to Claude more than they like 2:16talking to just about any other model. 2:18Then you had this other use case, uh, which was like everybody who 2:23was, um, a lot of the people who like, loved using coding models, 2:27Claude was oftentimes like one of the most popular ones in that space, 2:30and had these kind of two main pillars, um, associated with it. 2:33And one of the things I thought was sort of telling in this release 2:35is the degree of focus on coding in particular, like coding sort 2:40of dominated the entire release. 2:43And I was just juxtaposing that to like everything else going on 2:47in the industry this week, which it actually feels like the whole 2:50AI market's actually like expanding and like all the use cases, um, you 2:54see everything from I/O this week. 2:55You see all the multimodal stuff coming out. 2:58You see even like the news outta OpenAI with Jony Ive, so it's like, 3:01oh, we're gonna get into hardware and new devices and things like that. 3:03So the whole world's like expanding. 3:05And at that same moment, Anthropic's actually getting like more focused. 3:09Uh, and so like maybe Chris, I'll start and kick it over to you, 3:12um, as a way of getting into this. 3:14Like, does that surprise you at all? 3:16Like, does it make sense?
3:17It's kind of obvious given kind of where Anthropic is these days, but 3:19just like, what did you think about, like, what was kind of like almost a 3:22singular focus on coding in this release? 3:24Before I get onto that, I think you missed the setup of Claude's joke, and, 3:29and you completely missed it, Bryan. 3:30'cause it went, oh, come on 3:32What is your favorite band? 3:33And it said, Radiohead. 3:34And you were supposed to say, OK Computer, but you didn't, did you? 3:38Oh man. 3:40You missed 3:41Claude's setup. 3:42I mean, Claude, they, they're never gonna let me back on this show 3:45now that I've just like blown it. 3:47Um, Claude, so Claude is an expert in coding and standup comedy. 3:52That is, it's two things that it does now. As for coding, 3:56Claude is a fabulous coding model. 3:59I mean, the first thing to say is it pretty much powers anything, 4:02um, that involves coding. 4:04So if you think of things like Cursor, et cetera, right? 4:06Claude is the default model for that. 4:09I have switched from model to model, and I have to say I, I always go back to Claude. 4:14So even, even the, uh, kind of o3 models, I used to 4:20tend to pick Claude 3.7 over it. 4:23Um, I am so glad Claude 4 Sonnet is out. 4:27I'm gonna skip Opus for a second, for very, very good reasons, 4:32um, but Claude 4 Sonnet is a fabulous model. 4:35So you know what? 4:36They've got a niche. 4:37It is the best model at this, 4:38and they're doubling down. 4:40They have sorted out some of the frustrations of the coding models. 4:44Um, 4:45oh my God, it was driving me insane. 4:47The amount of times you say, okay, create me this piece of code, and 4:51then it's like, okay, here's this one change, and then you're like, 4:54okay, where's the rest of the code? 4:55You're like, ah, it's all the same. 4:57You can figure it out for yourself. 4:59You're like, no, you figure it out. 5:01You're the computer. 5:02I'm the lazy human.
5:03Give me the code that I want. 5:04I only want to hit copy and paste, and so that has been sorted out. 5:09The second thing that's sorted out is what's been driving me mental for 5:12the last few weeks, what I call 5:14Diffageddon. 5:15Where every piece of code I ask for, it goes, here's a diff file. 5:19And you're like, I, I'm not a canvas. 5:21You don't need to give me a diff file. 5:23You give me the code. 5:25Right. 5:25You know? 5:25And, and that has been sorted out. 5:28The, the Claude 4 Sonnet model really is just killing it. 5:31So I love the fact that Anthropic has doubled down on that. 5:35The only complaint I would say to come back to the Opus model is, 5:39oh my goodness, 5:40it eats your tokens, right? 5:42It's like, you know, put any prompt in there and then suddenly two minutes later, 5:46it's like, ha, you've used up your limits. 5:49Why don't you subscribe to Max? 5:51Or come back later? 5:52And you're like, huh. 5:53So I'm now kind of going, Opus is for 5:56architectural things, you know, make me think about things in a different way, 6:00but for default coding, Sonnet 4. 6:03Here we go. 6:04I love it. 6:04So go Anthropic. 6:06You have the best coding model, in my opinion. 6:07That was a great intro. Like, Shobhit, I know before the 6:09show we were even talking about, 6:11there's just like an enormous amount of stuff going on in the 6:14coding space, uh, right now. 6:15So maybe you wanna take a minute, just talk about, like, put this in 6:18context a little bit and just say like, you know, obviously they're 6:20doing a lot in this space, but like, how does it fit into this kind of 6:23broader landscape of, you know, what's going on in the, the coding terrain? 6:27So I think coding, software development, coding in general has been the killer use 6:31case for these large language models. 6:33And there are a few things going for it, right? 6:35There is a, there's a good structure to it. 6:37So you can train it on structured code and so forth.
6:40There is some sort of a verifiable reward that I can say that, hey, this 6:43code compiled and ran, actually did what it was asked to do and so forth. 6:47And then there's a very structured way of, of, uh, talking to these models, right? 6:51I can go give it a Git repo, it'll do a PR on its own, and it'll go execute stuff. 6:55And those things are very well defined and mature versus if I ask it to do 6:59even something as simple as summarizing a call center recording into, into a 7:04paragraph, I can't trust it to always 7:07pull the root cause of the issue into the summary, and I don't have 7:10a good verifier for that either. 7:11Right. 7:12So if you look at the, the how far we've come with, say, 7:14reinforcement learning techniques where I need a verifiable output, 7:18we have done a lot better as a community on software development. 7:22If you look at the multi-agent space right now, 7:24I think the way human organizations are structured in our software development 7:29teams, there's an overarching PM that will go split it up into these tasks and 7:33create Jira tickets and whatnot for you. 7:36You can pull and start working on it, can check in. 7:38There's a way of verifying what you're doing and you can 7:40collaborate much, much better. 7:41There's systems in place for collaboration around, uh, software engineers, 7:46and I think we just leverage those tools to stand on those shoulders 7:48and say, now I can have LLM agents start to communicate around these. 7:52But if you go to say a finance back office workflow, it is a complete 7:57mess of how people work with each other across systems and talk to each 8:00other and so on and so forth, right? 8:02So I think we have an unfair advantage in the software development area, where it 8:05makes sense for the people who are building the software for these AI models 8:08to then leverage them themselves, and there's this good feedback loop.
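The "verifiable reward" idea in this stretch of the conversation, that generated code can be graded mechanically by compiling it and running a check against it, can be sketched in a few lines. This is only an illustration of the concept, not any lab's actual training code; the function name `code_reward` and the toy `add` snippets are invented for the example:

```python
# Minimal sketch of a verifiable reward for generated code:
# compile the candidate, run it, then run a test against it.
# Names here are illustrative, not from any real RL framework.

def code_reward(candidate: str, test: str) -> float:
    """Return 1.0 if the candidate compiles and its test passes, else 0.0."""
    try:
        compiled = compile(candidate, "<candidate>", "exec")
    except SyntaxError:
        return 0.0  # reward 0: does not even parse
    namespace: dict = {}
    try:
        exec(compiled, namespace)  # define functions from the candidate
        exec(test, namespace)      # assertions act as the verifier
    except Exception:
        return 0.0  # reward 0: runtime error or failed assertion
    return 1.0      # reward 1: compiled, ran, and did what was asked

good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
check = "assert add(2, 3) == 5"
print(code_reward(good, check))  # 1.0
print(code_reward(bad, check))   # 0.0
```

This is exactly why the call-center summary example in the transcript is harder: there is no equivalent of "it compiled and the assertion passed" for a prose summary.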
8:11So you, I feel that software development will stay ahead among the users of 8:15generative AI and multi-agent, just because of the, the nature of that. 8:18So within software development, when you look at all the different 8:21models, we've had massive improvements with Gemini, uh, 2.0 to 2.5 Pro. 8:26We've had great, uh, models come out from, from OpenAI as well, and all 8:31of them are now starting to move away from auto completion last year to now 8:34doing the whole repository end to end. 8:36It can take the whole repo, 10, 12 different files, figure out how to, how 8:41to connect the dots and what's starting where, and so on 8:44and so forth. In one such scenario, 8:46uh, one piece of work I was doing, it ran out of, uh, its limits on one 8:50API call and I could see it come back and edit the plan and say, oh, 8:54let me try this other worker out. 8:56The fact that it is able to stop, rethink, come back and change the to-dos 9:01and go back and execute a workaround and stuff, that's just beautiful. 9:04When I start hiring people, interns, this summer, it's very difficult 9:08to, for me to figure out what work am I going to give them. 9:11It doesn't matter which, you know, Ivy League you went to, I'm gonna define 9:15what my work is supposed to be and I'm gonna give you some instructions. 9:17I'm gonna validate what you do, uh, I might as well just have 9:20these models do it for me now. 9:21So I think it's a very, very different world today with 9:23all the software development. 9:24I think 9:25that's really, it's also like a big theme, and like, Marina, 9:29maybe I'll throw this one over to you, which is kind of where, like, the original 9:35coding model, I think, product market fit started: it was 9:40just like an assistant that sat alongside you in your IDE, um, essentially, 9:44and like that was like the main use case.
9:46And then, but what we're seeing more of is, especially as these things get better 9:51at reasoning, as they get more nines of reliability, as they get better 9:55at like long context and longer running tasks, like this whole space of like 10:00background agents where it's like you just give somebody a, you know, give an agent 10:03a task and it just goes and does that, um, and comes back to you when it's done, 10:08um, essentially. I'm curious whether we will actually end up looking at 10:12like, the coding assistant as like an interesting blip in time where that 10:16was like the primary paradigm around how people were using, uh, models 10:20to generate code and like we're, 10:22you know, moments away from actually just background agents just writing all this 10:26stuff themselves and they just like check in with you periodically and you know, 10:30it won't be long until like the vast majority of code that's getting written 10:33in the world is being written that way. 10:34And I'm, I'm curious how you kind of see some of this stuff shaking out, 10:36Marina. 10:37I still think that there's a lot to be said for, uh, being able to phrase your 10:41problem to the model properly and you still need a decent amount of experience. 10:44So one thing I will say, Shobhit, is at the very least you could teach your interns 10:47how to ask the thing for the right 10:49problem because that itself is a learning experience. 10:51We've talked on the pod before about like education and things about students 10:55cheating and the rest of it, and that now you really do need a different kind 10:58of critical thinking, which is: fine, 11:00this thing can write code for you, but what did you ask it to do? 11:02Are these models also gonna be able to tell you 11:04that you're being an idiot? 11:05Are they gonna be able to tell you that you are writing code 11:07that is extremely inefficient? 11:09Why are you doing a Cartesian product in your code?
11:11What is this with the SQL query that you asked me to do? 11:13Or anything of that kind? 11:15But until we get there, we're really not gonna be anywhere near, 11:20uh, what you're saying, which is go ahead, run in the background. 11:22I'm fine. 11:23Move it along. So, I, I'll say we still got some time because people 11:27aren't that great at asking for what you need, unless you already have spent 20 11:31years being a really awesome engineer. 11:33And then, yeah. 11:34Great. 11:34This is helpful for you. 11:35Not if you're new, not if you're outside the Silicon Valley bubble. 11:38I, I have definitely seen, um, in my own experience, just like 11:43even just like telling the thing how to structure, 11:45like, I want these functions to do these things. 11:47Like somebody who doesn't know anything about the space is gonna have 11:49like a really hard time with that. 11:50So, um, 11:51so a good example of this is exactly where the interns are going. 11:54Marina, I fully agree with you. 11:56Um, I had somebody who was vibe coding and said, Hey, check out my, my demo that 12:00I built, and sent me a localhost, colon, port. 12:02He had no clue what this means, right? 12:05Like, you don't realize that the localhost is on your own machine, 12:08right. 12:11Yeah, I was like, you realize I can't, I can't get to your localhost. 12:14It's just, uh, 12:15maybe as like a final question in, in this space, and it's, um, one of the 12:22things I thought was kind of interesting and it's like, oh, I'm, it's a little 12:25bit grasping at straws, but like it all happened at the same time, so I'm 12:27gonna make the connection anyways. 12:29Um, it was a little bit telling to me that like the only code editor, 12:33uh, prominent code editor where 12:36Claude 4 was not available 12:37like on day zero was Windsurf, um, which was recently acquired by OpenAI.
12:43Um, and this same week was the same week that we saw, like the huge announcement 12:48around like OpenAI and Jony Ive's company and $6.5 billion, uh, transaction there, 12:55and it was just making me think a little bit about 12:59the kind of Mac versus PC vibes. 13:00Uh, right. 13:01And it's like almost like hard to ignore those vibes when, you know, you're 13:05literally bringing in the people who are famous for building up some of like that 13:08Mac ecosystem, like onto the OpenAI team in a moment where they're going kind of 13:13more vertical and it's like you're just getting like a touch of walled gardens 13:18showing up now. 13:18It's like, it's not that far. 13:20It doesn't like, you know. 13:22I don't think big trends have kind of landed in this space yet, but it's like 13:25you're seeing kind of hints of it and like maybe, Chris, I'll throw it over to 13:28you, is like, do you see, do you see any world where like this becomes like more 13:34kind of Mac versus, like, PC, and like these labs start to go kind of more 13:38vertical into the, the software space? 13:40It's like not clear to me how that could actually like play out 13:42from a technology perspective. 13:44The same sort of way like, you know, 13:46like the PC era did, but I don't know. 13:48Are you getting, I'm curious if you're getting some of these vibes. 13:50Oh, I, I'm getting the 1984 vibes going on. 13:53I'm thinking here's, here's OpenAI there, or whatever, and then it's like 13:58Big Brother is watching you, and then you get the runner coming in and the big 14:02gong, open source bangs the gong or whatever, and then we're like, yeah, 14:07we've broken down the walled closed source models and the open source models 14:12are here, and a big herd of llamas are gonna come trotting by everybody 14:16and stomp on all their closed models. 14:18That, that's where we're getting to. 14:19Yeah. No, I'm, I'm all for the fighting.
14:23I am, you know, let the games commence as far as I'm concerned. 14:27No, absolutely. 14:28I think anywhere where you have 14:30big players that are sort of playing against each other in that sense, you 14:34know, we're gonna get into that world. 14:36Everybody's playing for the same ecosystem at the moment. 14:39They want to, they want domination of the world and, uh, that, that's a bad phrase, 14:43they want, they want to be, uh, the best in their space. 14:47So I think we're gonna see that play out for a while because the prize 14:51is, is so big at the moment, right? 14:53So whoever has the best AI is, is gonna make an awful lot of money. 14:57So they're gonna go for it. 14:59You can't, you can't put that, the models want world domination, into the training 15:03set, Chris, because then you know what's gonna happen as a result, um, from that. 15:07So, um, maybe to, maybe to switch gears just a little bit, um, here. 15:13So I think that was a really good discussion on, on coding. 15:16I wanna talk on just like one other aspect, um, of, of the release, which was, 15:22you know, Anthropic is kind of known for two things. 15:24Um, one of them is coding at this point. 15:27I'll, I'll still say they're a little bit known for vibes, uh, 15:29even if they're leaning into that a little bit less, um, these days. 15:32The other thing, that I think they drive a lot of interesting research 15:36around, a lot of interesting discussion in the market around, is 15:39just like all the work they do around safety and alignment, and they, 15:42like, they'd put together a lot of materials associated 15:44with, with this release.
15:46Um, you know, one of the ones that I know that was in their papers and 15:49they were talking about on some of the podcasts is just like some of the work 15:51they're doing around like constitutional classifiers, and then just like the work 15:54they're doing to have like AI, um, 15:58basically monitor and enforce like certain sort of protections, um, around 16:03different types of responses, particularly around notably harmful, um, things. 16:08And so like Marina, maybe I'll just like turn it over to you to talk a 16:11little bit about kind of, you know, what, if anything, was kind of striking 16:14to you about some of the work that they were, um, doing in this space. 16:17And then maybe by extension, like, do you ultimately see like AI being kind of 16:22the primary way that we protect against the harmfulness of AI, at least when 16:25it comes to just like managing output? 16:27Now, God, talk about Big Brother vibes. 16:29Um, I'm like, yeah, you're talking to the AI and you never know if you're gonna get 16:33reported for what you just said and did. 16:35That's a little bit challenging. 16:37Um, right, 16:39so Anthropic does have a good, uh, at least, certainly very much 16:43good intentions in this direction. 16:44They put out interesting things and they know how to have a 16:46little bit of clickbait articles, 16:47like one weird trick to make AI blackmail you if you won't do what it wants. 16:52That, that was a fun one that just came out, right? 16:55I mean, they did tell it to, and see how well 16:57it listened to instructions to be blackmailing, but what 17:00was fun is that it did. 17:01Um, and so I think that here we have to continue to pay a lot of attention, 17:06uh, speaking of world domination, again, where does all this data go? 17:10Who has access to it? 17:12If governments wanna see your data, then what, which ones, uh, regulations 17:17are gonna be behind for a while? 17:18Talk about Mac versus PC.
17:20In my head, I'm like, okay, so when are the antitrust lawsuits gonna come up? 17:24Who's gonna be arguing them in what court? 17:26Because there's a real question, like you said, about market share, now 17:29the market is the entire world. 17:31So who are we going to be fighting with here? 17:33So the whole safety thing here has a, a number of levels where we're still only 17:38barely starting to scratch the surface. 17:40I, I like Anthropic at least continuing to try to put some 17:43interesting things out and that's really positive, but don't 17:47maybe get completely distracted by the, the fun little anecdotes. 17:51Think about what's, uh, going on under the hood, 17:53and again, who people are and are not teaming up with is gonna be at the end 17:58more important, especially economically and legally, than, uh, here's a 18:02particular research paper that I put out, 18:04it hurts me as a researcher, but it's true. 18:07That's what's really gonna matter. 18:08Shobhit, maybe I'll turn it over to you as well. 18:10Like, Marina, Marina touched on this, um, 18:12uh, it's actually an adjacent example a little bit, but like, obviously one 18:15of the things that they've spent a lot of time on is investing in coding and 18:19tool use and, uh, things like that, 18:20and they had some scenarios where they were giving Claude unfettered 18:24access to, to tools, and then, 18:27kind of gave it examples of egregious wrongdoing where it would actually go 18:30in and then proactively like alert the press and authorities and, uh, things like 18:34that, which is like, I had some pretty conflicted feelings, uh, about that.
18:37But like, one of the other things I was just thinking about as part of 18:40that is like, you know, how would a company feel if like, you know, its 18:44model decides, like, I don't like you asking these questions, I'm gonna 18:46go notify like, um, you know, external authorities about some of this. 18:51And, you know, I'm, I'm curious about just your general reactions to some of the, the 18:54alignment research and safety research 18:56they're doing. 18:56And then just like how you think, like what, if any, implications you see 19:00in terms of like how enterprises might kind of frame some of those interactions? 19:04So I think we, like, we as humanity need to pick a lane. 19:08If you try to say that we want AI employees to behave more and more 19:11like, uh, like our own employees, 19:14then we should be ready for whistleblowers as well. 19:16Right. 19:17I'm just making the point that we are trying to make sure that we have good, 19:20trusted employees who, you know, are onboarded, with verified 19:24skills, we know which college you went to, 19:26there is training that you go through to become an IBMer, 19:29there are good, uh, like good 19:32measures, ways of tracking what you're doing. 19:35We give you access to only the tools that we need you to have, and eventually there's 19:39a supervisor to approve promotions or whatever, any big decision, right? 19:43Those things apply to an AI employee as well. 19:45We want to verify what training material went into your 19:48models, like Granite models. 19:50We wanna make sure that you go through an onboarding process, 19:53with Granite, uh, with our InstructLab, to make sure that your models 19:57understand our way of doing things. 19:59You have evaluation metrics with governance, you have policies that 20:03we have to abide by, access control, and eventually human in the loop, right?
20:07So I think if you're looking at those, human employees and AI employees, 20:10I think they'll start merging. 20:11We will come to a scenario where, as agents start to cross boundaries and talk 20:17to other agents in other silos, right? 20:19You'll have a Copilot talking to ServiceNow, to SAP, or to Salesforce. 20:24You'll have people, 20:25agents talking to each other as well. 20:27The constraints, the, the kind of governance that's needed to make 20:31sure that you are not leaking any internal information, you're not 20:35sharing stuff with authorities, things of that nature, are very real. 20:38So as a community, we need to get to the point where we 20:40evolve agent ops governance. 20:43In this new world where you have brilliant, uh, people, LLMs working 20:48across organization, you wanna make sure that they're not exchanging information 20:51that you don't want them to as well. 20:53But I think generally, uh, 20:55Anthropic has done a phenomenal job of trying to balance this speed 20:59to market versus safety as well. 21:01It shows they're more transparent in, in how it's thinking through it and, you know, 21:05give a very detailed view of the safety checks and balances that they've put in. 21:08So I'm all in on the Anthropic camp that the future has to be 21:12a little bit more transparent. 21:14You have to ensure the right safety mechanisms, but I'm not 21:17seeing this from an enterprise lens yet. As an enterprise user, 21:22I don't have control over what kind of a model training or, or rules they're 21:26setting for safety internally right now. 21:29All of that is baked in, in the original training process, right? 21:32Me as an enterprise, I may want to do certain things a little bit differently, 21:36in which case I may onboard an employee and tell them, here's how we're 21:39gonna do things in our organization. 21:40Right?
21:40Different from what you were originally trained, or your previous owner 21:43or your previous, uh, company that you worked for, uh, used it, right? 21:47So I think we need more control 21:49over the safety metrics so I can relax and make them more constrained as I need, 21:53that is missing today. 21:54I think that thought's actually an interesting place to close, 21:57which is that if our goal is to make something that mimics, 22:02in some ways, like, the human brain and the way that human beings behave and 22:05think and reason, um, we should also expect them to do things that human 22:09beings do, which is sometimes not always what we, what we want them to do. 22:13And so, um, you know, but like, you know, I, I think having kind 22:17of the right sort of controls and visibility and observability around 22:20that is like obviously gonna be huge. 22:22Go ahead. Bryan, 22:23there's a couple things before, before we do, uh, start wrapping up. 22:26I think this was a big statement for Anthropic, moving from an LLM provider 22:31to becoming a full stack. 22:32We saw that with Llama, they create the full stack around it, and Anthropic spent an 22:36inordinate amount of time explaining all the different components around it. 22:41The MCP protocol, uh, uh, has been winning the battle with all of their competitors, 22:46Google and Microsoft and, and everybody else is supporting MCP protocols now. 22:50Right? 22:51So I think there's a lot that they're doing in, in growing 22:53from a company that does. 22:55You also saw that they pulled back from the customer facing chat bots. 23:00They practically have not invested much since December of 23:03last year on their chatbots. 23:04So they've just completely given up on that and say people are going to go to, 23:09uh, OpenAI and Gemini, and I know, Chris, you'll make fun of me using Gemini again. 23:15But I feel that it's, it's doing a really, really good job, 23:17especially out of Google I/O. 23:18Right.
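For context on the MCP protocol mentioned above: MCP messages are JSON-RPC 2.0, and invoking a tool on an MCP server is a `tools/call` request. A minimal sketch of constructing such a message follows; the method and params shape follow the public MCP specification, while the tool name `get_weather` and its arguments are made-up examples, not anything from the spec:

```python
# Sketch of an MCP-style tool invocation on the wire (JSON-RPC 2.0).
# "tools/call" and the {"name", "arguments"} params shape come from the
# MCP spec; the specific tool and arguments here are invented.
import json

def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

print(make_tool_call(1, "get_weather", {"city": "Austin"}))
```

The point the panel is making is that because this wire format is a shared standard, Google, Microsoft, and others can support the same servers without bespoke integrations.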
23:19 So we are getting to a point where the focus of the company is changing from a model provider and a chatbot to now being the full stack, with a lot of focus on safety and coding. We have not seen them spend that much effort on multimodal the way Gemini has, right? So there are certain areas where they're doing really well, and certain areas where they're not. If you look at the kind of work that AI models are doing: on one axis, you have the complexity of the task; on the other axis, you have how long the model can keep going coherently on the task, right? I think from the 30 minutes of what OpenAI could do, we are now at seven hours of work that an Anthropic model can do. That's a step change in what you can do when you're trying to connect all the dots and handle the complexity. So I think there's a massive improvement that Anthropic has made. It's a very consequential release for them as a company, I believe.

24:06 Yeah, I was just going to add to that, Shobhit, and I think you're spot on there. One of the biggest things they've really done is focus on planning and memory and sorting out some of the context issues. I think that's absolutely huge, because if you're going to run those long-running agent tasks, then the model can't get confused halfway through. And some of the examples that they've got, again, this week has been the Pokemon wars. If these companies are not talking to each other, I'd be surprised, because suddenly everybody's playing Pokemon with their models at this point.

24:35 Yeah, it's like, you know, don't talk about models talking to each other. Let's get the employees to stop talking to each other, right? Because they're all trying to out-battle each other.
24:44 But actually, the ability to play games in a long-running way, and to be able to plan and manage that, becomes really important. So I think Anthropic is, one, trying to pitch themselves towards the vertical stack on coding, but actually I think they're setting themselves up as the agentic stack. So I said "agents" again. And I think that's important, to Shobhit's point. They've set up MCP, they're focusing on planning, they're focusing on long-running tasks, et cetera. They brought tools in there, they've enhanced their memory side of things. And then the other thing that they talked about, which was really a short piece but is really important, is the reasoning elements of the models: they are now proving that they're able to do that in the latent space, not necessarily requiring the tokens to do the thought there. And I think, therefore, they are making a different play. They are the more technical-focused, the more safety-focused, and more transparent in that sense. So I think the agentic stack is probably a big place that they're going towards.

25:52 Well, since Shobhit and Chris both decided the podcast wasn't over, Marina, do you want to give the final, final word on this one?

26:00 I'll build a little on what Chris was saying, which is that if they're going to continue to push into the enterprise space, and anyone is pushing into the enterprise space, this work on planning and memory and, long-term, being able to pick up where you left off and things like that, is so important. Because the type of tool we use as employees in the enterprise space is collaboration. You need to be able to do things in real time, do things over different granularities of time: what did we do today? What did we as a team do last week?
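The planning-and-memory pattern Chris and Marina describe, picking up where you left off across long-running tasks, can be sketched very simply: the agent persists notes to durable storage and reloads them at the start of the next session. This is purely illustrative; the file name and helper functions are made up:

```python
from pathlib import Path

# Minimal sketch of agent memory for long-running tasks: notes learned in
# one session are written down so a later session can resume from them.

MEMORY_FILE = Path("agent_memory.txt")

def remember(note: str) -> None:
    """Append one learned note to the agent's persistent scratchpad."""
    with MEMORY_FILE.open("a", encoding="utf-8") as f:
        f.write(note + "\n")

def recall() -> list[str]:
    """Load all prior notes at the start of a new session."""
    if not MEMORY_FILE.exists():
        return []
    return MEMORY_FILE.read_text(encoding="utf-8").splitlines()

# Session 1: the agent learns something mid-task and writes it down.
remember("Strategy: heal before entering the gym")

# Session 2 (later): restore context before continuing the task.
notes = recall()
```

A real system would layer retrieval, summarization, and access control on top, but the core idea is exactly this humble: durable notes plus a reload step, which is what keeps a seven-hour task from getting confused halfway through.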
26:28 So yeah, the Pokemon thing is funny, but you do see it learning: oh, here's a strategy, I'm going to write it down, I'm going to remember it, for whatever "remembering" can mean in reality. Yes, it's a text file, but we can actually go a lot deeper in how you use it. And so as we hopefully start to move away from just talking about these individual models and talk more about the applications in which they're used, the systems in which they're used, this is going to become more and more of a real thing. And right now, Anthropic is, I think, showing themselves to be thinking about the right parts of this task.

27:00 All right, now we are going to call it. So I think that was a great place to leave it. Marina, Chris, Shobhit, thank you for joining us today. Another great episode. Hopefully next week will be even more eventful, because we didn't get enough news this week, and we'll do three podcasts. But I appreciate everyone joining us today, and we will see you next time on Mixture of Experts.

27:31 Aren't we getting Claude 5 next week?

27:33 Uh, supposedly yes. Chris said yes.