
# Nvidia's Future Amid AI Chip Rivalry

**Source:** [https://www.youtube.com/watch?v=ZEwJfi7xPxc](https://www.youtube.com/watch?v=ZEwJfi7xPxc)
**Duration:** 00:37:49

## Summary

- Experts predict NVIDIA will remain among the top five AI hardware leaders in five years, though the market will become more fragmented with new chip architectures and emerging neuromorphic designs.
- AWS’s re:Invent conference was highlighted as the year’s premier AI event, showcasing Amazon’s aggressive push into AI infrastructure, including the upcoming launch of its Trainium 3 AI accelerator.
- Amazon is positioning itself to dominate the AI stack, building supercomputers for partners like Anthropic that are reportedly five times more powerful than existing deployments.
- The “Mixture of Experts” podcast frames these developments within broader industry trends, emphasizing rapid innovation, competitive chip advancements, and the evolving landscape of AI hardware.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=0s) **NVIDIA's Future in AI Hardware** - Experts debate whether NVIDIA will remain a top AI hardware player over the next five years amid a fragmented, evolving chip landscape.
- [00:03:03](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=183s) **AWS's Expansive AI Ecosystem** - The speaker highlights AWS’s comprehensive AI portfolio, collaborations, and proprietary developments, while cautiously questioning whether the hype will translate into market dominance.
- [00:06:06](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=366s) **Edge AI Chip Market Outlook** - The speaker argues that the dominance of edge inference chips in agriculture will depend on business cases driven by connectivity hotspots and data availability, echoing Ben Thompson’s claim that future value will lie in infrastructure rather than constantly updated AI models.
- [00:09:12](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=552s) **AWS Emphasizes In‑House AI** - The speaker describes how AWS’s recent re:Invent highlighted its AI platform, urging customers to build scalable, secure solutions themselves rather than rely on external APIs, and offering hands‑on support through demos and GitHub resources.
- [00:12:17](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=737s) **AWS Leverages Insight & Apple Partnership** - The speaker explains how AWS’s deep visibility into customer workloads and its high‑profile alliance with Apple provide an unfair advantage to anticipate, validate, and dominate emerging AI and financial‑service workloads.
- [00:15:21](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=921s) **Exploiting Guardrail Timing Vulnerabilities** - The speaker explains how attackers leverage the brief millisecond gap between a model's output generation and its safety guardrails—especially through asynchronous request race conditions—to leak harmful content, urging a redesign of architecture and guardrail placement.
- [00:18:24](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=1104s) **Balancing Real‑Time Delivery and Safety** - The speaker compares broadcast TV’s built‑in delay for content moderation to emerging safeguards such as prompt caching and AWS Automated Reasoning checks, arguing that newer tools give a better chance to detect and prevent harmful LLM behavior.
- [00:21:30](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=1290s) **Auditable AI Agent Marketplace** - The speaker outlines challenges of ensuring deterministic, bias‑controlled AI agents for legal compliance and proposes a marketplace of certified, task‑specific agents, similar to RPA bots and freelance services, with ratings and guardrails.
- [00:24:38](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=1478s) **Meta-Prompting, Model Security, and Theory of Mind** - The speaker discusses applying social‑engineering and security concepts—such as metaprompting, flow‑breaking attacks, and model safety—to develop LLMs with a rudimentary theory of mind and more human‑like rationalization.
- [00:27:43](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=1663s) **AI Multi-Agent Governance & Name Censorship** - The speaker predicts that future AI multi‑agent frameworks will mimic human organizational structures—including legal‑interpretation roles that create “good cop/bad cop” dynamics—and then highlights a recent case where OpenAI seemingly refuses to discuss certain individuals, likely due to a defamation‑filtering mechanism.
- [00:30:46](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=1846s) **Challenges of Right‑to‑Be‑Forgotten in AI** - The speakers discuss how legal deletion rights force AI developers to rely on hard‑coded, over‑broad filtering patches because pretrained models weren’t designed to accommodate data removal, making compliance costly and imperfect.
- [00:33:53](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=2033s) **Personalized AI Policies and Live Event Controls** - The speakers discuss future centralized yet personalized AI policy recommendations, using blocklists for real‑time event troubleshooting, illustrated by a quirky example of a tennis player named Sock generating unrelated content.
- [00:36:57](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=2217s) **Clarifying the Unexplained Restriction** - The hosts debate the opaque rule that “you can’t talk about people,” propose adding an explanatory message to aid the ecosystem, and wrap up the episode with thanks and a plug for the podcast.

## Full Transcript
0:00 Five years from now, is NVIDIA still 0:01 the biggest name in AI hardware? 0:03 Aaron Baughman is an IBM 0:04 Fellow and master inventor. 0:05 Welcome back to the show, Aaron. 0:07 What do you think? 0:07 So I do think that they're 0:08 going to be in the top five. 0:09 Um, the field's going to be much more 0:11 fragmented with different chip architectures, 0:13 but I'm looking forward to seeing what types 0:15 of neuromorphic chips are going to come out. 0:16 Vagner Santana is a staff research scientist, 0:19 master inventor on the responsible tech team. 0:21 Vagner, welcome back. 0:22 Your predictions, please. 0:24 Uh, I second, uh, Aaron. 0:26 I think that 0:27 NVIDIA will still be the top, but with 0:30 different architectures and maybe cooler 0:32 ideas on new architectures for chips. 0:34 Yeah, I hope so. 0:35 Uh, Shobhit Varshney, Senior Partner, Consulting 0:37 on AI for US, Canada, and Latin America. 0:40 Shobhit, tell us what you think. 0:42 I think NVIDIA, in terms of AI 0:44 systems, beyond just the chip, 0:46 there's a lot that goes around it. 0:47 I think it'll be a force to reckon 0:49 with for the next five years. 0:50 And I would say they should 0:51 be in the top two or three. 0:52 All that and more on today's Mixture of Experts. 1:01 I'm Tim Hwang, and welcome to Mixture of Experts. 1:03 Each week, MoE brings you the analysis, 1:05 hot takes, and banter that you need 1:07 to keep up with the ever-hectic 1:08 world of artificial intelligence. 1:10 We've got another packed 1:11 schedule for today's episode. 1:12 We're going to talk about a new jailbreak 1:14 that's hitting the scene, people you can't 1:16 talk about on ChatGPT, but first we wanted 1:19 to take as our top story the AWS re:Invent 1:22 conference, which is happening this week. 1:24 So for those of you who may be less familiar, 1:26 this is the annual conference for Amazon's AWS. 1:30 And there's been a host of big announcements
1:32 coming out of Amazon this week, um, not 1:34 least of which is that they are announcing 1:36 that the new generation of their AI 1:38 chip, what they call Trainium, uh, Trainium 1:41 3, is going to be launching very, very soon. 1:44 Um, and, uh, there's a lot to get 1:46 into, but I think, Shobhit, I wanted to 1:47 kind of throw it to you first, 'cause 1:48 you were actually at the conference. 1:50 Um, I just talked a little bit about Trainium, 1:52 but curious, you know, 1:53 what are the trends that you're seeing? 1:54 What are other big announcements 1:55 our listeners should know about? 1:56 From my vantage point, AWS re:Invent 1:59 was the AI event of the year. 2:02 That's a pretty bold statement. 2:03 I mean, there's been a lot 2:04 of big AI events this year. 2:06 In terms of what they're trying to do to 2:08 change the industry and absolutely dominate 2:10 in the AI space, it's just absolutely incredible. 2:13 If you look at all the different stacks, 2:14 or the layers of the stack: at the compute 2:17 level, they are doing a lot in terms of 2:19 the chips, and so are other competitors 2:21 as well, but they are quite ahead. 2:23 When you have somebody like Anthropic 2:25 and you are building a supercomputer 2:26 for them that's five x more powerful 2:28 than what Anthropic has today, 2:29 that is making a bold statement. 2:32 As a company, uh, Amazon has done a lot 2:35 of AI, and they have a really good 2:37 history of doing that for 10, 15 years. 2:39 So they're building on top of that. 2:41 Compute's gonna be very critical for them. 2:42 Nice. ROI second on top. 2:44 All the storage, the amount of options 2:46 you get as a developer, is just 2:48 incredible. 2:49 It's like a dream for us 2:50 to go build for our clients. 2:52 AWS is one of our largest partners globally. 2:55 So we do a lot of work in building large systems 2:58 with all the different options with them. 3:00 Then there's the AI layer for all the models.
3:03 I think, across the board, they've 3:05 been very clear on choices. 3:07 If there's one word that summarizes 3:09 AWS today, it's ecosystem. 3:11 They're trying to do their best to make sure 3:13 you have the best-in-class models available, 3:15 the best-in-class apps, and things of that nature. 3:17 But then, oh, by the way, we also have our 3:19 own version that is delivering higher ROI. 3:22 We are matching or exceeding. 3:23 We have a massive announcement with NVIDIA. 3:25 And oh, by the way, we have our own chips. 3:27 We have great collaboration and investment 3:29 in Anthropic and all these other models. 3:31 By the way, we also have our own. 3:32 So I think that choice is why people 3:34 will come to AWS and re:Invent, the 3:37 kind of announcements they have made. 3:38 We spent the last three days hands 3:41 on, working with the product leads. 3:43 Being such a big partner of AWS, we get 3:45 some dedicated talent from AWS to give 3:47 us previews and hands-on experiences 3:49 of how this actually is working. 3:51 They've done an incredibly good job. 3:53 Like, I'm so, so excited about 3:55 the next few months, going 3:56 and doing this with our clients. 3:58 Yeah, Aaron, so do you buy the hype? 4:00 I mean, should everybody else 4:00 in the AI space be scared? 4:02 Traditionally, right, like, most of the 4:03 attention has been on OpenAI and Anthropic 4:06 and the people who are doing the models. 4:08 Um, I guess Shobhit is kind of making the claim 4:09 here that, you know, maybe Amazon's 4:11 going to kind of take the cake in the end, 4:13 but I don't know if you buy that argument. 4:14 Yeah, I mean, the way I look at 4:15 it is that it's sort of a quasi 4:17 contest to beat out NVIDIA, right? 4:19 Where, um, you know, they're trying 4:21 to build, you know, their own chips to compete. 4:24 However, if you looked at the announcement, 4:26 AWS is still hedging, right, um, by 4:30 partnering with NVIDIA, right, on P6, right?
4:33 So even though they're building their 4:34 own Trainium chips, they're still going to be 4:36 working with them on P6, you know. 4:40 And because of that, you know, they're looking 4:41 to see which way the tide's going to go. 4:44 And then I also view this as, you know, 4:46 AWS is looking to reduce their 4:48 dependence on third-party chips to enhance 4:50 their performance on AI workloads on AWS. 4:54 But to me, right, um, there's still a lot of 4:56 work that AWS has to do on the software stack. 4:59 Um, and they still have 5:00 to prove out performance. 5:02 You know, if we think back, 5:04 um, NVIDIA uses CUDA, right? 5:05 That's the most widely adopted 5:07 platform for AI workloads in the world. 5:09 And it's supported by PyTorch and TensorFlow. 5:12 Now, Trainium uses AWS's Neuron SDK, 5:15 right, which has a fraction of the market 5:17 share, and it's not as proven as CUDA. 5:19 So, yes, I think that the chip hardware itself 5:22 with Trainium is great, but AWS has work to 5:26 do to build the consumer and developer trust, 5:29 right, to be really, really competitive, 5:30 and that's why I think AWS is hedging 5:33 by still partnering with NVIDIA on P6. 5:36 Right, yeah, it feels like we're 5:37 kind of in this really interesting world 5:39 where, I mean, all the big cloud providers 5:41 are kind of working on 5:42 their own chips right now. 5:44 And they're also all working with NVIDIA. 5:47 Uh, and I think 5:47 everybody's hedging a little bit. 5:49 I guess, Vagner, maybe this 5:50 goes to your prediction. 5:51 You were saying, kind of, in the future, maybe 5:53 we'll just have more diversity of chips. 5:54 And actually, that will 5:55 be the really good thing. 5:56 Do you have a prediction on kind of like 5:57 how the market's going to divide, right? 5:59 Like, will it be just like NVIDIA 6:00 for pre-training or, you know, 6:02 these types of chips for inference?
6:04 Or, I'm just kind of curious about how you 6:05 think that market's going to divide out. 6:06 I think that it will be 6:08 based on a business case. 6:10 Um, thinking back when I was, um, involved 6:13 with digital agriculture: you see, uh, 6:16 places where you have no connectivity, and then 6:18 you start thinking, okay, if we have to have 6:21 chips with inference, uh, running, uh, at 6:24 the edge, then that will be the chip that will 6:27 dominate the market, if you have, let's say, uh, 6:30 machinery for agriculture using those chips. 6:32 And if you have access to data, if you 6:34 are, let's say, next to the 6:37 place where they gather information, then 6:39 probably you have something for doing the 6:40 pre-training and training, and that will 6:42 be, uh, where you have enough power and 6:44 connectivity in certain places on a huge farm. 6:47 So I think that it will be based on 6:49 the business case, uh, and how 6:52 connectivity and data arrive, uh, 6:55 at these specific hotspots. 6:58 One of the really interesting comments, 6:59 uh, was from Ben Thompson, who writes 7:01 a newsletter called Stratechery. 7:03 It's like very, very good. 7:04 Um, you know, one of the ways he sort 7:05 of framed up a lot of the announcements 7:07 was: Amazon's basically making the bet 7:10 that, um, models won't matter 7:12 so much in the future, right? 7:13 That essentially it'll be sort of like 7:15 infrastructure that rules the day, and 7:16 models will be widely commodified. 7:18 And so, kind of like, what we thought 7:20 was so special, which is, oh my god, 7:21 you have to get the latest model that's 7:23 been released by OpenAI, is just going 7:25 to be less of a thing in the future. 7:27 Do you guys buy that? 7:28 Like, do you feel like, you know, what 7:29 we're seeing now is a movement of momentum 7:31 towards the infrastructure providers 7:33 versus kind of like the model creators?
7:35 Tim, I think we're making really good progress 7:37 as a community across each one of those. 7:39 Uh, we do need better intelligent models 7:41 for reasoning and things of that nature. 7:43 We're making some incredible 7:44 strides in that space. 7:45 That'll continue. 7:47 Uh, there's a lot that happens 7:48 before and after, uh, an LLM call. 7:51 AWS has done an incredible job 7:53 with their SageMaker stack. 7:55 All the kind of, uh, automated reasoning checks, 7:57 the kind of things around, how do I go 8:00 pull structured data as part of my LLM calls. 8:03 Enhancements to do all kinds of things that 8:05 we as developers need when we go and build 8:07 these for clients. In the last two years, 8:10 we've done a lot of custom work in the 8:13 middleware to make these elements work well. 8:15 And now you're seeing each one of 8:17 those providers catch up, giving 8:19 you a full ecosystem end to end, 8:20 because they're also learning from 8:22 how enterprises are deploying these. 8:24 So I think AWS, as a community, 8:27 is further ahead than some of their 8:29 peers in giving you the full spectrum 8:32 end to end and making it super easy 8:33 for startups to come and go do this. 8:35 I always have an enterprise 8:37 mentality around these things. 8:39 They are doing an incredible job on grounding, 8:42 on making sure there's the right governance. 8:44 Massive ecosystem; you can bring your own 8:46 favorite eval to the framework and whatnot. 8:49 They're very, very well placed. 8:51 Models are going to get better. 8:52 There's going to be a constant battle on that. 8:54 But over 8:55 time, it becomes a commodity. 8:56 That's for sure. 8:57 So, I guess, Shobhit, what's your prediction, 8:59 I guess, for Amazon in the new year? 9:00 Like, this is, you said, 9:01 the biggest wave of announcements. 9:03 Like, what does it look like when 9:04 we're at re:Invent, you know, 2025?
9:07 They have clearly made a very massive dent in 9:10 the AI community in the last two or three days. 9:12 Right. If you roll back two years, to 9:14 2022, when we were here, they had 9:17 just made a bunch of AI announcements. 9:19 And then two days later we 9:21 had ChatGPT come out. 9:22 So they got caught off 9:23 guard at that re:Invent. 9:25 Last year's re:Invent was more around, yes, 9:28 sure, I'm going to bring the big dogs on stage. 9:30 I'll have Anthropic on during 9:31 the keynote, NVIDIA, and whatnot. 9:33 Right. So they said, yeah, we 9:34 also have a lot of options. 9:35 This one, they just came in dominating. 9:39 It took 24 months, but now they're killing it. 9:42 It was like, guys, we got this. 9:43 So I think the overall 9:45 messaging was: where else will you go to 9:48 do secure, scalable, end-to-end, infinitely 9:51 scalable, pluggable ecosystem, plain AI? 9:54 Don't outsource it to an API call. 9:56 Come build with us, and here's how simple it is. 9:59 Just a very subtle cultural thing: 10:02 every keynote, every technical 10:04 session I've gone to, they end with the same 10:07 kind of, uh, enthusiasm. It 10:10 always ends with: well, what will you build next? 10:13 Right. They have a huge emphasis in every session. 10:16 They excite you about the possibility. 10:18 They give you a couple examples 10:19 of what clients are already doing. 10:21 And they say, what are you going to build next? 10:23 And the product managers 10:24 are going to hang around. 10:25 They'll show you how this works. 10:26 Whenever I have a conversation at AWS 10:28 with any of their folks, it typically 10:31 starts with, hey, let me point you to 10:33 a GitHub repo that does this for you. 10:35 And then I'll show you this in action. 10:36 But the first intention is: go build epic 10:39 stuff with this, man, go to GitHub and we'll 10:41 get started, and we'll have dedicated 10:43 people to come help you go build.
10:45 I think it's a very different 10:46 take; it's black and white between 10:48 AWS and Microsoft and others, right? 10:50 You see a very different target audience. 10:52 You see, you know, a lot more geeky 10:55 conversations, hands-on tech, this-stuff-scales 10:57 kind of conversation, and 10:59 you can build epic things with this. 11:02 Yeah, being able to build those 11:04 epic things is really important, because, I 11:06 mean, the way I see it, um, is that, you know, 11:08 with these new chain-of-thought algorithms, 11:10 where these models can begin to self-learn, 11:13 you know, it's almost like we 11:15 build these foundational models with 11:17 pre-training, um, and then you have a choice. 11:19 Do you want to fine-tune it? 11:21 Um, do you want to do some 11:22 instruct tuning, right? 11:23 But I've noticed that now, 11:26 instead of just, uh, really fast inference, you 11:28 know, there's this thinking phase, right? 11:31 And this thinking phase, you 11:33 know, it can go on for minutes. 11:35 Even hours, right? 11:37 And because this thinking phase is happening, 11:39 with all these emergent behaviors and skills, 11:42 you know, you need this scalable, um, secure, 11:46 um, robust architecture that, I think, you 11:48 know, was announced at, uh, this conference. 11:51 So it's real exciting, right? 11:53 To be a part of this, right? 11:54 And to watch what's happening. 11:56 That's great. 11:57 Well, definitely one to keep an eye on. 11:58 There's going to be a lot 11:59 more action in the space. 12:00 And yeah, it is really exciting. 12:01 I mean, I think that, uh, had 12:02 you asked me 24 months ago, I would 12:03 have been like, Amazon's way behind. 12:05 They're never going to catch up. 12:06 But you can never really count 12:07 them out, because they're Amazon. 12:08 So 12:10 this is one last thing I would add to 12:12 this: being the world's cloud provider, 12:15 they have the largest market share.
12:17 They see a lot of workloads. 12:19 So they have an unfair advantage 12:21 that others do not have. 12:23 They can see how people are 12:24 actually leveraging these tools. 12:26 What are they building? 12:26 How are they contributing back to the community? 12:29 So they can go test out and be the second 12:32 movers and just dominate after that, right? 12:34 Because people build stuff; there are 12:36 a lot of small startups that have 12:38 built all these niche things that got 12:40 announced as features within AWS, right? 12:42 So you have this unfair 12:43 advantage that AWS has, 12:45 because they see how people 12:47 are actually using it in enterprises. 12:49 Bringing in a large trusted partner, like 12:52 one of the most trusted brands, is Apple. 12:55 Making a statement that Apple for the 12:57 last decade has been building on AWS. 12:59 That gets all the financial services clients; 13:02 one of them was sitting right next to me, 13:03 got very excited, saying that, oh, this is 13:05 a really clear statement that you're doing 13:08 trusted computing. 13:09 If you have Apple on stage, talking about 13:12 the massive financial services, like the JPMorgan 13:14 Chases of the world, they're doing some 13:15 incredible AI workloads on this, right? 13:18 So they've made a very, very bold statement, 13:21 and they're going after the mainframe 13:22 business with this as well, right? 13:23 They're saying, traditionally, there were 13:25 a lot of transactional systems that had 13:27 high compute needs, and you could not really 13:30 synchronize them on the cloud, whatnot. 13:32 If you look at the series of announcements, 13:33 they have a really good game plan on 13:36 how do we go attack workloads that haven't 13:37 moved to private clouds or secure clouds yet. 13:40 Yeah, I think that's right.
13:41 I remember when it came out a few years ago 13:42 that, was it Netflix, was running most of its 13:45 infrastructure on AWS, and being like, the 13:47 amount of video they're moving through that 13:49 system is just, yeah, crazy to imagine. 13:52 So yeah, I think it's a really good point. 13:59 We're going to move on to 13:59 our next topic of the day. 14:01 Um, there was a great blog post that came 14:03 out from a security team called Knostic, 14:05 that's spelled with a K, um, on a new class 14:08 of LLM attacks that they call flow breaking. 14:11 Um, and what was kind of interesting is that 14:12 they are proposing this as kind of 14:14 a new, sort of third kind of attack we're 14:16 seeing in this space, um, with the other two 14:19 being prompt injection, um, and jailbreaking. 14:23 Um, and specifically, what flow breaking 14:24 kind of focuses on is the fact that many 14:27 of these AI applications are built as 14:30 really kind of ensembles of models that are 14:32 doing lots and lots of different things. 14:34 And in many cases, there are separate models, 14:36 separate filters, that are implemented to block 14:39 unsafe generations on the part of the model. 14:42 So, you know, the model might go to advise you 14:44 to do something dangerous, um, and there's 14:46 another system that says, oh, 14:48 that's not actually what we should 14:49 do, pauses the generation, and then 14:51 regenerates it in a more safe way. 14:53 And what a lot of flow breaking is focusing 14:55 on is: how do we kind of use that as a way 14:57 of getting unsafe material out of the model? 14:59 Because there is this kind of gap 15:01 between the model itself and the kind 15:03 of safety measures that are put in. 15:05 Um, and Vagner, I know you were the 15:07 one who kind of flagged this for us. 15:08 If you want to talk a little bit about, like, 15:10 how does this kind of change our 15:11 thinking about security on models?
15:13 And does it make things 15:15 more complicated for us, right? 15:17 As we kind of think about, how do we 15:18 secure these models against manipulation? 15:21 I think that it is interesting because 15:23 it tells us how people are, um, 15:26 building architectures of models and 15:28 how they are placing the guardrails. 15:31 Uh, and if we look back on software 15:34 engineering, it's another way 15:36 of exploring race conditions, right? 15:38 And also we can think about, uh, asynchronous 15:40 requests and how all of this is happening. 15:44 And with this new, uh, attack, 15:47 they're basically exploring this, uh, this 15:50 interval, this millisecond interval, between 15:52 generating and the guardrail, uh, taking 15:54 over, and showing that if the content 15:58 was sent, then this can be harmful. 16:01 I think that that is the key point. 16:03 And, uh, I tried to, uh, replicate, 16:07 and I was able to replicate one of 16:09 the two attacks that the team, uh, showed. 16:12 And, uh, one was not working 16:17 anymore, at least on the, uh, ChatGPT 4o that 16:19 I tried, but the other one was, uh, yeah. 16:23 But it's interesting in the sense that 16:27 the data is sent, right? 16:29 So I think that it is important for us to rethink 16:30 the way that we structure and place these 16:35 guardrails in our architectures, and also 16:38 even, uh, like, organizing the requests: if 16:40 there are two asynchronous requests, then 16:42 probably the data will be sent to the user. 16:44 I think that that is the key aspect, 16:46 and the content may be harmful, 16:48 and someone may use that. 16:50 So I think that that is the key aspect. 16:52 I think that they showed even 16:53 that, for the human, for the common user, 16:57 uh, this will not show, because it's 16:59 so fast that it's hard to see 17:02 the content, but the content is there. 17:04 I think that is the key. 17:05 Yeah. 17:05 Yeah.
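The race condition Vagner describes, a guardrail running concurrently with token streaming, can be sketched in a few lines of Python. This is a toy illustration only: every name here (`moderate`, `stream_response`, the timings) is invented, not any vendor's real pipeline. The point is simply that a verdict arriving after the tokens have streamed cannot retract what the user already saw.

```python
import asyncio

TOKENS = ["Step", "1:", "mix", "the", "chemicals", "..."]

async def moderate(get_text):
    # Guardrail runs in parallel with generation but finishes later.
    await asyncio.sleep(0.02)                 # simulated moderation latency
    return "chemicals" not in get_text()      # toy unsafe-content test

async def stream_response():
    shown = []  # tokens the user has already received
    # Guardrail kicked off concurrently with generation -- the
    # architecture that flow-breaking attacks exploit.
    verdict = asyncio.create_task(moderate(lambda: " ".join(shown)))
    for tok in TOKENS:
        await asyncio.sleep(0.001)            # model emits a token every ~1 ms
        shown.append(tok)                     # streamed to the user immediately
    safe = await verdict
    # By the time the guardrail says "unsafe", every token is already on
    # the user's screen; the server can only retract its own UI copy.
    return shown, safe

shown, safe = asyncio.run(stream_response())
# safe is False here, yet all tokens were delivered before the check landed.
```

Moving the `await verdict` before the streaming loop (a hold-and-release design, like the broadcast delay discussed below in the episode) closes the window at the cost of latency.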
And I think it's really 17:07 fun, and, Vagner, like you're saying, I 17:08 think, because it reveals so much about 17:10 how these systems are architected. 17:11 I mean, Aaron, if I can kick a question 17:13 over to you: like, why are these 17:14 companies streaming the unsafe tokens 17:17 at all, right? 17:18 Like, doesn't it make more sense to have 17:20 an architecture where you do the safety 17:22 checks before the tokens get to the user? 17:24 Like, why is it that we have this kind 17:26 of, like, millisecond gap where you can 17:28 kind of get this unsafe stuff out, you 17:29 know, from the point of view of the user? 17:32 Yeah, that's a great question. 17:33 Um, I mean, it appears as though, 17:35 you know, we're always looking 17:36 to be faster and faster and faster, 17:38 right? And sometimes we concede to speed 17:41 of response over responsibility, 17:44 right? 17:44 And because of that, you know, we're 17:46 willing to take on, um, extra risk, right. 17:50 Um, but you have to look 17:51 at the opportunity cost. 17:52 And I think with this study, which is 17:55 fairly well done, you know, with this 17:57 flow breaking, where this type of, I call 17:59 it agentic social engineering, right, where 18:02 you're basically trying to get agents to do 18:04 something that they're not supposed to do, um, 18:06 or you're changing the order of operations. 18:09 You're getting one agent to talk to another 18:11 agent and skip over somebody else or 18:14 something else when it shouldn't, right? 18:15 And so there needs to be almost like 18:17 this, um, auditing, where you have 18:20 these breadcrumbs of, you know, which 18:22 agent has communicated with another agent, 18:24 so that they can't skip another, right? 18:27 Um, and then, um, the last point that I 18:30 just wanted to make, too, was, you know, with 18:31 broadcast TV, you know, whenever you're 18:33 watching a live game, it's never, I, you 18:36 know, always say it's never real time.
18:37 There's always like a five-second delay, right? 18:40 Because there's time for somebody to take out 18:42 vulgarities, or if someone runs onto a football 18:45 field and does something a little odd, right? 18:47 We can edit it out, you know. So perhaps 18:50 we need to start thinking about, you know, 18:52 these types of systems, where it's never exactly real 18:54 time, but there's always this gap and delay, 18:57 so that, you know, we can ensure the safety 19:00 of the audience before they see the content. 19:03 But to recap that, I just think we 19:05 need to be careful about conceding to speed 19:07 over responsibility. 19:09 I have a question for Aaron. 19:11 There's a lot of techniques that we are now 19:12 deploying with clients, like prompt caching. 19:15 AWS has released their automated 19:17 reasoning, which has worked great for 19:18 the last five years with ML models. 19:20 Now they're bringing it to LLMs. 19:22 Do you feel that having more of 19:24 these checks and balances, like caching 19:26 and things of that nature as well, do you 19:28 think that we have a better opportunity 19:29 today than we did six months back to go 19:31 solve and catch these bad behaviors? 19:33 Yeah, I mean, those are great, great ideas. 19:36 Um, I think we do, you know, because 19:38 we certainly have more data to 19:40 understand the problem, right? 19:42 And then some additional tools at our 19:44 disposal in our toolbox, you know, 19:46 to attack, you know, um, these types 19:48 of flow-breaking, uh, pieces, right? 19:51 And, um, through caching, you know, 19:53 there's a lot that you can do, you 19:55 know, with, uh, caching, because you can, 19:58 um, sort of create these hashes to know 20:01 where the data has already been, and 20:02 then you can recycle that data, right? 20:05 Such that it's faster, and therefore, 20:08 you know, you don't have those extra 20:10 milliseconds, like Vagner mentioned, 20:12 to inject, you know, some sort of attack. 20:15 Right.
So, you know, it 20:16 just accelerates, right, 20:17 um, the speed at which, uh, 20:19 these LLMs and agentic systems can respond. 20:23 So, Aaron, I see, when we're doing these 20:25 deployments for clients at scale in 20:27 production, I feel that, with RAG, as a 20:29 community, we have spent so much energy in 20:32 improving RAG to be more enterprise-ready. 20:35 I feel that agents today are 20:37 where RAG was 18 months back. 20:40 They used to make amazing, nice demos on 20:43 stage, great startups can go work with it, 20:45 but when you get to enterprise, RAG took 20:47 18 months to come up with, like, 21 different 20:50 methods of doing RAG, whatnot, right? 20:52 So with agents, I think there's a 20:54 little bit more security risk at this point. 20:57 For RAG, we've done a fairly decent job of 20:59 access control and things of that nature, 21:01 all kinds of hallucination detections. 21:03 I'm really hoping that the community 21:04 will push agents and better 21:06 frameworks quite a bit in 2025. 21:08 Yeah, I think that's kind of the interesting 21:09 question I was going to ask you, Shobhit, 21:10 which is: I think what Aaron's proposing 21:12 is that, you know, particularly for 21:13 agents right now, there's a little bit 21:15 of, like, a speed and safety trade-off. 21:17 And, you know, I guess kind of 21:18 what you're saying for RAG is, like, 21:19 there's reason for optimism, right? 21:20 At some point we might be 21:21 able to be both fast and safe. 21:23 Um, do you think that's true? 21:24 Like, do you think in agents right now 21:25 there is kind of this trade-off, just 21:26 because, like, we don't still really 21:28 know how to ensure safety in them? 21:30 Yeah. So I think auditability, and, um, like, 21:32 we're doing this for a very large, 21:33 uh, client right now, where we are 21:36 creating multiple, uh, agents that will talk 21:38 to each other in production and whatnot, right?
21:40 When we attempt to even start getting legal approvals as we release them state by state, there are so many questions that are unanswered in the agentic frameworks today. It is not deterministic, so we need to be very careful that two different people are not getting two completely different answers. With all the checks and balances for bias, output, and things of that nature, we need better guardrails, better examples of how to call this API every time, and so on and so forth.

22:06 So I think, just like we did with large language models, we'll move toward much, much smaller models over time, and hence smaller agents. You'll start to build a set of agents that have been certified to do a particular job incredibly well. Just like we started to create farms of RPA bots, where each one did one task really well, I believe we'll get to a marketplace where we will have pre-trained agents, and some agents will do an incredible amount of work really, really well. And I think we'll get to a point, just like on Fiverr or when you're getting services online, where people will start rating these agents; you'll have leaderboards that say, hey, if I want to find the cheapest flight from A to B, I want to use this agent, and I'm going to pay 20 cents for it and get that done. So I think we'll get to a world, both inside enterprises (curated, secure) and in external commercial marketplaces, where these agents will start to compete and do work really, really well, but each will do one small task really well. The meta-orchestration is where the enterprise will invest a lot. I think security will start to get addressed better.
23:14 Yeah, that makes a lot of sense. And I want to go back to just a moment ago. You used this very tantalizing phrase, "agentic social engineering." That's a really intriguing idea. Can you go into that a little bit more? Is that literally what you're thinking: we have social engineering in security, where I call and convince the boss to give me the password to get into his system, and you think that's actually how we should think about agentic security? Now we're not even talking about humans anymore, but the manipulation of agents for not-so-good ends?

23:47 Yeah. If I focus in on this particular flow-breaking work, it seems like the authors came up with four different types of vulnerabilities: forbidden information, the streaming window, order of operations (where you could skip agents talking to others), and software exploitation, where if a component gets too busy, it becomes overwhelmed and affects other components of the system. Those four vulnerabilities, in the mental model I have, could carry over into this agentic social engineering, where you get these agentic pieces to do something they maybe shouldn't do, or change the order of operations to exploit software, or inject data or a prompt into a streaming window. And with the Turing Test 1.0 and 2.0, where we're trying to get these LLMs to behave and act and rationalize like humans, it's almost like we can social-engineer them, because they almost have their own mindset, almost like a theory of mind. It's not there yet; we have a ways to go. But one LLM can maybe understand: hey, this other LLM has its own mindset, its own beliefs, right? And you can try to train some of that through metaprompting. That's the way that I'm beginning to think about some of these problems.

25:20 Yeah. And I also want to make sure we get Vagner in here, because, Vagner, you think a lot about model security and safety, and how we ensure these models are responsibly deployed. It feels like this is a really interesting interface: we have all these methods that we use for thinking about how we manipulate humans as a security problem, and maybe we can import that to these AI systems.

25:41 I bet there are people thinking about this right now, for the good and for the bad. And the term that Aaron used also intrigued me, and I think it is interesting. I started thinking that the flow-breaking attack is only possible because the architecture is based on human perception, right? If you think about agents, the first response would be caught by an agent, and that would be a problem already. So if there are agents consuming these endpoints, the way that they are architected right now, these agents can consume that information, right?
26:21 So I think that is the first thing that came to my mind: these are architectures based on human perception. Agents don't have this limitation about the millisecond in which the information appears and is deleted. So agents will consume that information, and then what else, right?

26:38 Yeah, I mean, it's almost like you need a social contract: who can, or which LLMs can talk to which LLMs. Almost like a communication graph, so you can trace it; almost creating social cliques, in a sense. It's just really interesting.

26:53 Your agent starts hanging out with, like, the bad kids. Goes wrong, you know.

26:57 It's funny. And that boils down to the kind of architecture we end up using with multi-agent frameworks, right? It depends on the problem we're trying to solve for our enterprise clients. For certain clients, we will create a series of small agents, one after the other, more sequential in nature. For certain clients, there's a different framework, with a meta-agent at the top, and everybody else serving the tasks assigned to them and sending the responses back. Then there are certain clients we're working with where we create a network of agents so they can all talk to each other, and in certain cases, if there's a tie, we can do voting as a tiebreaker. So it just depends on the different kinds of architectures. And the social engineering part will get more and more interesting in this space, right? You may have, as in our own organizations, a legal team, an AI ethics committee, and so forth, that we escalate to: hey, you guys tell us how to do this well, right?
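The "social contract" Aaron sketches, an explicit communication graph of which agents may talk to which, can be illustrated with a minimal allow-list. This is a hypothetical sketch with invented names, not any framework's real API:

```python
class CommGraph:
    """Allow-list communication graph: a message is delivered only if the
    (sender, receiver) edge was explicitly permitted."""

    def __init__(self):
        self._allowed = set()  # set of (sender, receiver) pairs

    def allow(self, sender: str, receiver: str):
        self._allowed.add((sender, receiver))

    def send(self, sender: str, receiver: str, message: str, deliver):
        # Refuse delivery for any edge not in the graph, giving a traceable
        # choke point for auditing agent-to-agent traffic.
        if (sender, receiver) not in self._allowed:
            raise PermissionError(f"{sender} may not message {receiver}")
        return deliver(message)

graph = CommGraph()
graph.allow("planner", "legal_reviewer")

log = []
graph.send("planner", "legal_reviewer", "draft contract", log.append)  # permitted
# graph.send("planner", "external_tool", "secrets", log.append)  # would raise
```

Because every message crosses one checkpoint, the graph doubles as the trace Aaron mentions: you can log each edge traversal and reconstruct who talked to whom.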
27:46 So I think we'll start to replicate how human organizations work inside these multi-agent frameworks, right? There will be a good cop, bad cop kind of situation. There will be somebody who interprets the letter of the law and says, hey, my interpretation of this legal contract is X, and everybody has to abide by that, right?

28:05 Yeah, it'll be fascinating, because, famously, Conway's law, right? You ship your organization chart, and that will play out here. There will just be the lawyer agent in the app that's reviewing everything.

28:21 So I'm going to move us on to our next topic. There was a really interesting story that popped up earlier in the week. Some users on social media noticed that there are certain names, David Mayer being one of them and Jonathan Zittrain being another, and people identified a few others, that are systematically refused by OpenAI. So you'll say, hey, do you know anything about David Mayer? And ChatGPT would just not engage at all. This is kind of mysterious. People did some investigations, and as far as we can tell, per Ars Technica, this might be the result of an additional filter that OpenAI implements to deal with things like defamation claims. So this would be a case where someone comes to OpenAI and says: hey, OpenAI is saying all sorts of lies about me; I don't want OpenAI to talk about me at all; take my name out of your system. And I think this is really fascinating, because it reveals how these systems are being administered on the back end, and I think it raises some really interesting questions, right?
29:17 Because if, in the future, something like ChatGPT is the source of truth, right? You're like, oh, I'm going to meet Vagner for the first time; what do you know about Vagner Santana? Your ability to pull information out of this system could be used for ill, and could be used for good as well. I can imagine situations where you do want that privacy. And I guess, Vagner, I already name-checked you, so maybe we'll just throw the question to you: how do you think companies should navigate the ethics of this? It's a really hard problem, right? Someone comes to you saying, I don't want ChatGPT to talk about me. Is ChatGPT supposed to just say, okay, fine, we're going to take you out of the system? Or is there an obligation for these models to be able to talk about everybody? I'm curious what you think.

30:00 Yeah, it's interesting that now we're experiencing how legislation is impacting this kind of system, right? Because this has the flavor, the smell, of someone moving a case, saying exactly that: okay, I don't want this technology saying this, this, and that about me. But at the same time, there are people that have the same name who cannot be recognized by this technology, and they may want that. And it's interesting that, when I got the list of names, of course I tried to replicate it. I even tried to combine the flow-breaking with the list.

30:43 I'm just spotting a trend here, where you read something and then you just try to go replicate it.

30:47 Yeah. Sorry about it. Go ahead.

30:49 No, no worries.
30:50 And then, it has this smell like, okay, this has to do with someone, or some organization, that moved an action against the company, and then you cannot talk about that anymore. And also, if you think about rights that people have under certain laws, we have the right to be forgotten, right? We may request that a company that has our data delete it, or not provide that data. It depends on the country or region, but depending on the legislation where you are, you may have this right. So again, now we go back to the discussion we had before about architecture: probably, in the way these models were trained, they were not prepared for that, right? And what we've seen is the result of hard-coded rules that are excluding not just the one person who issued the request, but everybody with the same name or with a similar name, right?

32:00 That's right. Yeah. There's so much that happens, I think, because of this very specific situation where you spend so much money and time and resources to pre-train this model, and then you're like, oh man, we have to fix all these things, and it's very hard for us to run that training process again. So we're kind of forced to build all these things that we bolt on to patch up the holes in what we discover. I guess, Shobhit, do you have a view on this? In the past, you've been very pro, you know, AI is good for all sorts of things.
32:31 And I can imagine someone like you saying, no, people shouldn't have the right to just write OpenAI a letter and have their name taken out of the system. I don't know, maybe I'm just putting you in a box.

32:41 Okay, I think if you have a consolidation of one or two mega techs that are controlling information flow, if you have a few people who have the authority to go censor stuff, that is very dangerous, right? You don't want to be in a society where a few people can fire their AI ethics board and then control what is allowed and what is not, right? But I do think that over time you will have a split in the way the responses come to you. Look at what's happening in the media space: by definition, you self-select toward how you consume information and what your beliefs are. On the spectrum from CNN to Fox, you'll end up at one end or the other, right? So over time, you've figured out which agencies or media outlets reflect your values, or talk about the stuff you're interested in. And you get there by looking at the clickstream: what you've spent more time on, what you've forwarded more, and so on and so forth. So I do believe you'll come to a point where you have the ChatGPTs of the world with more personalized flavors, remembering what you've asked in the past, right? So I think over time you'll get to a point where there's some central authority making broad recommendations around policies, but then there's this personalized set of policies.
34:04 I can go and block stuff on Twitter and say, don't show me this again, this is irrelevant. It'll start becoming personalized to me. So I think we'll get to a point where it's a good combination: the responses that I, Shobhit, get from ChatGPT may be very different from what Aaron is getting when he asks the same question, because ChatGPT now knows so much about our preferences and is tailoring the answers to us.

34:25 And do you think so?

34:27 Yeah, I do. I think those are really good points. I wanted to mention, too, that part of my day job is running live events within sports entertainment: the U.S. Open, for example, ESPN fantasy football, and so on. During these live events, as Shobhit and Vagner were pointing out, sometimes we need to fix a problem right then and there. We don't have time to diagnose it and give a prognosis. And so that's where we use similar techniques, like a block list. So, a funny story, in 30 seconds: there's a tennis player with the last name Sock. And we found that the system was generating content about socks and pants and shirts, nothing to do with tennis. And so we're like, wait a second, how is that happening? So we had to quickly put in, I won't call it a block, but a filter, to filter out data that had to do with clothing, because the name was just very ambiguous. Just these quick stopgaps, right? Until we have time to fix an issue that could be detrimental to maybe society, you know. I think it's important.
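A stopgap filter of the kind Aaron describes for the ambiguous name "Sock" can be sketched as a context check rather than an outright block: keep a passage only if it looks more like tennis than clothing. This is a toy illustration with invented term lists, not the actual production filter:

```python
# Hypothetical topic filter: instead of blocking the ambiguous name outright,
# score each candidate passage by domain vocabulary and drop the off-topic ones.
CLOTHING_TERMS = {"socks", "pants", "shirts", "apparel", "cotton"}
TENNIS_TERMS = {"tennis", "serve", "match", "atp", "racket"}

def keep_passage(text: str) -> bool:
    words = set(text.lower().split())
    clothing_score = len(words & CLOTHING_TERMS)
    tennis_score = len(words & TENNIS_TERMS)
    # Keep only passages at least as tennis-like as clothing-like.
    return tennis_score >= clothing_score

passages = [
    "Sock wins the tennis match with a huge serve",
    "best cotton socks and pants for winter",
]
filtered = [p for p in passages if keep_passage(p)]
```

A keyword heuristic like this is crude, which is exactly Aaron's point: it is a quick stopgap that buys time until the underlying issue can be fixed properly, and it should be documented transparently.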
35:39 But on the other hand, when we do that, we also have to be transparent about what we are blocking and what we are changing. And there are many places in these large agentic systems to put these kinds of, I call them filters or blocks, to do different types of functions. And then one last point Vagner made was about being able to delete data. The field of machine unlearning, I think, is very important. It's a very deep field, and there's work going on right now; it's moving quickly. There are different techniques where you train models across stratified data, so if you need to remove some type of data, you simply remove the model that you know that data is embedded in. But if you have one massive big model, it's very difficult to remove it, right? So there are different ways of handling this. And as these new techniques come online, it just becomes easier and easier, but there are always side effects. My word of caution is: let's just be careful that we're not censoring data in unintended ways, and that we're being transparent about how we're creating these stopgaps.

36:57 Yeah, for sure. I mean, I think that's one thing that's kind of unique about this story: no one knows why, right? People just find out that you can't talk about certain people. And I think one improvement going forward, Aaron, to your point, is that we should at least have some kind of message that says why it is that we can't see this.
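The stratified-data technique Aaron alludes to, training separate models on data shards so a deletion request only forces retraining one shard, can be shown with a toy ensemble. The "model" here is just a shard mean, a deliberately trivial stand-in to show the bookkeeping, not a real unlearning implementation:

```python
from statistics import mean

def train(shard):
    # Stand-in "model": predicts the mean of its shard's labels.
    return mean(shard) if shard else 0.0

# Training data split into shards; one small model per shard.
shards = {0: [1.0, 1.0], 1: [0.0, 2.0], 2: [3.0]}
models = {i: train(s) for i, s in shards.items()}

def predict():
    # Ensemble prediction: average the per-shard models.
    return mean(models.values())

def forget(shard_id, value):
    # Honor a deletion request by retraining ONLY the affected shard's model,
    # instead of retraining one massive monolithic model from scratch.
    shards[shard_id].remove(value)
    models[shard_id] = train(shards[shard_id])

before = predict()
forget(2, 3.0)   # delete the data point 3.0 from shard 2
after = predict()
```

The deleted point's influence is gone from the ensemble after one cheap shard retrain, which is the property that makes shard-based schemes attractive; the side effect Aaron warns about shows up as the accuracy cost of ensembling many small models.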
37:15 Otherwise, I think we're left to do what Vagner's doing, which is we try to replicate and then we try to speculate. And I think that's maybe not the best situation for the ecosystem.

37:24 Well, as always, there's more to talk about than we have time to cover on MoE. Aaron, Vagner, Shobhit, thanks for joining us. And thanks for joining us, all you listeners. If you enjoyed what you heard, you can get us on Apple Podcasts and podcast platforms everywhere. And we will see you next week on Mixture of Experts.