Nvidia's Future Amid AI Chip Rivalry
Key Points
- Experts predict NVIDIA will remain among the top five AI hardware leaders in five years, though the market will become more fragmented with new chip architectures and emerging neuromorphic designs.
- AWS’s re:Invent conference was highlighted as the year’s premier AI event, showcasing Amazon’s aggressive push into AI infrastructure, including the upcoming launch of its Trainium 3 AI accelerator.
- Amazon is positioning itself to dominate the AI stack, building supercomputers for partners like Anthropic that are reportedly five times more powerful than existing deployments.
- The “Mixture of Experts” podcast frames these developments within broader industry trends, emphasizing rapid innovation, competitive chip advancements, and the evolving landscape of AI hardware.
Sections
- NVIDIA's Future in AI Hardware - Experts debate whether NVIDIA will remain a top AI hardware player over the next five years amid a fragmented, evolving chip landscape.
- AWS's Expansive AI Ecosystem - The speaker highlights AWS’s comprehensive AI portfolio, collaborations, and proprietary developments, while cautiously questioning whether the hype will translate into market dominance.
- Edge AI Chip Market Outlook - The speaker argues that the dominance of edge inference chips in agriculture will depend on business cases driven by connectivity hotspots and data availability, echoing Ben Thompson’s claim that future value will lie in infrastructure rather than constantly updated AI models.
- AWS Emphasizes In‑House AI - The speaker describes how AWS’s recent re:Invent highlighted its AI platform, urging customers to build scalable, secure solutions themselves rather than rely on external APIs, and offering hands‑on support through demos and GitHub resources.
- AWS Leverages Insight & Apple Partnership - The speaker explains how AWS’s deep visibility into customer workloads and its high‑profile alliance with Apple provide an unfair advantage to anticipate, validate, and dominate emerging AI and financial‑service workloads.
- Exploiting Guardrail Timing Vulnerabilities - The speaker explains how attackers leverage the brief millisecond gap between a model's output generation and its safety guardrails—especially through asynchronous request race conditions—to leak harmful content, urging a redesign of architecture and guardrail placement.
- Balancing Real‑Time Delivery and Safety - The speaker compares broadcast TV’s built‑in delay for content moderation to emerging safeguards such as prompt caching and AWS automated reasoning, arguing that newer tools give a better chance to detect and prevent harmful LLM behavior.
- Auditable AI Agent Marketplace - The speaker outlines challenges of ensuring deterministic, bias‑controlled AI agents for legal compliance and proposes a marketplace of certified, task‑specific agents, similar to RPA bots and freelance services, with ratings and guardrails.
- Meta-Prompting, Model Security, and Theory of Mind - The speaker discusses applying social‑engineering and security concepts—such as metaprompting, flow‑breaking attacks, and model safety—to develop LLMs with a rudimentary theory of mind and more human‑like rationalization.
- AI Multi-Agent Governance & Name Censorship - The speaker predicts that future AI multi‑agent frameworks will mimic human organizational structures—including legal‑interpretation roles that create “good cop/bad cop” dynamics—and then highlights a recent case where OpenAI seemingly refuses to discuss certain individuals, likely due to a defamation‑filtering mechanism.
- Challenges of Right‑to‑Be‑Forgotten in AI - The speakers discuss how legal deletion rights force AI developers to rely on hard‑coded, over‑broad filtering patches because pretrained models weren’t designed to accommodate data removal, making compliance costly and imperfect.
- Personalized AI Policies and Live Event Controls - The speakers discuss future centralized yet personalized AI policy recommendations, using blocklists for real‑time event troubleshooting, illustrated by a quirky example of a tennis player named Sock generating unrelated content.
- Clarifying the Unexplained Restriction - The hosts debate the opaque rule that “you can’t talk about people,” propose adding an explanatory message to aid the ecosystem, and wrap up the episode with thanks and a plug for the podcast.
Source: https://www.youtube.com/watch?v=ZEwJfi7xPxc
Duration: 00:37:49
Timestamps
- [00:00:00](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=0s) NVIDIA's Future in AI Hardware
- [00:03:03](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=183s) AWS's Expansive AI Ecosystem
- [00:06:06](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=366s) Edge AI Chip Market Outlook
- [00:09:12](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=552s) AWS Emphasizes In‑House AI
- [00:12:17](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=737s) AWS Leverages Insight & Apple Partnership
- [00:15:21](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=921s) Exploiting Guardrail Timing Vulnerabilities
- [00:18:24](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=1104s) Balancing Real‑Time Delivery and Safety
- [00:21:30](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=1290s) Auditable AI Agent Marketplace
- [00:24:38](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=1478s) Meta-Prompting, Model Security, and Theory of Mind
- [00:27:43](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=1663s) AI Multi-Agent Governance & Name Censorship
- [00:30:46](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=1846s) Challenges of Right‑to‑Be‑Forgotten in AI
- [00:33:53](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=2033s) Personalized AI Policies and Live Event Controls
- [00:36:57](https://www.youtube.com/watch?v=ZEwJfi7xPxc&t=2217s) Clarifying the Unexplained Restriction
Full Transcript
Five years from now, is NVIDIA still
the biggest name in AI hardware?
Aaron Baughman is an IBM
Fellow and master inventor.
Welcome back to the show, Aaron.
What do you think?
So I do think that they're
going to be in the top five.
Um, the field's going to be much more
fragmented with different chip architectures,
but I'm looking forward to seeing what types
of neuromorphic chips are going to come out.
Vagner Santana is a staff research scientist,
master inventor on the responsible tech team.
Vagner, welcome back.
Your predictions, please.
Uh, I second Aaron.
I think NVIDIA will still be at the top,
but with different architectures, and maybe
cooler ideas on new chip architectures.
Yeah, I hope so.
Uh, Shobhit Varshney, Senior Partner Consulting
on AI for US, Canada, and Latin America.
Shobhit, tell us what you think.
I think NVIDIA, in terms of AI
systems, beyond just the chip,
there's a lot that goes around it.
I think it'll be a force to reckon
with for the next five years.
And I would say they should
be in the top two or three.
All that and more on today's Mixture of Experts.
I'm Tim Hwang and welcome to Mixture of Experts.
Each week, MoE brings you the analysis,
hot takes, and banter that you need
to keep up with the ever hectic
world of artificial intelligence.
We've got another packed
schedule for today's episode.
We're going to talk about a new jailbreak
that's hitting the scene, people you can't
talk about on ChatGPT, but first we wanted
to take as our top story, the AWS re:Invent
conference, which is happening this week.
So for those of you who may be less familiar,
this is the annual conference for Amazon's AWS.
And there's been a host of big announcements,
uh, coming out of Amazon this week, um, not
least of which is that they are announcing
that their new generation of their sort of AI
chip, what they call Trainium, uh, Trainium
three is going to be launching very, very soon.
Um, and, uh, there's a lot to get
into, but I think Shobhit, I wanted to
kind of throw it to you first, cause
you were actually at the conference.
Um, I just talked a little bit about Trainium,
but I'm curious, you know,
what are the trends that you're seeing?
What are other big announcements
our listeners should know about?
From my vantage point, AWS re:Invent
was the AI event of the year.
That's a pretty bold statement.
I mean, there's been a lot
of big AI events this year.
What they're trying to do to
change the industry and absolutely dominate
the AI space is just incredible.
If you look at all the different stacks
or the layers of the stack at the compute
level, they are doing a lot in terms of
the chips and so are other competitors
as well, but they are quite ahead.
When you have somebody like Anthropic
and you are building a supercomputer
for them, that's five x more powerful
than what Anthropic has today.
That is making a bold statement.
As a company, uh, Amazon has done a lot
of AI, and they have a really good
history of doing that for 10, 15 years.
So they're building on top of that.
Compute is going to be very critical for them,
with storage as the next layer on top.
All the storage, the amount of options
you get as a developer, is just incredible.
It's like it's a dream for us
to go build for our clients.
It's one of our largest partners globally, AWS.
So we do a lot of work in building large systems
with all the different options with them.
Then there's the AI layer for all the models.
I think across the board, they've
been very clear on choices.
If there's one word that summarizes
AWS today, it's ecosystem.
They're trying to do their best to make sure
you have the best-in-class models available,
the best-in-class apps, and things of that nature.
But then, oh by the way, we also have our
own version that is delivering higher ROI.
We are matching or exceeding.
We have a massive announcement with NVIDIA.
And oh by the way, we have our own chips.
We have great collaboration and investment
in Anthropic and all these other models.
By the way, we also have our own.
So I think that choice is why people
will come to AWS and re:Invent, given
the kind of announcements they have made.
We spent the last three days hands
on working with the product leads.
Being such a big partner of AWS, we get
some dedicated talent from AWS to give
us previews and hands-on experience
of how this actually works.
They've done an incredibly good job.
Like, I'm so, so excited about
the next few months going
and doing this with our clients.
Yeah, Aaron, so do you buy the hype?
I mean, should everybody else
in the AI space be scared?
Traditionally, right, like most of the
attention has been OpenAI and Anthropic
and the people who are doing the models.
Um, I guess Shobhit is kind of making the claim
here that, you know, it's like maybe Amazon's
going to kind of take the cake in the end,
but I don't know if you buy that argument.
Yeah, I mean, the way I look at
it is that it's sort of a quasi
contest to beat out NVIDIA, right?
Where, um, you know, they're trying
to build their own chips to compete.
However, if you look at the announcements,
AWS is still hedging, right, by
partnering with NVIDIA, right, on P6.
So even though they're building their
own Trainium chips, they're still going to be
working with NVIDIA on P6, and
because of that, you know, they're looking
to see which way the tide's going to go.
And I also view this as, you know,
AWS looking to reduce their
dependence on third-party chips to enhance
their performance on AI workloads on AWS.
But to me, right, um, there's still a lot of
work that AWS has to do on the software stack.
Um, and they still have
to prove out performance.
You know, if we think back,
um, NVIDIA uses CUDA, right?
That's the most widely adopted
platform for AI workloads in the world.
And it's supported by PyTorch and TensorFlow.
Now, Trainium, it uses AWS's Neuron SDK,
right, which has a fraction of the market
share, and it's not as proven as CUDA.
So, yes, I think that the chip hardware itself
with Trainium is great, but AWS has work to
do to build the consumer and developer trust,
right, to be really, really competitive,
and that's why I think AWS is hedging
by still partnering with NVIDIA with P6.
Right, yeah, it feels like kind of like we're
kind of in this really interesting world
where, I mean, all the big cloud providers
are kind of working on
their own chips right now.
And they're also all working with NVIDIA.
Uh, and I think it's kind of
everybody's hedging a little bit.
I guess, Vagner, maybe this
goes to your prediction.
You were saying kind of in the future, maybe
we'll just have a more diversity of chips.
And actually, that will
be the really good thing.
Do you have a prediction on kind of like
how the market's going to divide, right?
Like, will it be just like NVIDIA
for pre training or, you know,
these types of chips for inference?
Or I'm just kind of curious about how you
think that market's going to divide out.
I think that it will be
based on a business case.
Um, thinking back to when I was involved
with digital agriculture: when you see
places where you have no connectivity,
you start thinking, okay, if we have to have,
like, chips running inference at
the edge, then that will be the chip that
dominates the market, if you have, let's say,
agricultural machinery using those chips.
And if you have access to data, if you
are, let's say, next to the
place where they gather information, then
probably you have something for the
pre-training and training, and you have
enough power and connectivity in
certain places on a huge farm.
So I think it will be based on
the business case, and on how
connectivity and data arrive
at these specific hotspots.
One of the really interesting comments,
uh, was from Ben Thompson, who writes
a newsletter called Stratechery.
It's like very, very good.
Um, you know, one of the ways he sort
of framed up a lot of the announcements
was that Amazon's basically making the bet
that, um, models won't matter
so much in the future, right?
That essentially it'll be
infrastructure that rules the day, and
models will be widely commodified.
And so, kind of like what we thought
was so special, which is, Oh my God,
you have to get the latest model that's
been released by OpenAI, is just going
to be less of a thing in the future.
Do you guys buy that?
Like, do you feel like, you know, what
we're seeing now is a movement of momentum
towards the infrastructure providers
versus kind of like the model creators?
Tim, I think we're making really good progress
as a community across each one of those threads.
Uh, we do need better intelligent models
for reasoning and things of that nature.
We're making some incredible
strides in that space.
That'll continue.
Uh, there's a lot that happens
before and after, uh, an LLM call.
AWS has done an incredible job
with their SageMaker stack.
All the automated reasoning checks, the
kind of things around how do I go
pull structured data as part of my LLM calls,
enhancements to do all kinds of things that
we as developers need.
When you've built these for clients over
the last two years, you've done a lot of custom
work in the middleware to make these elements work well.
And now you're seeing each one of
those providers catch up with giving
you a full ecosystem end to end
because they're also learning from
how enterprises are deploying these.
So I think AWS as a community
is further ahead than some of their
peers in giving you the full spectrum
end to end and making it super easy
for startups to come and do this.
I always have an enterprise
mentality around these things.
They are doing an incredible job on grounding
and on making sure there's the right governance.
Massive ecosystem; you can bring your own
favorite eval to the framework and whatnot.
They're very, very well placed.
Models are going to get better.
There's going to be a constant battle on that.
But over
time, it becomes a commodity.
That's for sure.
So, I guess, Shobhit, what's your prediction,
I guess, for Amazon in the new year?
Like, this is, you said, this is
the biggest wave of announcements.
Like, what does it look like when
we're at re:Invent, you know, 2025?
They have clearly made a very massive dent in
the AI community in the last two or three days.
Right. If you roll back two years,
to 2022, when we were here, they had
just made a bunch of AI announcements.
And then two days later we
had ChatGPT come out.
So they got caught off
guard at that re:Invent.
Last year's re:Invent was more around, yes,
sure, I'm going to bring the big dogs on stage.
I'll have Anthropic on during
the keynote, NVIDIA and whatnot.
Right. So they said, yeah, we
also have a lot of options.
This one, they just came in dominating this.
It took 24 months, but now they're killing it.
I was like, guys, we got this.
So I think the overall
messaging was: where else will you go to
do secure, end-to-end, infinitely
scalable, pluggable-ecosystem AI?
Don't outsource it to an API call.
Come build with us, and here's how simple it is.
Just a very subtle cultural thing:
every keynote, every technical
session I've gone to, they end with the same
kind of, uh, enthusiasm.
It always ends with: well, what will you build next?
Right. They have a huge emphasis in every session.
They excite you about the possibility.
They give you a couple examples
of what clients are already doing.
And they say, what are you going to build next?
And the product managers
are going to hang around.
They'll show you how this works.
Whenever I have a conversation at AWS
with any of their folks, it typically
starts with, hey, let me point you to
a GitHub repo that does this for you.
And then I'll show you this in action.
But the first intention is: go build epic
stuff with this, man; go to GitHub and we'll
get started, and we'll have dedicated
people to come help you go build.
I think it's a very different take, and
the contrast between AWS and
Microsoft and others is black and white, right?
You see a very different target audience.
You see, you know, a lot more geeky
conversations, hands on tech, this
stuff scales kind of conversation, and
you can build epic things with this.
Yeah, yeah, being able to build those
epic things is really important, because, I
mean, the way I see it, um, is that, you know,
with these new chain-of-thought algorithms
where these models can begin to self-learn,
it's almost like we build these
foundation models with pre-training,
um, and then you have a choice.
Do you want to fine tune it?
Um, do you want to do some
instruct tuning, right?
But then I've noticed that now, instead of
just, uh, really fast inference, you
know, there's this thinking phase, right?
And this thinking phase, you
know, it can go on for minutes.
Even hours, right?
And because this thinking phase is happening
with all these emergent behaviors and skills,
you know, you need this scalable, um, secure,
um, robust architecture that I think, you
know, was announced at, uh, at this conference.
So it's, it's real exciting, right?
To, to be a part of this, right?
And to watch what's happening.
That's great.
Well, definitely to keep an eye on.
There's going to be a lot
more action in the space.
And yeah, it is really exciting.
I mean, I think that, uh, if you
had asked me 24 months ago, I would
have been like, Amazon's way behind.
They're never going to catch up,
but you can never really count
them out because they're Amazon.
So, one last thing I would add to
this: being the world's cloud provider,
they have the largest market share.
They see a lot of workloads.
So they have an unfair advantage
that others do not have.
They can see how people are
actually leveraging these tools.
What are they building?
How are they contributing back to the community?
So they can go test out and be the second
movers and just dominate after that, right?
Because people build stuff, there's
a lot of small startups that have
built all these niche things that got
announced as features within AWS, right?
So you have this unfair
advantage that AWS has.
Because they see how people
are actually using it in enterprises.
Bringing in a large, trusted partner
(one of the most trusted brands is Apple)
and making a statement that Apple for the
last decade has been building on AWS.
That gets all the financial services clients;
one of them was sitting right next to me,
got very excited, and said, oh, this is
a really clear statement that you're doing
trusted computing.
If you have Apple on stage, then the
massive financial services, the JPMorgan
Chases of the world, they're doing some
incredible AI workloads on this, right?
So they've made a very, very bold statement,
and they're going after the mainframe
business with this as well, right?
They're saying, traditionally, there were
a lot of transactional systems that were
high compute needs, and you could not really
synchronize them on the cloud, whatnot.
If you look at a series of announcements,
they're doing a really good game plan on
how do we go attack workloads that haven't
moved to private clouds or secure clouds yet.
Yeah, I think that's right.
I remember when it came out a few years ago
that, was it Netflix, was running most of its
infrastructure on AWS, and being like, the
amount of video they're moving through that
system is just, yeah, crazy to imagine.
So yeah, I think it's a really good point.
We're going to move us on to
our next topic of the day.
Um, There was a great blog post that came
out from a security team called Knostic,
that's spelled with a K, um, on a new class
of LLM attacks that they call flow breaking.
Um, and what was kind of interesting is that
they kind of are proposing this as kind of
a new sort of third kind of attack we're
seeing in this space, um, with the other two
being prompt injection, um, and jailbreaking.
Um, and specifically what flow breaking
kind of focuses on is the fact that many
of these AI applications are built as
really kind of ensembles of models that are
doing lots and lots of different things.
And in many cases, there are separate models,
separate filters that are implemented to block
unsafe generations on the part of the model.
So, you know, the model might be about to advise
you to do something dangerous, um, and there's
another system that says, oh,
that's not actually what we should
do, pauses the generation, and then
regenerates it in a safer way.
And what a lot of flow breaking is focusing
on is how do we kind of use that as a way
of getting unsafe material out of the model?
Because there is this kind of gap
between the model itself and the kind
of safety measures that are put in.
Um, and Vagner, I know you were the
one who kind of flagged this for us.
If you want to talk a little bit about, like,
how does this kind of change our
thinking about security on models?
And does it make things
more complicated for us, right?
As we kind of think about how do we
secure these models against manipulation?
I think that it is interesting because
it, it tells us how people are, um,
building architectures of models and
how they are placing the guardrails.
Uh, and if we look back on software
engineering, it's another way
of exploiting race conditions, right?
And also we can think about, uh, asynchronous
requests and how all of this is happening.
And with this new, uh, attack,
they're basically exploiting this
interval, this millisecond interval between
generation and the guardrail taking
over, and showing that if the content
was already sent, then this can be harmful.
I think that that is the key point.
And, uh, I tried to replicate the two attacks
that the team showed, and I was able
to replicate one of them.
One was not working anymore,
at least on the ChatGPT-4o that
I tried, but the other one was.
But it's interesting in the sense that
the data is sent, right?
So I think it is important for us to rethink
the way that we structure and place these
guardrails in our architectures, and also
even how we organize the requests: if
there are two asynchronous requests, then
probably the data will be sent to the user.
I think that is the key aspect:
the content may be harmful,
and someone may use it.
I think they even showed that,
for the common user,
this will not show, because it's
so fast that it's hard to see
the content, but the content is there.
I think that is the key.
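The race Vagner describes can be sketched in a few lines. This is an illustrative toy, not Knostic's actual exploit: a fast "streaming" coroutine delivers tokens while a slower asynchronous guardrail check is still running, so the verdict arrives only after the content is already on the user's side. The names (`stream_tokens`, `guardrail`) and timings are invented for the sketch.

```python
import asyncio

async def stream_tokens(tokens, delivered, delay=0.001):
    # The application streams each token to the user the moment it is generated.
    for t in tokens:
        delivered.append(t)          # token is already on the user's screen
        await asyncio.sleep(delay)

async def guardrail(tokens, verdicts, check_time=0.05):
    # The safety check runs as a separate asynchronous request and finishes
    # after delivery, which is exactly the gap the attack exploits.
    await asyncio.sleep(check_time)
    verdicts.append("block" if "harmful" in tokens else "allow")

async def main():
    tokens = ["some", "harmful", "content"]
    delivered, verdicts = [], []
    await asyncio.gather(
        stream_tokens(tokens, delivered),
        guardrail(tokens, verdicts),
    )
    return delivered, verdicts

delivered, verdicts = asyncio.run(main())
# Every token was delivered before the "block" verdict arrived.
print(delivered, verdicts)
```

The fix the hosts circle around is architectural: hold tokens back until the verdict is in, rather than racing the check against delivery.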
Yeah.
Yeah. And I think it's kind of, it's really
fun and Vagner, like you're saying, I
think because it reveals so much about
how these systems are architected.
I mean, Aaron, if I can kick a question
over to you, it's like, why are these
companies streaming the unsafe tokens?
at all, right?
Like, doesn't it make more sense to have
an architecture where you do the safety
checks before the tokens get to the user?
Like, why is it that we have this kind
of like millisecond gap where you can
kind of get this unsafe stuff out, you
know, from the point of view of the user?
Yeah, that's a great question.
Um, I mean, it appears as though,
you know, we're, we're always looking
to be faster and faster and faster.
Right. And sometimes we concede to speed,
um, of response over responsibility.
Right.
And because of that, you know, we're
willing to take on, um, extra risk, right.
Um, But you have to look
at the opportunity cost.
And I think this study is fairly well
done, you know. With this
flow breaking, there's this type of, I call
it agentic social engineering, right, where
you're basically trying to get agents to do
something that they're not supposed to do, um,
or you're changing the order of operations.
You're getting one agent to talk to another
agent and skip over somebody else or
something else where it shouldn't, right?
And so there needs to be almost like
this, um, auditing, where you have
these breadcrumbs of, you know, which
agent has communicated with which agent,
so that they can't skip another, right?
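Aaron's breadcrumb idea could look something like this hypothetical sketch: a mediator logs every agent-to-agent call and refuses any call that skips a required hop. The `AgentMediator` class and the agent names are invented for illustration.

```python
class AgentMediator:
    """Routes agent-to-agent calls and keeps an auditable breadcrumb trail."""

    def __init__(self, required_hop="policy_agent", protected="executor_agent"):
        self.required_hop = required_hop   # the guardrail agent no one may skip
        self.protected = protected         # the sensitive agent it protects
        self.breadcrumbs = []              # ordered (caller, callee) log

    def call(self, caller, callee):
        # A caller may only reach the protected agent after the required hop
        # already appears in its own trail; denied attempts are logged too.
        if callee == self.protected:
            prior = [c for f, c in self.breadcrumbs if f == caller]
            if self.required_hop not in prior:
                self.breadcrumbs.append((caller, "DENIED:" + callee))
                return "denied"
        self.breadcrumbs.append((caller, callee))
        return "ok"

mediator = AgentMediator()
print(mediator.call("worker_agent", "executor_agent"))  # denied: skipped policy
print(mediator.call("worker_agent", "policy_agent"))
print(mediator.call("worker_agent", "executor_agent"))  # allowed after the hop
```

The breadcrumb log doubles as the audit trail: after the fact you can reconstruct exactly which agent talked to which, including the attempts that were blocked.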
Um, and then the last point that I
just wanted to make was, you know, with
broadcast TV, whenever you're
watching a live game, I always
say it's never real time.
There's always like a five-second delay, right?
Because there's time for somebody to take out
vulgarities, or if someone runs onto a football
field and does something a little odd, right?
We can edit it out, you know, so perhaps
we need to start thinking about, you know,
these types where it's never exactly real
time, but there's always this gap and delay
so that, you know, we can ensure the safety
of the audience before they see the content.
But to recap, I just think we
need to be careful about conceding
to speed over responsibility.
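Aaron's broadcast-delay analogy translates naturally into code. In this hedged sketch (names invented, `is_safe` is a stand-in for a real guardrail), output is held in a short buffer and each token is checked before release, so unsafe content can be redacted before the "audience" sees it:

```python
from collections import deque

def is_safe(token):
    # Stand-in for a real guardrail or moderation check.
    return "harmful" not in token

def delayed_stream(tokens, delay_window=2):
    # Hold each token in a short buffer (the "broadcast delay") and release
    # it only after the safety check, redacting anything that fails.
    buffer, released = deque(), []

    def release(tok):
        released.append(tok if is_safe(tok) else "[redacted]")

    for t in tokens:
        buffer.append(t)
        if len(buffer) > delay_window:
            release(buffer.popleft())
    while buffer:                      # drain the buffer at end of stream
        release(buffer.popleft())
    return released

print(delayed_stream(["this", "is", "harmful", "content"]))
```

The trade-off is exactly the one discussed above: the user experiences a small, fixed delay, and in exchange the check always completes before anything is shown.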
I have a question for Aaron.
There's a lot of techniques that we are now
deploying with clients like prompt caching.
AWS has released their automated
reasoning checks; the technique has worked great
for the last five years with ML models.
Now they're bringing it to LLMs.
Do you feel that having more, more of
these checks and balances, like caching
and things of that nature as well, do you
think that we have a better opportunity
today than we did six months back to go
solve and catch for these bad behaviors?
Yeah, I mean, great, great ideas.
Um, I think we do, you know, because
we certainly have more data to
understand the problem, right?
And then some additional tools at our
disposal in our toolbox, you know,
to attack, you know, um, these types
of flow breaking, uh, pieces, right?
And, um, through caching, there's a lot
that you can do, because you can,
um, sort of create these hashes to know
where the data has already been, and
then you can recycle that data, right?
Such that it's faster, and therefore,
you know, you don't have those extra
milliseconds, like Vagner mentioned,
to inject some sort of attack.
Right. So, you know, so it, it,
it just accelerates, right.
Um, the, the speed of which we can, uh,
These LLMs and agentic systems could respond.
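The hashing idea can be sketched as a simple content-addressed cache; the model call here is a stand-in, not any particular provider's API:

```python
import hashlib

_cache: dict = {}

def cached_response(prompt: str, call_model) -> str:
    """Hash the prompt; if we've seen it before, recycle the stored
    response instead of paying the model-call latency again."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)   # only on a cache miss
    return _cache[key]

# A stand-in "model" that counts how often it is actually invoked.
calls = 0
def fake_model(prompt: str) -> str:
    global calls
    calls += 1
    return prompt.upper()

print(cached_response("hello", fake_model))  # HELLO (miss: model called)
print(cached_response("hello", fake_model))  # HELLO (hit: served from cache)
print(calls)  # 1
```

The shrunken latency window is the safety point made above: a response served from the cache leaves less time in flight during which an injection could land.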
So Aaron, when we're doing these
deployments for clients at scale in
production, I feel that as a community
we have spent so much energy on
improving RAG to be more enterprise-ready.
I feel that agents today are
where RAG was 18 months back.
They used to make amazing, nice demos on
stage, and great startups could go work
with them, but when you get to the
enterprise, it's different. It took RAG
18 months to come up with like 21 different
methods of doing RAG, right?
So I think with agents there's a little
bit more security risk at this point.
For RAG, we've done a fairly decent job of
access control and things of that nature,
and all kinds of hallucination detection.
I'm really hoping that the community
will push agents and better
frameworks quite a bit in 2025.
Yeah, I think that's the interesting
question I was going to ask you, Shobhit.
I think what Aaron's proposing is that,
particularly for agents right now, there's
a little bit of a speed-and-safety trade-off.
And what you're saying for RAG, I guess,
is that there's reason for optimism, right?
At some point we might be
able to be both fast and safe.
Do you think that's true?
Do you think with agents right now there
is this trade-off just because we don't
really know yet how to ensure
safety in them?
Yeah. So I think about auditability.
We're doing this for a very large
client right now where we are
creating multiple agents that will talk
to each other in production, right?
When we attempt to even start getting
legal approvals as we go release them state
by state, there are so many questions that are
unanswered in the agentic frameworks today.
It is not deterministic, so we need to be
very careful that two different people are
not getting two completely different answers.
With all the checks and balances for bias
and output and things of that nature, we
need better guardrails, better examples of
how to call this API every time, and so on.
So I think, just like we did with large
language models, we'll move toward
much, much smaller models over time,
and hence smaller agents.
So you'll start to build a set of agents
that have been certified to do a
particular job incredibly well.
Just like we started to create farms
of RPA bots, where each one did one task
really well, I believe that we'll get to a
marketplace where we will have agents that
have been pre-trained, and some agents will
do an incredible amount of work really well.
And I think we'll get to a point, just like
on Fiverr or when you're getting services
online, where people will start rating
these agents. You'll have leaderboards
that say, hey, if I want to find the
cheapest flight from A to B, I want to
use this agent, and I'm going to pay
20 cents for it and get that done.
So I think we'll get to a world, both
inside of enterprises, curated and secure,
and external and commercial, where these
agents will start to compete and do work
really well, but each will do one
small task really well.
The meta-orchestration is where the
enterprise will invest a lot.
I think the security will start
to get addressed better there.
Yeah, that makes a lot of sense.
And I want to go back to something
from a moment ago.
You used this very tantalizing phrase,
"agentic social engineering."
That's really an intriguing idea.
Can you go into that a little bit more?
I mean, is that literally what you're
thinking about? We have social engineering
in security, which is, you know, I call
and convince the boss to give me the
password to get into his system.
Do you think that actually is how we
should think about agentic security,
where now we're not even talking about
humans anymore, but about the manipulation
of agents for, you know, not-so-good ends?
Yeah. If I put a focus on this particular
flow-breaking work, it seems like the
authors came up with four different
types of vulnerabilities:
forbidden information, the streaming
window, order of operations, where you can
skip agents talking to others, and software
exploitation, where if a component gets
too busy it becomes overwhelmed and
affects other components of the system.
And so those four vulnerabilities,
the way I have a mental model of them,
could carry over to this agentic social
engineering, where you get these agentic
pieces to do something that they maybe
shouldn't do, or to change the order of
operations, to exploit software, or to
inject data or a prompt into
a streaming window, right?
So, yeah. And I think with the Turing
Test 1.0 and 2.0, where we're trying to
get these LLMs to behave and act
and rationalize like humans,
it's almost like we can social engineer
them, because they almost have their
own mindset, almost like a theory
of mind.
It's not there yet.
I mean, we have a ways to go, but imagine
where one LLM can maybe understand, hey,
this other LLM has its own mindset,
its own beliefs, right?
And you can try to train some
of that through metaprompting.
But that's the way that I'm beginning
to think about some of these problems.
Yeah. And I also want to make
sure we get Vagner in here.
Because, Vagner, you think a lot about
model security and safety, and how we
ensure these models are responsibly
deployed, and it feels like this is a
really interesting interface, right?
We have all these methods that we use for
thinking about how we manipulate humans,
right, as a security problem.
And maybe we can import that
to these AI systems.
Now, I bet that there are people thinking
about this right now, for the good
and for the bad, right?
And the term that Aaron used also
intrigued me, and I think it is interesting.
I started thinking that the flow-breaking
attack is only possible because the
architecture they created is based
on human perception, right?
If you think about agents, the first
response would be caught by an agent, and
that would be a problem already, right?
So if there are agents consuming these
endpoints the way that they are
architected right now, these agents
can consume that information, right?
That is the first thing that
came to my mind:
if the architectures are based
on human perception,
agents don't have this limitation
about the milliseconds in which the
information appears and is deleted.
So agents will consume that information,
and then what else, right?
Yeah, I mean, it's almost like
you need a social contract.
You know, who can, or which
LLMs can talk to which LLMs.
Almost like a communication graph
that you can trace, almost creating
social cliques, right, in a sense.
But it's just really interesting.
Your agent starts hanging out with,
like, the bad kids, and it
goes wrong, you know.
It's funny.
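One way such a social contract could be sketched is an explicit allowlist graph of which agent may message which; the agent names and edges here are made up for illustration:

```python
# Directed allowlist: which agent may send messages to which.
ALLOWED = {
    "planner": {"reviewer"},
    "reviewer": {"executor"},
    "executor": set(),
}

def send(sender: str, receiver: str, trace: list) -> bool:
    """Permit the message only if the edge exists in the graph;
    record the attempt either way so the trail can be audited."""
    ok = receiver in ALLOWED.get(sender, set())
    trace.append((sender, receiver, ok))
    return ok

trace = []
print(send("planner", "reviewer", trace))  # True: allowed edge
print(send("planner", "executor", trace))  # False: skips the reviewer
```

Denied attempts stay in the trace, so an agent trying to route around the reviewer leaves evidence rather than silently succeeding.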
And that boils down to the kind of
architecture we end up using with
multi-agent frameworks, right?
Depending on the problem we're trying
to solve for our enterprise clients,
for certain clients we will go create
a series of small agents in a pipeline
that is more sequential in nature.
For certain clients, there's a different
framework with a meta-agent at the
top, and everybody else is serving
the tasks assigned to them and
sending the responses back.
Then there are certain clients we're
working with where we create a network of
agents so they can all talk to each other.
And in certain cases, when there's a tie,
we can do voting as a tiebreaker.
So it just depends on the different
kinds of architectures.
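The voting tiebreaker could be sketched as a simple majority vote across agent answers, with a placeholder tiebreak rule standing in for a meta-agent's decision; the answer strings are illustrative:

```python
from collections import Counter

def majority_vote(answers: list) -> str:
    """Return the most common answer; on a tie, defer to a
    designated tiebreaker (stubbed out below)."""
    top = Counter(answers).most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:
        return tiebreak(top[0][0], top[1][0])
    return top[0][0]

def tiebreak(a: str, b: str) -> str:
    # Placeholder rule: in production this would be a meta-agent's call.
    return min(a, b)

print(majority_vote(["approve", "approve", "reject"]))  # approve
```

In a meta-agent architecture the same function would sit in the orchestrator, with `tiebreak` replaced by a call to the supervising agent.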
And the social engineering part will get more
and more interesting in this space, right?
Just as in our organizations we have a
legal team, an AI ethics committee, and
so forth, right, that we escalate to,
like, hey, you guys tell us how
to do this well,
I think we'll start to replicate
how human organizations work
inside of these multi-agent
frameworks, right?
So I think there will be a good
cop, bad cop kind of situation.
There will be somebody who interprets
the letter of the law and says, hey, my
interpretation of this legal contract is X,
and everybody has to abide by that, right?
Yeah, it'll be fascinating, because,
I mean, famously there's Conway's law, right?
You ship your organization chart,
and that will play out here.
There'll just be the lawyer agent in the app,
you know, that's reviewing everything.
So I'm going to move us on to our next topic.
There was a really interesting story
that popped up earlier in the week.
Some users on social media noticed that
there are certain names, David Mayer being
one of them and Jonathan Zittrain being
another, and people identified a few others,
that ChatGPT systematically refuses
to talk about.
So you'll say, hey, do you know
anything about David Mayer?
And ChatGPT would just not engage at all.
This is kind of mysterious.
People did some investigations.
As far as we can tell, per Ars Technica,
this might be the result of an additional
filter that OpenAI implements to deal
with things like defamation claims.
So this would be a case where someone
comes to OpenAI and says, hey, ChatGPT
is saying all sorts of lies about me.
I don't want ChatGPT to talk about me at all.
You know, take my name out of your system.
And I think this is really fascinating,
because it kind of reveals how these systems
are being administered on the back end.
And I think it raises some really
interesting questions, right?
Because if, in the future, something like
ChatGPT is the source of truth, right?
You're like, oh, I'm going to
meet Vagner for the first time.
What do you know about Vagner Santana?
Your ability to pull information out of
this system could be used for ill, and
it could be used for good as well.
I can imagine situations where
you do want that privacy.
And I guess, Vagner, I already name-checked
you, so maybe we'll just throw the question
to you: how do you think companies
should navigate the ethics of this?
It's a really hard problem, right?
Someone comes to you saying, I don't
want ChatGPT to talk about me.
Is ChatGPT supposed to just say, okay,
fine, we're going to take you
out of the system?
Or is there kind of an obligation for these
models to be able to talk about everybody?
I'm curious what you think.
Yeah, it's interesting that now we're
experiencing how legislation is
impacting this kind of system, right?
Because this has the flavor, the smell,
of probably someone moving a case,
saying exactly that: okay, I don't
want this technology saying this,
this, and that about me.
But at the same time, there are people
who have the same name who cannot be
recognized by this technology,
and they may want that.
And it's interesting that when I got the
list of names, of course, I tried
to replicate it.
I even tried to combine the
flow-breaking with the list.
I'm just spotting a trend here, where
you read something and then
you just try to go replicate it.
Yeah. Sorry about that.
Go ahead.
No, no worries.
And then, it has this smell like, okay,
this has to do with someone or some
organization moving an action against
the company, and then you cannot
talk about that anymore.
And also, if you think about the rights
that people have under certain laws, we
have the right to be forgotten, right?
We may request that a company that
has our data delete it or
not provide that data, right?
It depends on the country or region,
but depending on the legislation
where you are, you may have this right.
So now we go back to the discussion we
had before about architecture: probably
the way that these models were trained,
they were not prepared for that, right?
And what we've seen is the
result of hard-coded rules.
Right?
Rules that are excluding not just the one
person who filed the claim, but everybody
with the same name or a similar name.
Right?
That's right.
Yeah. Yeah.
There's so much that happens, I think,
because of this very specific situation
where you spend so much money and time
and resources to pre-train the model,
and then you're like, oh man,
we have to fix all these things.
And it's very hard for us to run that
training process again, so we're kind of
forced to build all these things that we
bolt on to patch up the holes
in what we discover.
I guess, Shobhit, do you have a view on this?
In the past, you've been very pro, you
know, AI is good for all sorts of things.
And I can imagine someone like you
saying, no, people shouldn't have the
right to just write OpenAI a letter
and have their name taken out of the system.
I don't know, maybe I'm just
putting you in a box, but...
Okay, so I think if you have a consolidation
of one or two mega-techs that are controlling
information flow, if you have a few people
who have the authority to go censor stuff,
that is very dangerous.
Right? You don't want to be in a society
where a few people who fired their AI
ethics board are controlling what is
allowed and what is not, right?
But I do think that over time you will
have a split in the way the responses
come to you. Look at what's happening
in the media space, right?
By definition, you're self-selecting
based on how you consume information
and what your beliefs are.
On a spectrum from CNN to Fox, you'll
land on one side or the other, right?
So over time you've figured out, for most
of the media, the way I want to consume it:
these are the certain agencies or media
outlets that reflect my values, or talk
about the stuff that I'm interested in.
And you get there by looking at the click
stream: what I've spent more time on, what
I've forwarded more, and so on
and so forth, right?
So I do believe you'll come to a point
where you would have the ChatGPTs of the
world having more personalized flavors,
remembering the things that you've
asked in the past, right?
So I think over time you'll get to a
point where there's some central authority
making some broad recommendations
around policies, but then there's
this personalized set of policies.
I can go and block stuff on Twitter and
say, don't show me this again,
this is irrelevant, and so on.
It'll start becoming personalized to me.
So I think we'll get to a point where the
responses that I, Shobhit, get from ChatGPT
may be very different from what Aaron is
getting when he asks the same question,
because ChatGPT now knows so much about our
preferences and is tailoring the answers to us.
And do you think so?
Yeah, I do.
I think those are really good points.
I wanted to mention, too, that part of
my day job is running live events within
sports entertainment: the U.S. Open, for
example, ESPN, fantasy football, and so on.
During these live events, as Shobhit
and Vagner were pointing out, sometimes
we need to fix a problem right
then and there.
Right.
We don't have time to diagnose what
it is and to give a prognosis.
And so that's where we use similar
techniques, like a block list.
Right. So, a funny story, in 30 seconds:
there's a tennis player whose
last name is Sock.
And we found that we were generating
content about socks and pants and shirts,
nothing to do with tennis.
Right? And so we're like, wait a
second, how is that happening?
And so we had to quickly put in, I won't
call it a block, but a filter, right?
To filter out data that had
to do with clothing, right?
Because it was just very ambiguous, and
so we need these quick stopgaps, right,
until we have time to fix an issue
that could be detrimental to, maybe,
society. I think that's important.
But on the other hand, when we do that, we
also have to be transparent about what we
are blocking and what we are changing.
And there are many places in these large
agentic systems to put these kinds of, I
call them filters or blocks, right,
to do different types of functions.
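A quick stopgap filter of the kind described might look like a keyword check on generated text; the term lists here are invented for illustration and are not the actual production filter:

```python
# Stopgap: suppress generated snippets that drift into an off-topic
# domain (here, clothing) unless there is clear tennis context.
CLOTHING_TERMS = {"sock", "socks", "pants", "shirt", "shirts"}
ON_TOPIC_TERMS = {"tennis", "match", "serve", "set"}

def passes_filter(text: str) -> bool:
    words = set(text.lower().split())
    off_topic = words & CLOTHING_TERMS
    on_topic = words & ON_TOPIC_TERMS
    # Block only when clothing terms appear with no tennis context,
    # so commentary about the player Sock still gets through.
    return not off_topic or bool(on_topic)

print(passes_filter("socks and pants on sale"))     # False: filtered out
print(passes_filter("Sock wins the tennis match"))  # True: tennis context
```

The transparency point above applies here too: a filter like this is easy to ship quietly, which is exactly why it should be logged and disclosed.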
And then one last point, on what Vagner
said about being able to delete data.
The field of machine unlearning,
I think, is very important.
It's a very deep field, and there's
work going on right now.
It's moving quickly, but there are
different techniques, right, where you
could train models across stratified data.
So if you need to remove some type of
data, you simply remove the model that
you know that data is embedded in.
But if you have one massive big model,
it's very difficult to remove
the data, right?
So there are different ways of handling this.
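The stratified-training idea resembles sharded approaches to machine unlearning (SISA-style training), where each sub-model sees only one partition of the data. A toy sketch, with each shard "model" reduced to the set of records it saw:

```python
# Toy sketch: deleting a user's data means rebuilding only their shard,
# instead of retraining one monolithic model on everything.

def build_shards(records: list, n_shards: int) -> list:
    """Partition records across shards; each shard trains its own model."""
    shards = [set() for _ in range(n_shards)]
    for i, rec in enumerate(records):
        shards[i % n_shards].add(rec)   # round-robin assignment
    return shards

def unlearn(shards: list, record: str) -> list:
    """Rebuild only the shard(s) containing the record to forget
    (set removal stands in for retraining that shard)."""
    return [shard - {record} for shard in shards]

shards = build_shards(["alice", "bob", "carol", "dave"], n_shards=2)
shards = unlearn(shards, "bob")
print(any("bob" in s for s in shards))  # False: the record is gone
```

In a real system each shard would be a trained sub-model and predictions would be aggregated across shards, which is the side effect mentioned below: accuracy can dip relative to one big model.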
And I just think, as these new techniques
come online, it becomes easier and easier,
though there are always side
effects with them.
But my word of caution is: let's just be
careful that we're not censoring data in
unintended ways, and that we're being
transparent about how we're
creating these stopgaps.
Yeah, for sure.
I mean, I think that's one thing
that's kind of unique about this
story: no one knows why, right?
People just find out that you
can't talk about certain people.
And I think one improvement going
forward, Aaron, to your point, is that we
should at least have some kind of message
that says why it is that we can't see this.
Otherwise, I think we're left to do what
Vagner's doing, which is we try to
replicate and then we try to speculate.
And I think that's maybe not the best
situation for the ecosystem.
Well, as always, there's more to talk
about than we have time to cover on
Mixture of Experts.
So Aaron, Vagner, Shobhit,
thanks for joining us.
And thanks for joining us, all you listeners.
If you enjoyed what you heard, you can find
us on Apple Podcasts and platforms everywhere.
And we will see you next
week on Mixture of Experts.