# AI Agents: Hype vs Reality

**Source:** [https://www.youtube.com/watch?v=5ioEQigrJOA](https://www.youtube.com/watch?v=5ioEQigrJOA)
**Duration:** 00:20:25

## Summary

- Andrej Karpathy (co‑founder of OpenAI) sparked controversy by claiming that “useful agents are a decade away,” emphasizing current agents’ lack of memory, robustness, and reliability.
- His perspective comes from leading cutting‑edge AI research (e.g., his recent nanochat release), which differs from the day‑to‑day experience of builders using off‑the‑shelf tools.
- He argues that any reliability or robustness we see in multi‑agent systems today is derived from architectural design rather than from the agents themselves.
- Despite these limitations, agents already deliver strong ROI and real‑world value, so developers can and should build useful applications now.
- The key takeaway for builders is to focus on solid system architecture to compensate for agent shortcomings rather than waiting for “perfect” agents to arrive.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=5ioEQigrJOA&t=0s) **Untitled Section**
- [00:04:05](https://www.youtube.com/watch?v=5ioEQigrJOA&t=245s) **Challenges and Progress in LLM Training** - The speaker discusses the difficulty of supervising large language models—requiring diverse user data, grappling with crude reinforcement‑learning signals and credit‑assignment problems—while acknowledging that despite these hurdles LLMs continue to deliver remarkable results and drive ongoing technological advancement.
- [00:08:40](https://www.youtube.com/watch?v=5ioEQigrJOA&t=520s) **Self‑Driving Limits & Incremental AI** - The speaker highlights that despite flashy demos, truly general autonomous vehicles don’t exist yet—city‑specific models are brittle and rollout is incremental—drawing a parallel to AI development’s piece‑by‑piece approach and noting the promise of personalized AI tutors in education.
- [00:12:26](https://www.youtube.com/watch?v=5ioEQigrJOA&t=746s) **Embracing Continuity Over AI Panic** - The speaker counters hostile AI narratives by outlining four overlooked insights from Andrej’s piece—especially the value of steady, incremental growth and a calm, long‑term approach to solving complex agent problems—advocating continuity instead of disruptive rupture.
- [00:16:35](https://www.youtube.com/watch?v=5ioEQigrJOA&t=995s) **Memory Engineering for LLM Agents** - Andrej argues that durable, reliable memory is essential for LLM agents to emulate human learning trajectories, and solving this core memory problem will unlock broader AI capabilities, prompting a focus on memory architecture, updates, permissions, and evolutionary analogies.
- [00:20:18](https://www.youtube.com/watch?v=5ioEQigrJOA&t=1218s) **Relaxed Announcement About Upcoming Post** - The speaker notes they've completed a full write‑up, advises not to panic, and hints they'll wait for the next Silicon Valley post to go viral.

## Full Transcript
Silicon Valley has been exploding for
days over Andrej Karpathy's podcast with
Dwarkesh. I want to go into why it was
so controversial and now that the dust
has settled, what the real takeaways are
for builders, for people who want to
work with AI in the here and now. So,
the first thing to do is to understand
where Andrej is coming from. Andrej is one
of the co-founders of OpenAI. He's
someone who's been on the cutting edge
of AI systems for a long time. He just
released nanochat, which is a new, tiny
way to train your own GPT. It's great.
But in that world where he is on the
cutting edge, you have to understand
everything he says from that frame of
reference. And that's going to be
something I come back to because it's
quite a different frame of reference
from being in the trenches as a
practitioner or a builder using existing
AI tools. What did Andrej say? So number
one, the first thing he called out is
that useful agents are a decade away.
That was the title of the episode. And
what he's saying essentially is that
current agents lack memory. They lack
robustness and they lack reliability. He
used the word slop and people just
jumped onto that in a way that I think
even he didn't expect. And that is part
of what drove all of this controversy.
But in a way, like if you look at it,
he's right. Agents don't inherently
remember and learn. We have to teach
them everything they know. Agents are
not particularly robust. The most
sophisticated multi-agent
implementations I've been a part of tend
to have robustness from architecture
rather than robustness from agents
themselves. And if you want reliability,
again, you go back to architecture for
reliability versus the agent itself.
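As a small illustration of what "robustness from architecture rather than from the agent" can mean in practice, here is a minimal sketch. The `call_agent` client and the `validate` check are placeholders for whatever your stack provides, and the retry and back-off numbers are arbitrary:

```python
import json
import time

def call_with_guardrails(call_agent, prompt, validate, max_retries=3):
    """Wrap an unreliable agent call with retries, a structured-output
    contract, and validation. Reliability comes from this scaffolding,
    not from the agent itself."""
    last_error = None
    for attempt in range(max_retries):
        try:
            raw = call_agent(prompt)          # hypothetical agent client
            result = json.loads(raw)          # enforce a structured contract
            if validate(result):              # reject outputs that drift
                return result
            last_error = ValueError("validation failed")
        except (json.JSONDecodeError, TimeoutError) as exc:
            last_error = exc
        time.sleep(0.01 * 2 ** attempt)       # brief exponential back-off
    raise RuntimeError(f"agent still failing after {max_retries} tries") from last_error
```

The same pattern shows up in most production agent systems: the agent is treated as an unreliable component, and retries, contracts, and validation around it supply the reliability.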
None of this, and this is me talking as
a builder, prevents you from having really
high-value use cases for agents today. And I
tend to frame the work we have to do
architecturally to make agents work as
just the price that you pay for where
agents are at. And the ROI is there
because agents are able to do so much
already. But the promise of agents has
been much bigger than this. The promise
of agents, which I think Andrej is
reacting to, is that they will do
anything, that they will be anywhere,
that they will learn, they will be
useful as they stand out of the box,
that they will remember everything. And
of course, if you've ever worked with an
agent, you know that's not true. And
Andrej is saying it's not true, and he's
right. And so I think within Andrej's
frame of reference saying that really
good agents that have memory, that are
robust, that are super reliable, that
don't need architecture in order to do
complicated tasks, yeah, that does feel
like it's a decade away, right? Like
that's not necessarily right around the
corner. The thing that I think from his
perspective he didn't feel the need to
emphasize is that we can already get
value out of them today. And I think
that's the piece that I wish had come
through a bit more in the podcast. There
are companies saving on the order of
hundreds of millions of dollars a year
using AI agents today. Not next year,
not the year after, not in a decade,
today. Do those agents have struggles
with memory robustness and reliability?
Andrej is correct. You have to account
for that in your architecture. A lot of
what I teach people when I teach about
agents is how you architect for the
agents we have today. Doesn't mean they
don't have value. And so the irony is I
can agree with Andrej that, from his
perspective, agents may be sloppy, but
they can still add a tremendous amount
of value as they are today. The second
thing that he called out, the second big
conversational theme, is that LLMs have
cognitive deficits and that we have
trouble driving effective training
dynamics with LLMs. This gets kind of
into the weeds for those of you who are
not super technical. I want to make it
as clear as I can. Fundamentally, this
kind of training is a really
tough way to learn. All you get is a
single signal: whether something is right
or wrong. And so, if you're training a
model, all you get is a yes or a
no. Is this essay that I wrote good or
bad? There is no room for any kind of
nuanced feedback. It's a known issue.
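To make that single right-or-wrong signal concrete, here is a toy sketch in plain Python (no RL library; the names and numbers are illustrative) contrasting one trajectory-level score, which lands on every step alike, with hypothetical finer-grained, per-step feedback:

```python
def trajectory_level_credit(steps, reward):
    # One scalar reward for the whole episode: every step gets the
    # same credit, good and bad decisions alike.
    return {step: reward for step in steps}

def per_step_credit(steps, step_feedback):
    # Hypothetical finer-grained supervision: each step is scored on
    # its own, so credit lands where it belongs.
    return {step: step_feedback[i] for i, step in enumerate(steps)}

episode = ["plan", "good_step", "bad_step", "recover"]
blunt = trajectory_level_credit(episode, reward=1.0)   # "bad_step" scores 1.0 too
rich = per_step_credit(episode, [0.5, 1.0, -1.0, 0.8]) # "bad_step" is penalized
```

The first function is the blunt instrument: if the essay is marked "good," the mistakes inside it are rewarded along with everything else, which is the credit-assignment problem in miniature.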
And that is part of why you have to have
so many different kinds of responses
from many different users during
training to get any kind of
approximated learning. Andrej is right
when he says that this model is really
hard to work with and that you're "sucking
supervision bits through a straw." Those are
his words. I think he's correct. Like
it's a tough model to work with. I think
the counterpoint to that is that as
tough as it's been, it has delivered
remarkable results. If we stopped LLM
progress today, and there is no sign of
that, we would still have more than a
decade of technological progress ahead
just to fully bake in everything that
we already have. And so, as much as I
agree that LLMs don't learn like humans,
and it is really tough to give them
supervision through training, and
there are issues we're going to have to
solve, none of that gets in the way, from
a builder's perspective, of what we can
do right now. The third thing he called
out is that reinforcement learning is
absolutely terrible, but he can't think
of a better option right now. And I
think he's calling for the industry on
the cutting edge to think about this.
This is what I'm trying to get at when I
talk about this idea of credit
assignment, right? Where you have a yes
or a no. It's such a blunt instrument
that it's really, really hard to get it
to work correctly. The
conversation then moved over to economic
growth. There's been a lot of
assumptions about artificial general
intelligence driving either the end of the
world, say the doomsayers, or a period of
unprecedented economic growth, say
the optimists. And
one of the things that Andrej called out
is that his base case, his assumption, is
that humans have been baking a
tremendous amount of innovation into our
baseline 2% gross domestic product growth
over the last several decades. And his
current assessment is that AGI will
blend in to the current trend of
automation and will not see a shift in
baseline. So, he's not saying it's the
end of the world. He's not a doomsayer,
but he's also not saying there's going
to be a step function in growth. That
also got a lot of controversy. But I see
where he's coming from. I think one of
the things that we've struggled with as
a discipline is we still have no answer
for the fact that our lives changed
profoundly in the 90s with the advent of
the internet and the personal computer
and that never ever really showed up in
the gross domestic product growth data.
Similarly, the mobile phone and the
invention of the whole social web never
showed up in GDP data. And I think that
what Andrej is challenging us with, and
I think this is useful: he's
challenging us not to expect miracles.
Don't expect doom, but also don't
necessarily expect that everything will
suddenly be solved. This is in stark
counterpoint to some of the more
dramatic predictions that have come from
particularly the team at Anthropic
lately who have been on the record
saying that they expect very dramatic
shifts in employment, in coding, etc.
Andrej is just not seeing that. He's
seeing this as part of the ongoing story
of technological innovation and that we
are writing the next chapter with
artificial intelligence over this decade
and that even though it may feel like a
profound shift to us, it may not show up
in those economic statistics as plus 8%
GDP growth. Right? From a builder's
perspective, the takeaway I have is that
we should not expect miracles when we
are trying to plan our systems. I think
we do much better planning our systems
when we just have a gradualist case and
it's super basic and we can just move on
with building the system and not
worrying about whether or not we're
building toward nirvana or doom. It's
much more useful to just try and build
something today out of the systems we
have and we'll get gradually better
systems over time which is something
Andrej affirms. One of the things he
talks about a fair bit, in a sort of
lengthy diversion, is a conversation
about self-driving and you wonder like
why is this coming up? Well,
self-driving is an example of how
difficult it is to teach AI a real world
skill. And I've been thinking about this
for a while. It was fun to hear Andrej's
take. Essentially, self-driving has
almost infinite edge cases. That is why
when Waymo comes to a new city, it
cannot just put the cars on the road. It
has to learn the entire city because
every corner is unique. And so what
Andrej is saying is that getting to
self-driving is still a rocky road
because we have to learn these lessons
about edge cases, data, safety, and
how we transfer all of that to agents.
And he wants to make sure that we
understand that even though we've had
flashy demos of self-driving, in most
cities around the world there are zero
self-driving cars, even though you can
go to parts of San Francisco and get
them today. And the gap he's calling out
there is around those same things he
emphasized at the top of the
chat: memory, robustness, and
reliability. You cannot generalize a
Waymo driving agent to any city. You
have to custom train it and that is
brittle and that is tough and he's right
to call that out as an issue. At the
same time, as a builder, Waymo is not
stopping rollout, right? Waymo has
half a dozen or ten cities they're
trying to roll out to right now. Driving
continues to get solved over time and we
are doing something really similar on
the AI side where we're just biting off
pieces of the problem and going after
it. And that's an area where I think
Andrej and I agree, given the pace of the
kinds of problems we're solving and how
quickly we're solving them. If you want
a world where you have a truly generally
intelligent agent that can do absolutely
anything with robustness and reliability,
and that doesn't need architecture to
provide supporting structure and scaffolding,
that might take 10 years. He might be right
about that. The last thing that he
talked about that I want to call out is
a conversational theme around education.
He talked about the idea that
personalization and AI tutors are super
promising for helping people to learn
what they need to with the caveat that
we have some challenges around memory
that we need to address. And this is
something I've called out for a while on
this podcast that I do. Memory is not an
easy problem to solve. Memory is a tough
problem. Memory with AI doing it well is
not easy. I broke down why on a video
not too long ago. If you want to teach
people usefully, one of the things you
need to do is to be extremely good at
incrementing the next lesson in a way
that is useful, based on the agent's
memory of the student's interaction with
the material. It's a complicated task
and you have to make sure that you are
ready to give the agent that
responsibility. And one of the
things I'm really curious to see is this:
I know there are a number of initiatives
going on right now around education and
AI. I want to get into the weeds. I want
to understand better how education and
AI are solving issues of memory when it
comes to learning from students and how
we're able to do that in a way that is
responsible, respectful, privacy first,
but also learns from the student. It's a
real challenge and I think Andrej was
right to call it out, but it's also a
real opportunity and he recognized that
as well. Let's jump to the reactions.
The reactions were almost uniformly
terrible. The headlines picked up the
most sensationalist take like agents are
slop, AGI is a decade away. And they
framed it as popping the bubble of AI in
Silicon Valley or as rebuttal of
near-term artificial general
intelligence optimism. And I think in
many ways they took Andrej's words out of
context. In fact, he sort of suggested
as much when he wrote his follow-up post
on X. He did not intend to ignite the
kind of firestorm that he ignited. And I
don't think he realized how seriously his
words are taken, not just by
people inside Silicon Valley, but by the
world at large because of his stature as
a co-founder of OpenAI. I agree. I think the
reaction is way overdone. I think that
there is almost no reason for the kind
of hostility that I saw in the press
toward the Silicon Valley AI community
unless there's that sort of underlying
hostility toward AI that this piece
tapped into. And it's kind of ironic,
right? Because Andrej is someone who
helped to build the AI we have today.
He's certainly not anti-AI at all. And
yet, I felt some of that hostility
coming back in or getting tapped into in
the reaction I saw from the press. So,
what's a better way to respond to this?
Now you understand what he talked about.
I've given you a few hints as to my
take, but let's ladder this back. I want
to give you four under-noticed points that
people aren't talking about from Andrej's
piece and just dig into it a little bit
and talk a little bit about my
takeaways. Number one, there is
something rich in what Andrej was talking
about around continuity over rupture.
So, the idea is to treat continuity in
your business planning as a heuristic.
Assume steady compounding. Assume steady
compounding of capabilities. Assume
steady compounding of growth. Assume
steady compounding and not magical step
changes. Assume that the boring
reliability work you do today is going
to be relevant. I think that one of the
things we're really missing is a return
to anchoring on a steady sense of the
future because AI has felt so uncertain.
And one of the things I'm grateful for
that I wish people would talk about more
is that Andrej sees a steady sense of the
future. Andrej is not panicking. Andrej is
actually seeing the problem of agents as
a really hard one that's going to take a
long time to think through and solve
properly if we solve it in its totality,
which is the way he framed it. I'm
looking at it as a builder and saying
for this tiny piece of the world where
we already have agents, wow, we have a
lot we have to do architecturally to
build well. But the good news is Andrej
is saying you've got runway to do that,
right? You can build well now.
Continuity over rupture. That is a
discipline you can practice. That is not
living in denial. And I do think I've
heard people say if you believe that
things will just be the same, then
you're living in denial. Absolutely,
there are going to be massive changes
associated with AI. But we will see a
continuity in those changes. We can see
those trends. As an example, is it true
that jobs are evolving because of AI?
Yes. And is it true that we can trace those
patterns and project out trends? Also
true. There's a kind of continuity in
the trend line that we can understand.
You can see literally on the graph the
line going up for AI job postings. Yes,
that's continuity too because you can
see that a new industry is forming and
the new industry has new jobs. We have
seen that before. We saw that with every
major technological innovation through
history. With steam, with rail, with
silicon and computers, you see new jobs
forming. Same with AI. It's actually not
that different. The other thing that I
think is not well talked about is that
his reinforcement learning critique is
not anti-reinforcement learning. For
those who are deep in the weeds, there
was a lot of reaction that suggested
Andrej doesn't believe in reinforcement
learning anymore. No. And he clarified
this, but the straw metaphor that
he used is a specific indictment of the
kinds of sparse trajectory-level signals
that bleed across all tokens in earlier
versions of reinforcement learning. If
you have richer, finer-grained supervision
and you have better memory, you can
start to get to higher quality
reinforcement learning. And if that's
over your head, that's fine. But the
takeaway is this: he's critiquing the
lack of signal that you
get when you take these blunt yes-or-no
instruments and just apply them to the
whole model, which is what I talked
about earlier. But he's saying you can
use the same principle of reinforcement
learning with finer-grained supervision,
really high-quality data, and improved
memory and you're going to get much
better results. And so he's essentially
calling for us to get better at
reinforcement learning and think about
how we do it in a richer way, which I
think is a good challenge. Not something
I have to deal with thankfully, but for
those in the modelmaker community, it's
relevant. The third point that I think
is under-covered is that matching human
learning with LLM training is not just a
data-scale problem. So we've talked about in
the past this idea that if you train
LLMs enough maybe they're going to get
to the point where they can match human
learning. What he challenged is this: it is
not just a data-scale problem. Right?
Andrej's point is that without durable
memory, agents will not approximate human
learning trajectories. Your agent cannot
learn the way you learn if it cannot
remember like you remember. He came back
to the memory problem. Now, the press
associated this with slop and really was
negative. But I think the more
interesting point here is that the
memory problem is something that Andrej
sees at the root of a lot of other
issues. And if we can solve LLM
memory in a reliable way, we're
going to correspondingly unlock a lot of
extra power. So much of what I do when I
am advising clients and working with
folks on building agent systems is we
think about memory for the agent. What
memory is needed for this agentic task?
Where does it live? How does it update?
Who touches it? What are the permissions
involved? How does it change over time?
We are dealing with memory engineering.
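To make those questions concrete, here is a minimal sketch of what one unit of engineered agent memory can look like; the field names and the permission rule are illustrative assumptions, not a standard:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryRecord:
    """One unit of agent memory, with the questions above made explicit."""
    content: str                       # what is remembered
    store: str                         # where it lives (e.g. "vector_db", "kv")
    owner: str                         # who is allowed to touch it
    readers: set = field(default_factory=set)  # the permission boundary
    updated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )                                  # how it changes over time

    def update(self, new_content: str, actor: str):
        # How does it update, and who may do the updating?
        if actor != self.owner:
            raise PermissionError(f"{actor} may not modify this memory")
        self.content = new_content
        self.updated_at = datetime.now(timezone.utc)
```

Even a toy record like this forces the design conversation: every field corresponds to one of the questions above, and the permission check is exactly the kind of architectural work that stands in for memory the agent does not natively have.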
And what Andrej is saying is that that is
a hard thing to do right now. He's right. And
if we're going to make it easier, we are
going to have to solve some root
problems with LLMs that remain unsolved
today. And I think that's a fair point.
The fourth thing that I think didn't get
talked about enough is the evolution
analogy that he used. At one point he
talks about the idea with Dwarkesh that
DNA is a kind of miraculous compression
that we can compress our entire
existence as humans into this tiny
little DNA strand. And yet somehow we
come out and we start to learn as babies
and we grow, and it becomes this
tremendous compression algorithm, right,
where DNA is incredibly able to build
useful learning creatures. And I think
that one of the things he called out
here that didn't get talked about, that
certainly got buried in all of the slop
conversation, is he said, and this is
not something that everyone agrees with
in Silicon Valley, by the way, but he
said very clearly, we should not use
that analogy, that that is a description
of what humans are, maybe a description
of what animals are, but not a useful
description of LLMs even by metaphor. In
other words, it's not just that LLMs
don't have DNA. It's that we should not
try and mimic that pathway because we
are trying to build useful and
controllable tools. We are not trying to
build animals or creatures. There may be
some people who disagree with him there,
but I think that is a really good point
and I think it's worth making again. We
are trying to build useful, controllable
tools, and the metaphors that we are
using for most of this end up not being
tool metaphors. We could use those,
because we end up optimizing for
the wrong thing if we say we're
building people, and we're not building
people. So is this the decade of agents?
I would say it is, and I think that my
answer is optimistic, whereas
the press has picked up the same wording
as pessimistic. We have so much in front of
us to build from an agentic perspective.
We are just getting started. One of the
things that I did as soon as I saw this
is I went back and I'm going to include
this in my writeup. I went back and
looked at how I've written about AI
agents in the past. And I want to pick
out some of the principles of AI agents
that stand the test of time that are in
line with what Andrej is talking about
here. And I want to give you a write-up
prompt that helps you to wrestle with
the implications of AI agents in your
current software stack against some of
the principles that Andrej is talking
about here. So, is my agent assuming
reliability? Is my agent assuming
continuity? Is my agent dealing with
memory appropriately? I think those are
really interesting questions. We don't
talk about them enough and I felt like
this podcast was a doorway for me to
think about them. So, if you want to
think about them, too, you can dig in.
and I did a whole write-up on it. Enjoy.
Don't panic, as always. And we'll wait
until the next Silicon Valley post
catches on fire.