Decade of AI Agents: Coding Assistants
Key Points
- While some hype frames 2024 as “the year of AI agents,” experts like Andrej Karpathy argue it’s actually the **decade of AI agents**, noting today’s agents are still limited and over‑promised.
- Current agents stumble because they lack sufficient model intelligence, robust computer‑UI interaction skills, continual learning, and multimodal capabilities.
- **Use case 1 – coding assistants** is a strong fit: programming’s highly structured, rule‑based nature lets agents rely on pattern matching and clear pass/fail tests, and IDE interfaces are stable and simple to navigate.
- **Use case 2**—a recurring but underperforming scenario that appears with every new agent release—highlights the gap between hype and practical ability in today’s agents.
- **Use case 3** envisions ambitious, future applications that exceed present capabilities but could become commonplace within the next ten years as agent technology matures.
Sections
- Debating the Decade of AI Agents - The speaker contrasts current hype that AI agents are “the year” with Andrej Karpathy’s stance that their true potential will emerge over a decade, outlining three categories of use cases—from already useful coding assistants to presently limited but common scenarios, and finally to aspirational applications beyond today’s capabilities.
- AI Agent Use Cases: Coding & Travel - The speaker outlines how current AI agents excel at coding assistance and simple travel‑booking tasks, yet encounter limitations when dealing with complex, non‑happy‑path scenarios.
- Autonomous AI IT Support Dilemma - The speaker outlines an ambitious AI agent that fully manages and repairs user computers, highlighting technical variability and trust concerns about granting such autonomy.
Full Transcript
# Decade of AI Agents: Coding Assistants

**Source:** [https://www.youtube.com/watch?v=ZeZozy3lsJg](https://www.youtube.com/watch?v=ZeZozy3lsJg)
**Duration:** 00:13:13

## Sections

- [00:00:00](https://www.youtube.com/watch?v=ZeZozy3lsJg&t=0s) **Debating the Decade of AI Agents**
- [00:04:23](https://www.youtube.com/watch?v=ZeZozy3lsJg&t=263s) **AI Agent Use Cases: Coding & Travel**
- [00:08:50](https://www.youtube.com/watch?v=ZeZozy3lsJg&t=530s) **Autonomous AI IT Support Dilemma**

## Full Transcript
You might have heard that this is the year of AI agents, but some
prominent voices in the AI community, such as OpenAI Co-Founder Andrej Karpathy, they paint a bit
of a different picture. They're actually saying that this is the decade of AI
agents, and that today's AI agents, they kind of struggle with some basic tasks. They are being
a bit oversold, and it will take advancements over the next ten years to work through all the issues.
Now, why do today's AI agents struggle with many tasks? Well, there's a lot of reasons. One of them
is they just don't have enough intelligence, the model behind them. They also struggle with
computer use, interacting with a computer UI, and they lack continual
learning, and they lack some multi-modal capabilities. So
whether this is the year or the decade of the AI agent, let's examine
three use cases for agentic AI. So we're going to start with number one here. That one
is going to be where AI agents are already providing tremendous day-to-day utility. They work
really well. Number two, use case number two. That's considered a common use case. It comes
up pretty much every time a new agentic AI model is released. But as you'll see, it kind of falls a
little bit short in practice today. And then finally, number three is going to be an
aspirational use case that's a little bit beyond current capabilities, but maybe far more
commonplace a decade from now. And use case number one is coding assistants, AI
agents that work together alongside developers. And they can do a bunch of stuff, like they can
write code, they can fix bugs, they can generate documentation, they can
review pull requests, those sort of activities. And of course, this isn't hypothetical. If
you're writing code today, chances are you're already taking advantage of agentic coding
assistants to help you. So the question is: why are coding assistants such a good fit
for the agentic capabilities of AI models today? Well, let's go back to those four AI agent
capabilities I've mentioned: intelligence, computer use, multi-modal and continual learning. So let's
first of all start with the capability of intelligence. Now code
has a number of things going for it. Code is highly structured.
It has a lot of well-defined rules. So the agent doesn't really need human-level
reasoning for most coding tasks. It needs pattern matching, pattern matching across millions
of code examples. And current models are really good at this sort of thing, and it doesn't hurt
that programming problems, they tend to have clear right and wrong answers. The code either compiles
and passes tests or, well, it doesn't. So that's that one. What about computer
use? Well, it's barely needed because these agents, they work within
integrated development environments (IDEs). And those are well-defined interfaces that haven't
really changed dramatically in years. And agents don't have to navigate inconsistent web UIs or
click through enterprise applications. All right. What about multi-modal capabilities? Well,
not really required. That's because when it comes to code, it's basically text in
and then text out from the model. So code comments, error messages, it's all text-based,
and it's highly structured. And then as for continual learning, well, yes,
programming languages and frameworks evolve. They change, but somewhat slowly and usually with
pretty extensive documentation. So an agent using a large language model, that would already have
been pre-trained on a lot of that information, it was part of its training set.
So it has knowledge already of vast amounts of source code and knowledge that applies broadly
across most projects. So coding assistants, they play to the strengths of current AI models.
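That pass/fail feedback loop can be sketched in a few lines. This is a hedged illustration, not any real product's implementation: `propose_fix` is a stand-in for the model, and both candidate patches are invented, but the loop structure is the point, namely propose a patch, run the tests, and treat the unambiguous pass/fail result as the signal.

```python
# Minimal sketch of a coding agent's test-driven feedback loop.
# propose_fix() is a hypothetical stand-in for the model.

def run_tests(code):
    """Stand-in test harness: the task is a working add() function."""
    env = {}
    try:
        exec(code, env)                 # compile + execute the candidate
        return env["add"](2, 3) == 5    # clear right/wrong answer
    except Exception:
        return False

def propose_fix(attempt):
    """Stand-in for the model: the first draft is buggy, the retry is not."""
    if attempt == 0:
        return "def add(a, b):\n    return a - b"   # buggy draft
    return "def add(a, b):\n    return a + b"

def agent_loop(max_attempts=3):
    """Keep proposing patches until the tests pass (or give up)."""
    for attempt in range(max_attempts):
        code = propose_fix(attempt)
        if run_tests(code):             # immediate, binary feedback
            return code
    return None                         # no passing patch found
```

Here the first attempt fails the test and the second passes, so `agent_loop` returns the corrected patch. Real coding agents run real test suites and compilers, but the signal they act on is this same binary pass/fail.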
They operate in structured environments, they have immediate feedback loops, and they work with
well-defined problems. Okay. Now use case two is travel booking, and this is the one that comes up
in practically every demo of new agentic AI models. And the basic premise is a series of AI agents
that handle like booking your entire trip. So that might involve booking some flights
across airlines, and then maybe comparing some hotel options and then
booking everything, making sure that we get the optimal prices, and then ultimately kind of
managing your calendar. And yeah, this does seem like a perfect fit. It's a defined task
with clear goals: get a person from point A to point B at a reasonable cost. So
why does this only somewhat work today? Well, if you have what we can
call simple happy-path scenarios, well, it does kind of
work quite well. So if you need to book a direct flight and find a standard hotel room, current
agents, they can handle that decision-making. The information they're working with, it's mostly
text-based, it's flight times, it's prices, it's hotel descriptions, and that's within their
capabilities. But it doesn't take long to run into limitations. So let's go back to those
four capabilities. And we're going to start again with intelligence. And the big thing with
intelligence is that when it comes to edge cases, they kind of kill
it. What happens when a flight gets delayed? Or, if you're connecting through a city with certain
visa requirements or you're traveling with an infant? Well, current agents, they really don't
handle the long tail of real-world complications that human travel agents do.
And if you've ever traveled anywhere, then you don't need me to tell you that travel is full of
edge cases. Okay, computer use is another big one.
Every airline, every hotel chain, every booking site, they all have different
UIs. There's a lot of UI variants. They also might have CAPTCHAs, and they might have
authentication flows. And in fact, many of them are intentionally made difficult to automate.
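To make the UI-variance problem concrete, here is a small sketch; the site names and page structures are invented. Each site exposes the same fact, a price, in a different shape, so the agent needs a hand-written extraction rule per site, and any one site's redesign breaks that site's rule, whereas a stable API would need a single code path.

```python
# Hypothetical pages from three booking sites: same information,
# three different structures (all names and shapes are invented).
SITE_PAGES = {
    "airline_a": {"fare": {"total": "199.00"}},
    "airline_b": {"price_usd": 185},
    "hotel_site": {"offer": ["standard room", "210"]},
}

def extract_price(site, page):
    """One brittle, site-specific rule each; contrast with one API call."""
    if site == "airline_a":
        return float(page["fare"]["total"])
    if site == "airline_b":
        return float(page["price_usd"])
    if site == "hotel_site":
        return float(page["offer"][1])
    raise ValueError(f"no extraction rule for {site}")

prices = {site: extract_price(site, page) for site, page in SITE_PAGES.items()}
# {'airline_a': 199.0, 'airline_b': 185.0, 'hotel_site': 210.0}
```

A fourth site, or a redesign of any of these three, means new code, and that's before CAPTCHAs and authentication flows even enter the picture.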
So when agents need to navigate the actual websites instead of using APIs, that is where they
can struggle a bit. Okay, what about multimodal? Well, reading flight
times and prices from text is fine, but there are some nuances like, take for example, if we have a
hotel map that we actually need to read to see if that hotel is actually walkable to your
conference center, or if it's just kind of technically nearby. Trust me, that's one I've
struggled with a few times. Well, that does require multimodal understanding that current agents, they
might struggle a little bit with. And then, what about the continual learning aspect? Well,
when it comes to continual learning, your preferences really matter. Now, sure, you
could fill out a profile. You could say you prefer aisle seats and Marriott hotels, but the real
challenge here isn't just filling out the profile; it's actually learning. So we need to learn by
observing the world and then getting feedback on those observations. And it is
really a loop. The agent needs to figure out that, let's say, you're willing to pay more for direct
flights on Monday mornings, but you like to take connections on Friday afternoons. These are
patterns it needs to learn from your behavior over time, rather than just the simple preferences
that you can think to list up-front. So travel booking, it works well enough to be
impressive in agentic demos with cherry-picked scenarios, but today it's probably not reliable
enough that you would fully trust it with your actual travel, at least without close supervision.
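The behavioral-learning loop described above, such as the direct-flights-on-Monday versus connections-on-Friday pattern, can be sketched as follows. The booking history and its field names are hypothetical, and a real agent would learn far richer patterns, but the idea is the same: the preference is derived from observed choices, not from a profile the user filled out.

```python
from collections import defaultdict

def learn_direct_flight_preference(bookings):
    """For each weekday, the fraction of bookings where a direct flight
    was chosen -- a pattern the user may never think to state up-front."""
    totals = defaultdict(int)
    directs = defaultdict(int)
    for booking in bookings:
        totals[booking["weekday"]] += 1
        if booking["direct"]:
            directs[booking["weekday"]] += 1
    return {day: directs[day] / totals[day] for day in totals}

# Invented observation history: what the user actually booked.
history = [
    {"weekday": "Mon", "direct": True},
    {"weekday": "Mon", "direct": True},
    {"weekday": "Fri", "direct": False},
    {"weekday": "Fri", "direct": False},
    {"weekday": "Fri", "direct": True},
]

prefs = learn_direct_flight_preference(history)
# Mon -> 1.0 (always pays for direct), Fri -> ~0.33 (usually connects)
```

The point of the sketch is the loop: observe a booking, update the learned pattern, and let future recommendations reflect it, rather than relying on the static aisle-seats-and-Marriott profile.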
All right, so for use case number three. This is my aspirational one. It's automated IT support. So
this is a bit more than an agent that helps answer helpdesk tickets with canned responses.
That's kind of level one stuff. And well, I would say that does already work today. But I'm thinking
more about an agent that does a bit more than that. So it actually completely autonomously logs
into a user's machine, then it diagnoses whatever the problem is, and then
it actually has full control to go and fix that problem autonomously. Now this seems
like the perfect use case for AI agents. It's repetitive. It often follows patterns. But would
you trust an autonomous AI agent with free rein on your laptop to install fixes and delete
applications without your consent? Yeah, probably not. So why not? Well, let's
take a look at the capabilities again. So what about intelligence in this case?
Well, every user's setup is kind of unique to them. So there are a lot of
different paths we could take here. I think we could say most machines have a bit of a quirky,
unique setup. Now, a simple Outlook issue on one machine might be a corrupted file. On another,
it might be a proxy setting, and then, it could be an expired certificate on a third. And current
agents, they often can't handle these kind of endless edge cases. Now, there are also significant
issues when it comes to computer use. There are a lot of requirements here. So the agent, it
needs to be able to navigate a lot of stuff. So what does it have to navigate? It has to navigate
that user's machine. And well, people have different machines. So just for one example,
if you're a Windows user, it will need to be able to navigate Windows settings. Or if you're on a
Mac, it would need to understand Mac preferences. And that's just the operating system. There's also
the application UIs, which are all specific to those applications and so forth. And all of
this, remember, is potentially running on a system that is in
production. Today's computer use capabilities, they just aren't reliable enough for that level of
trust. Okay. What about multimodal capabilities? Well, users, they're going
to send things like screenshots in. They might speak to the agents. So you're going to
have some kind of verbal stuff as well. And that verbal description might not be that instructive.
It might be saying things like, uh, it's doing that thing again. And you've got to kind of figure out
what it is. The agent needs to piece together whatever users can capture in the moment. And then,
when it comes down to continual learning, the agent needs to learn specifically
from outcomes, and those outcomes will adjust over time. So
when software updates break things, which fixes are actually going to work in your specific
environment? As new devices get added and new issues emerge, the agent needs to adapt based on
what's working and what's not in practice. It needs to learn from the feedback loop of
thousands of support interactions beyond a model's initial training data. So basically, this
use case is still emerging, but it's just not fully there yet. So year or
decade? Well, actually both. We're in the year of AI agents for narrow,
well-defined tasks in structured environments. But we're in the decade of AI agents
for the broader vision, agents that handle messy real-world problems with reliable computer use,
with intelligence about edge cases, with true multimodal understanding and learning that
adapts to your specific environments. So for now, if you've got a coding task, well, an agentic
assistant might be just what you need. But if an AI agent offers to fix your laptop autonomously
with today's models, well, maybe at least ask it to show its work first.