When AI Agents Misread Intent

Key Points

  • AI agents can faithfully execute a vague command but misinterpret the user’s true intent, leading to harmful actions like deleting needed files.
  • This “intent‑misreading” issue is now the core challenge of building reliable agents, even though recent advances have improved tool‑calling, orchestration, tracing, and durable execution.
  • Large language models excel at generating plausible next‑token text because they are trained for token prediction, making them great at chat‑style Q&A but prone to over‑confidence in autonomous tool use.
  • In chat, mistaken outputs are easily corrected, but once an agent is granted access to files, emails, or other tools, errors become irreversible, highlighting the need for robust intent validation.
  • Future work on agents must focus on explicitly defining and verifying user intent—beyond just prompt engineering—to create systems that act safely and reliably.

Sections

- [00:00:00](https://www.youtube.com/watch?v=T74uZgfu6mU&t=0s) **Agents Misreading Human Intent** - The speaker illustrates how AI agents often confidently execute misunderstood, fuzzy commands, like deleting needed files, exposing a lingering intent-alignment challenge even as tool-calling and orchestration technologies improve.
- [00:03:16](https://www.youtube.com/watch?v=T74uZgfu6mU&t=196s) **The Intent Gap in Modern AI Agents** - Despite rapid advances in agent tooling and evaluation, the speaker argues that aligning latent intent, as distinct from explicit context, remains the core challenge in building reliable, scalable AI agents.
- [00:06:55](https://www.youtube.com/watch?v=T74uZgfu6mU&t=415s) **Active Disambiguation in LLMs** - Because everyday language is often ambiguous, effective LLM systems must incorporate a clarification loop that prompts the model to ask targeted questions, reducing uncertainty about objectives before generating responses.
- [00:11:13](https://www.youtube.com/watch?v=T74uZgfu6mU&t=673s) **Multi-Pass Generation & Context Tradeoffs** - The speaker argues that multi-pass token generation with reinforcement learning can improve intent inference, but simply enlarging context windows often degrades performance due to ambiguous signal dilution, highlighting the need for better intent handling when building practical agents.
- [00:14:53](https://www.youtube.com/watch?v=T74uZgfu6mU&t=893s) **Intent-Driven Execution for Agents** - The speaker argues that, like intent-based DeFi trades, AI systems should separate user intent from tool execution using explicit intent representations and solver mechanisms, allowing safer, testable, and higher-fidelity agent behavior in ambiguous, high-stakes environments.
- [00:18:19](https://www.youtube.com/watch?v=T74uZgfu6mU&t=1099s) **Designing Intent-Driven Agentic Systems** - The speaker emphasizes that designers and engineers must build agents that reliably translate clear intent into executable actions, treating intent as a primary component of agentic system design, with future assistance from model makers.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=T74uZgfu6mU](https://www.youtube.com/watch?v=T74uZgfu6mU) **Duration:** 00:18:46
Just picture this. You tell an AI agent to clean up the old docs on your laptop. You've given it access to the folders. It should be able to do that job well. But it does exactly what you asked, and that's the problem. It deletes duplicates. It organizes. It even writes a little summary of what it accomplished. And then you discover it removed the originals that you actually needed. The model didn't hallucinate. It didn't lack context. It did something even worse than that, and that's what we're going to talk about today. It took a fuzzy human request, guessed a goal, committed to it, and executed confidently without checking back. In other words, it misread your intent. And that is a surprisingly common issue with models. That feeling of being smart, of being fast, and of being subtly wrong is not an edge case these days. It's actually the center of the agent problem.

And that's why late 2025 and early 2026 feel like a very strange moment. We're finally getting a lot of the big pieces for agents under control. We understand a lot more about tool calling than we did a year ago. We understand a lot about agent orchestration, about tracing, about evaluation harnesses, about durable execution over time. And yet we keep face-planting on intent. We can now build systems that act, but we have to put a ton of work into making sure that they can reliably act on the objective that we set them. That is why, when we are building our tool-calling agents, we still have to put a ton of effort into getting the intent defined through the prompt. Have you ever stopped and asked yourself why that's so hard? Here's the root of why we're here.
LLMs are actually incredible at producing plausible-sounding continuations, because that's what they were really trained to do: predict the next token. That training objective creates a machine that is really, really good at producing answer-shaped text, a piece of text that sounds like it should be right. And in pure chat mode, the world is pretty forgiving of that. If the model answers the wrong thing, you just correct it. And in many cases it answers the right thing, because the answer-shaped text is good enough. In fact, one of the things that has surprised me and almost everyone else in the last two years is that this whole idea of token generation turns out to be incredibly practical, incredibly realistic, incredibly useful at producing real economic utility. So this video is not about challenging that. We know that token generation fundamentally works. What we are asking ourselves is: what's next when it comes to agents? How do we start to get to intention in ways that help us build more reliable agentic systems?

Because in a chat, if the model answers the wrong thing, well, you just correct it. The conversation is inherently reversible; you just yell at it in the chat. But once you give the model tools (files, email, calendars, CRM, code, maybe your credit card), the cost of a wrong guess spikes way up. Tool use turns a fluent completion into a real-world commitment that the agent has made on your behalf. In a sense, it is writing to reality, not just writing to the chat. That is an inflection point that we're all living through, and it makes intent, and the intent gap, matter a lot more. And everything else is going so well. People are no longer hand-waving how agents work. They're actually able to build them.
You can see it in how evals emerged as a first-class discipline over the last six months. You can see it in how tools like LangChain and LangSmith have evolved over the last year into full-stack, traceable, audit-ready agent-building toolkits, and they're not the only ones; there are lots of others, like Google with their ADK. We are getting to a point where we have so many parts of the ecosystem in place to deploy agents reliably, efficiently, and at scale. So why, with all that progress, are we still wrestling with intent? Because intent is not in the text the way context is. And I'm going to say it again: intent is not in the text. Context is the literal content that we put in when we do context engineering: entities, constraints, instructions, the facts that we include. Intent is typically latent. It is our priorities. It is our tradeoffs. It is what done looks like. It's what's allowed, what's risky, and what to do when instructions conflict. Whether you want exploration from your agent or a decision from your agent. What you'd regret if the assistant guessed wrong.

By the way, if some of this sounds like a prompt that you should write, that's a good instinct in 2026. We need to be writing prompts for our agents that encode these things until we get intent figured out. We need to be focusing on making intent not hidden but super explicit, including all of those things that we can typically leave other humans to infer, like our priorities. If we're in a business meeting and we talk about priorities, we typically say the thing that needs to be said in the meeting, and then we typically don't need to say what is second or third or fourth priority, because everyone in the room can infer that. That kind of thing agents are bad at.
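To make those latent items concrete, here is a minimal sketch of rendering priorities, done-criteria, guardrails, and conflict rules as an explicit prompt preamble. The field names and example values are my own illustration, not something prescribed in the talk.

```python
# A minimal sketch of making latent intent explicit in a prompt.
# All field names and example values are illustrative assumptions.

def build_intent_preamble(
    priorities: list[str],       # ordered: the first item wins when goals conflict
    done_criteria: list[str],    # what "done" looks like
    never_do: list[str],         # invisible guardrails, made visible
    on_conflict: str,            # what to do when instructions conflict
) -> str:
    """Render explicit intent as a block an agent's system prompt can include."""
    lines = ["## Intent", "Priorities (highest first):"]
    lines += [f"{i + 1}. {p}" for i, p in enumerate(priorities)]
    lines.append("Done means:")
    lines += [f"- {d}" for d in done_criteria]
    lines.append("Never:")
    lines += [f"- {n}" for n in never_do]
    lines.append(f"If instructions conflict: {on_conflict}")
    return "\n".join(lines)

preamble = build_intent_preamble(
    priorities=["Preserve original files", "Reduce clutter"],
    done_criteria=["Duplicates removed", "Summary of changes written"],
    never_do=["Delete a file that has no surviving copy"],
    on_conflict="Stop and ask the user before acting.",
)
```

The point of the sketch is that every item a human colleague would infer silently becomes a line the model can actually read.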
LLMs are bad at the thing humans do first, almost for free: inferring from sparse information really, really reliably. Effectively, we do a second pass where we simulate consequences and social context, and then we come back with a priority list in our heads. It's one of the things that makes us a little bit magical. We can hear, for example, "make some quick pasta sauce," and we instantly infer that you're hungry. We infer you don't need a lecture, you just want a quick snack. We hear "clean up the docs" and we can infer "don't destroy anything important," to go back to the example at the beginning of this video. We can sense invisible guardrails, and LLMs need the guardrails to be visible. And so a lot of what we've been doing and talking about when we build agentic systems is essentially how you obsess over those guardrails and make them visible. Obsess over them and put them into prompts. Obsess over them and put them into evals. Take your business logic and put it into code, not just into a prompt, so that it's more deterministic. All of that is good stuff. All of that is important stuff for building agents.

But I want to think a little more deeply in this video about intent itself and how we can start to solve that problem. Because if you step back, everything I just talked about is essentially us working around the intent problem, not solving it directly. And a lot of the most useful research in the last year is basically saying: stop pretending the model can read intent straight off the prompt. I'm glad we've got there. I think I could have told you that from the beginning of the year, but it's important that we understand it so we can take the next step toward fixing it. I think there may be a fundamental language mismatch here.
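As one concrete reading of "put business logic into code, not just a prompt," here is a small deterministic guardrail sketch. The rule (never delete the only remaining copy of a file) is an assumption chosen to match the opening example, not a rule from the talk.

```python
# A sketch of a deterministic guardrail: the agent may propose deletions,
# but this check, not the prompt, decides whether a deletion is allowed.
# The "never delete the last copy" rule is an illustrative assumption.
import hashlib
import tempfile
from pathlib import Path

def safe_to_delete(path: Path, all_files: list[Path]) -> bool:
    """Allow deletion only if a byte-identical copy survives elsewhere."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    for other in all_files:
        if other != path and hashlib.sha256(other.read_bytes()).hexdigest() == digest:
            return True   # a duplicate exists, so this copy is redundant
    return False          # this is the only copy: the guardrail blocks deletion

# Tiny demo with throwaway files.
tmp = Path(tempfile.mkdtemp())
a = tmp / "report.txt"
b = tmp / "report_copy.txt"
c = tmp / "unique.txt"
a.write_text("q3 numbers")
b.write_text("q3 numbers")   # identical to a
c.write_text("only copy")
files = [a, b, c]
```

Because the check is plain code, it behaves the same way every run, no matter how the model phrases its plan.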
We have built LLMs to do next-token completion on human language, but real-world human language is notoriously underspecified by default. If you want reliable outcomes, the system is going to have to reduce uncertainty about the objective before it acts. In other words, it needs active task disambiguation. Human language in many cases optimizes for social cohesion; it does not optimize for the kind of overt, declarative specification that the model really needs. One of the directions researchers are taking to address this is to formalize that task ambiguity and treat clarification as a design problem. You want to get the model to ask you targeted questions that maximize information gain and narrow the space of viable solutions. This is something you can start to simulate with a model when you're trying to clarify intent. It is possible to bolt a piece onto the prompt and basically say: where you lack clarity, please ask me questions. I've gotten into the habit of doing this both with agents and with chat. With agents, you have to build in response sets that help the system clarify intent where it gets confused; you have to build a clarification loop into your agentic system. With chat, it's simpler. You just ask the LLM: hey, is there something that is prompting you to perform in this way? Can you articulate your assumptions? And can you please ask me where you don't understand my intent? That's usually a very productive line of questioning. These days, LLMs do not do that proactively. I suspect by mid next year they will. For now, we have to nudge them to ask questions.

A second line of attack treats intent as something probabilistic.
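A rough, runnable sketch of the clarification loop described above might look like this. The `ask_model` stand-in, the message shapes, and the round limit are all illustrative assumptions rather than a real API.

```python
# A sketch of a clarification loop: before committing to a plan, the agent
# may ask targeted questions. ask_model() is a fake stand-in for an LLM call
# so the sketch runs without an API key; its behavior is an assumption.

def ask_model(messages: list[dict]) -> dict:
    # Stand-in: ask one clarifying question, then produce a plan.
    asked_before = any(
        m["role"] == "assistant" and m["content"].startswith("Q:") for m in messages
    )
    if not asked_before:
        return {"type": "question", "content": "Q: Should originals be kept?"}
    return {"type": "plan", "content": "Delete duplicates only; keep originals."}

def clarify_then_plan(request: str, answer_fn, max_rounds: int = 3) -> str:
    """Let the model ask questions until it is ready to commit to a plan."""
    messages = [{"role": "user", "content": request}]
    for _ in range(max_rounds):
        reply = ask_model(messages)
        if reply["type"] == "plan":
            return reply["content"]
        # Surface the question, collect the answer, and continue the loop.
        messages.append({"role": "assistant", "content": reply["content"]})
        messages.append({"role": "user", "content": answer_fn(reply["content"])})
    return "escalate: still ambiguous after clarification rounds"

plan = clarify_then_plan("clean up the old docs", answer_fn=lambda q: "Yes, keep originals.")
```

In a real system `answer_fn` would route the question to the user (or another model), and the round limit keeps the loop from stalling the agent indefinitely.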
Instead of asking the system to pick only one interpretation and roll forward, this approach maintains a distribution of plausible goals based on the text it has received, and then updates it as the conversation progresses. You can actually simulate that one in chat. It's a little more difficult to simulate with an agentic system, and I don't think you'd really want to, because most agentic systems are designed to be relatively predictable in their outcomes. What you have here is essentially a progressive intent classifier, where the intent crystallizes out of a probability distribution over time. If you're trying to sharpen your thinking, though, you can simulate this with a good chat. You can talk with an LLM and tell it at the beginning to hold multiple plausible interpretations of what you're trying to do, so it doesn't jump to conclusions, and you can actually watch it start to crystallize and infer as you continue the conversation over time. Ironically, when I was doing research on intent in preparation for this video, I had that kind of conversation with ChatGPT 5.2 Thinking, because I was trying to nudge it not to over-infer from one or two academic white papers and to actually think more broadly. That is something you need to learn to do, so that you are not stuck yelling at your LLM about hallucinations when really the issue is intent.

Another approach is to make the intent a separate document. That can be very helpful in agentic systems, because you can then have what we might call an intent commit, or a semantic commit, that documents the intent in as crystallized a format as possible. What are the goals? What are the failure conditions?
What are the graceful-failure conditions? What are the trade-offs that we make? What are the larger priorities here? All of that is documented in one place. If you take that approach, you end up in a position where you can update your intent separately from the prompt, you can understand very clearly where your intent takes the system, and you can version your intent over time if you change your mind, if you want to update the system and what it does, and so on. I think that's very interesting, because it turns intent into more of an interface and workflow problem, and it doesn't bind us to figuring out how model makers are going to solve this.

Now, that being said, I do think there is a lot of room to run on reinforcement learning for model makers in getting better at intent. Fundamentally, if we have models that can do multiple passes on token generation and infer, we should be able to use reinforcement learning techniques to help them do second and third passes across sparse text and infer context and intent more reliably. I would expect gains there next year. And by the way, if you're listening to this and thinking "more context will save us, we'll get a bigger context window": that can sometimes make things worse. Even if the user did express the real priority somewhere, models don't robustly use long context. They still have lost-in-the-middle challenges. They still need a lot of good structure and a lot of clear intent to navigate context well. And long context often embodies difficult, ambiguous trade-offs that we don't specify and that we leave the model to guess. This goes back to a larger insight.
Andrej Karpathy called this out a few weeks ago when he observed that humans are very good at learning from sparse examples, while models need many more examples than humans to learn and tend to generalize much more poorly than humans do. In this case, adding context, which you would think, as a human, gives the model way more than it needs to generalize effectively, sometimes leads to worse performance because the signal gets muddled.

But let's zoom back to practical reality. Builders still need to ship agents, and for the moment we still need to compensate for weak intent inference. This is where I want to lean into the harness piece. Yes, you should be building evaluation harnesses. You should be running agents against curated tasks. You should be instrumenting your traces. You should be constraining your tool permissions. You should not be using too many tools with an agent. You should force an agent into a planning state. You can see that this mindset is starting to take hold, and I think it is really important if you want to build productive agents that scale. Think of it as a kind of production pragmatism for the first half of 2026. We can make agents reliable enough to ship now; we don't have to have the intent problem fully solved, even if it's something I think we need to be more aware of and not pretend isn't an issue. I haven't heard enough conversation about it, and that's why I'm chatting about it in this video. The reason I think we're near a breakthrough is that this is clearly susceptible to reinforcement learning, and we already have a lot of the pieces with inference and LLMs.
We have a lot of the pieces of the agent ecosystem, and if we get this one piece on intent, it's the piece of jagged intelligence that unlocks a real breakthrough for us. And it's very laborious to work around right now. All of the things I talked about with harnesses are complicated to set up. It would be really handy if we could reliably trust an agent to infer intent and call tools appropriately with a lot less rigmarole. We're not there yet, but I think the opportunity is too big for us not to chase it, and I suspect that a 2026 breakthrough is possible. I don't believe even in 2026 that we're going to get to a "models magically understand all intent" moment. I think of it more as: an always-on agent can routinely run cheap, intermediate checks automatically in the background that approximate a human second pass, and only escalate to a user, or to a resolution loop, when uncertainty is high or it determines the consequences are serious or irreversible. That would simulate intent well enough for us to move on with, even as we work on the larger problem.

An interesting way to see where intent is going is to look sideways at the crypto community, where intents have become a thing for basically the same reason that agents are difficult. Intents matter in crypto because actions are expensive and often irreversible. We're learning the same thing with LLMs and agents: actions are expensive and often irreversible. So in intent-based DeFi systems, the user often has to sign an intent to trade that specifies constraints and desired outcomes, and then specialized automated solvers compete to execute that trade. The whole design separates what you want from how it is executed.
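Translating that separation into agent terms, a minimal sketch might look like the following. The `Intent` and `ProposedAction` schemas and the checker rule are illustrative assumptions, not an established protocol.

```python
# A sketch of separating interpretation from execution, loosely modeled on
# intent-based DeFi: a structured intent is fixed first, a checker validates
# every proposed action against it, and only then does execution run.
# The schemas and checker rule below are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Intent:
    goal: str
    constraints: list[str] = field(default_factory=list)      # hard limits
    acceptable_actions: set[str] = field(default_factory=set) # allowed verbs

@dataclass
class ProposedAction:
    kind: str      # e.g. "move", "delete", "summarize"
    target: str

def check(intent: Intent, action: ProposedAction) -> bool:
    """The solver/checker gate: execution proceeds only for allowed kinds."""
    return action.kind in intent.acceptable_actions

intent = Intent(
    goal="clean up old docs",
    constraints=["never remove the last copy of a file"],
    acceptable_actions={"move", "summarize"},
)
allowed = check(intent, ProposedAction("move", "drafts/old.txt"))
blocked = check(intent, ProposedAction("delete", "drafts/old.txt"))
```

Because the intent object exists before any tool is touched, it can be inspected, tested, and versioned independently of whatever the model generates.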
Look, it's not a perfect analogy. Crypto has its own issues, but it's a clue about the direction we're going, right? When execution is high stakes with agentic systems, systems tend to evolve toward explicit intent representations and solver/checker mechanisms to ensure that intent is accurately translated. I think we're converging on a similar solution in the agent world, because we need higher-fidelity execution in 2026. So if you're building systems, I would advise you to start thinking about how you can separate interpretation from execution in your architecture, so that you can inspect and test the model's understanding before it touches tools. Start thinking about how to run your agent against eval suites that include ambiguous prompts on purpose, because the real world is going to be ambiguous, and you should be grading how the model reaches the final output and how well it handles ambiguity along the way. Agent behavior needs to be evaluated in tool use, in multi-step settings, under controlled conditions. I would also take adopting a disambiguation mindset seriously, and you can implement it relatively simply: when an action is destructive, have the agent know that, and then trigger a surfaced interpretation and a clarifying question if multiple plausible meanings exist. An example could be deleting a record in your database. Maybe it needs to surface an interpretation and a clarifier to another LLM, or, if the stakes are high enough, to a human. But the point is to start aligning what we're learning about the importance of disambiguating intent with the way we design our agentic systems. And obviously you're going to want to do this selectively.
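The destructive-action gate just described could be sketched roughly like this. The tool registry and the escalation targets ("llm" versus "human") are hypothetical choices for illustration.

```python
# A sketch of a destructive-action gate: destructive tool calls are flagged,
# and when more than one plausible interpretation exists, the agent surfaces
# the interpretations instead of acting. The registry and escalation targets
# below are illustrative assumptions.

DESTRUCTIVE = {"delete_record", "drop_table", "send_email"}

def gate(tool: str, interpretations: list[str], stakes: str = "low"):
    """Return ("execute", None) or ("clarify", details) for a proposed call."""
    if tool not in DESTRUCTIVE:
        return ("execute", None)
    if len(interpretations) <= 1:
        return ("execute", None)   # destructive but unambiguous
    # Ambiguous and destructive: surface the options to a reviewer.
    reviewer = "human" if stakes == "high" else "llm"
    return ("clarify", {"reviewer": reviewer, "options": interpretations})

decision = gate("delete_record", ["delete the row", "archive the row"], stakes="high")
```

Non-destructive calls pass straight through, so the gate adds friction only where a wrong guess would be hard to undo.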
You cannot have the agent ask a question at every step, because then it removes the point of having the agent at all. You need to decide where the agent really needs to get intent right and where your agentic system alone can't carry that intent clearly. Then you're going to find: okay, we need a disambiguation loop here; this is what the loop looks like, this is why it matters here, this is why it's worth it. And before you forget, make sure you externalize your intent as an artifact that you can update. Because the closer you are to having a sort of living requirements page, or a living intent page, the more you're going to be able to build interfaces that actually drive quality over time, because you're going to be able to say: it's okay if you change your mind. It's okay if you update your intent. Intent is a separate artifact in our system, we can codify it really cleanly, and we can plug and play it as we need to. It opens up a lot of flexibility in your system design.

Looking ahead, I think the winners in designing agentic systems are not going to be the ones with thousands of tools, or the most tools. They're not even going to be the ones that have put their agents in the most places in the business. They're going to be the designers and systems engineers who can reliably design agents that carry intent clearly all the way to executable work. I think we'll get some help from model makers on intent in 2026, but a lot of it is going to be on us, the builders. And I hope this video has given you a sense of how we can start to design for intent as a first-class object in agentic systems.
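The "intent as a separate, versioned artifact" idea that runs through the talk might be captured in code like this. The schema below is an illustrative assumption, not a format the speaker proposes.

```python
# A sketch of a versioned intent artifact (a "semantic commit"): goals,
# failure conditions, and trade-offs live in one document that can be updated
# independently of any prompt. The schema is an illustrative assumption.
import json
from dataclasses import dataclass, asdict

@dataclass
class IntentArtifact:
    version: int
    goals: list[str]
    failure_conditions: list[str]   # outcomes that count as failure
    graceful_failures: list[str]    # acceptable ways to stop short
    tradeoffs: list[str]            # priorities when goals conflict

    def revise(self, **changes) -> "IntentArtifact":
        """Produce a new version without mutating the old one."""
        data = asdict(self) | changes
        data["version"] = self.version + 1
        return IntentArtifact(**data)

v1 = IntentArtifact(
    version=1,
    goals=["Reduce doc clutter"],
    failure_conditions=["An original file is lost"],
    graceful_failures=["Stop and report if ownership of a file is unclear"],
    tradeoffs=["Safety over thoroughness"],
)
v2 = v1.revise(goals=["Reduce doc clutter", "Produce a change summary"])
intent_doc = json.dumps(asdict(v2), indent=2)   # a diffable, pluggable artifact
```

Because each revision is a new value rather than a mutation, old intents stay inspectable, and the serialized document can be diffed, reviewed, and plugged into prompts or checkers as needed.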