
Anatomy of an AI Agent

Key Points

  • AI agents operate through a three‑stage loop of sensing (receiving data via text, vision, audio, APIs, etc.), thinking (integrating knowledge bases, databases, retrieval‑augmented generation sources, goals, rules, and priorities), and acting (making decisions and executing actions).
  • The sensing layer functions like human perception, turning external inputs—whether typed language, camera feeds, microphone recordings, or event triggers—into raw data the agent can process.
  • During the thinking phase, the agent enriches this data with contextual information from curated knowledge stores and policy specifications, ensuring decisions are grounded in facts, rules, objectives, and priority hierarchies.
  • Reasoning involves applying logical structures (e.g., “if‑then‑else” statements), planning, and task decomposition so the agent can break complex goals into manageable steps before execution.
  • By iteratively cycling through these layers, AI agents translate real‑world information into informed actions, enabling functionalities ranging from chatbots to autonomous research tools and self‑driving cars.
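The sense-think-act loop in the points above can be sketched in a few lines of Python. This is a minimal illustration, not any real agent framework; the `Agent` class, its fields, and the toy rule lookup are all invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    # Facts and rules the "thinking" stage draws on (a stand-in for a knowledge base).
    knowledge: dict = field(default_factory=dict)

    def sense(self, raw_input: str) -> dict:
        # Sensing: turn an external input (text, here) into structured data.
        return {"text": raw_input.strip().lower()}

    def think(self, percept: dict) -> str:
        # Thinking: enrich the percept with stored knowledge, then decide.
        return self.knowledge.get(percept["text"], "no matching rule")

    def act(self, decision: str) -> str:
        # Acting: execute the decision (here, just emit an action string).
        return f"ACTION: {decision}"

agent = Agent(knowledge={"book flight": "call airline API"})
print(agent.act(agent.think(agent.sense("Book flight"))))  # ACTION: call airline API
```

A real agent would run this loop continuously, feeding each action's result back in as the next percept.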

Full Transcript

# Anatomy of an AI Agent

**Source:** [https://www.youtube.com/watch?v=CAKGKkWf0tI](https://www.youtube.com/watch?v=CAKGKkWf0tI)
**Duration:** 00:10:13

## Sections

- [00:00:00](https://www.youtube.com/watch?v=CAKGKkWf0tI&t=0s) **Anatomy of an AI Agent** - The segment outlines how AI agents perceive data through text, sensors, and APIs, process it with contextual reasoning, and translate decisions into actions.
- [00:06:10](https://www.youtube.com/watch?v=CAKGKkWf0tI&t=370s) **AI-Powered Personalized Travel Planning** - The speaker explains how an AI system would fuse sensory inputs, calendar data, personal preferences, and contextual knowledge such as maps, prices, and availability to reason about and automatically book the most suitable flights and hotels for a trip.

## Full Transcript
0:00 AI agents are popping up everywhere, from smart assistants to autonomous research tools to 0:05 self-driving cars. But what actually makes these things tick? In this video, we're gonna break 0:11 down the anatomy of an AI agent. We'll peel back the layers, sensing, thinking, and acting, to show 0:17 how data from the real world turns into decisions and then is translated back out as actions. So now 0:23 let's take a look at how all of these things work together. We'll start with the sensing part of 0:28 the agent. The agent has to get some information in. This is basically its perception, just like a 0:33 person has eyes and ears and we perceive through those senses. Well, if we're talking about an AI 0:39 agent, how does it get information in? One of the ways it gets information in is through 0:44 text. It could be natural language processing: if we're talking about a chatbot, that information 0:50 just gets typed in and it takes that into its processing. It could also be some sort of sensor. 0:56 In this case, there could be a vision sensor, a camera. It could be a microphone, 1:03 something along those lines that brings in information from the outside world. It could be 1:08 APIs or other types of events that are triggered and input into this system 1:15 as well. So these are just a few examples of the inputs that go into the system. Then it moves 1:21 over to a thinking stage. How do I process all of this? Well, it turns out that in doing that I need some 1:27 more context. So one of the things I'm gonna add to this system is a knowledge base of some 1:34 sort. In that knowledge base, I'm gonna store things like facts, things that are important to 1:40 this system that it needs to know, and rules that it needs to operate with. It could also have some 1:47 other information that gives it context. So these kinds of things are gonna go into the thinking 1:53 process, along with other sorts of information that could be important.
And by the way, these could come from a 1:59 database. These could come from a RAG source, or retrieval-augmented generation source. So there 2:04 could be a lot of different sources for this knowledge coming into the system. Another source 2:09 that we need to consider here is some sort of policy information. We may have a situation 2:16 where I have goals that I need to consider. What is it I want the system to be able to 2:22 do? Particular objectives and things of that sort. I may have priorities that I need it to 2:29 consider. All of those things go into the thought process as well. We don't want it to make 2:35 decisions without considering the facts, the rules, the goals, the objectives, the priorities, all of that 2:40 kind of stuff. Then we get into the reasoning part of all of this. Here is where we're gonna start 2:45 dealing with things like "if-then-else" kind of logic. Here we're 2:52 processing all of that information, thinking about it, doing planning. We're looking at what do I 2:59 need to do and how am I gonna go about doing it. In these cases, I'm also going to need 3:06 to decompose tasks. If I know that I have a big, high-level goal that I want to accomplish, 3:12 one of the things I have to do is break that down into smaller components. So I need to do task 3:17 decomposition. I'm also going to leverage things like machine learning, where the system is 3:24 learning through reinforcement. Or it could be learning by being shown lots of different 3:28 things, and it develops a pattern. It sees these things; we keep showing it more and more of the 3:33 same type of data, more and more of certain types of objects if we're using a sensor, and then it 3:38 starts to develop an idea as to what those things are and what characteristics go with 3:43 them.
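The task decomposition and "if-then-else" logic the speaker describes can be sketched as a function that maps a high-level goal to an ordered list of sub-tasks. The goals and sub-task names here are made up for illustration; a real agent would typically have an LLM or planner produce this breakdown.

```python
def decompose(goal: str) -> list:
    # If-then-else branching over known goal types, each broken into steps.
    if goal == "book trip":
        return ["gather dates and destination",
                "look up flights and hotels",
                "apply travel policy",
                "book and confirm"]
    elif goal == "answer question":
        return ["retrieve context", "reason over context", "generate answer"]
    else:
        return [goal]  # treat an unrecognized goal as already atomic

for step in decompose("book trip"):
    print(step)
```

Each sub-task can then be executed (or decomposed further) in order, which is how a big goal becomes a sequence of manageable actions.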
Then we use something like large language model technology and leverage that again for some 3:49 of these things, like the text inputs, chain-of-thought reasoning, and all of 3:54 this. So here's the thinking part, the reasoning part of this system. Next, we're gonna move to 4:00 the generative part, the action. Here we're gonna generate certain types of output. We could 4:06 generate text as an output. We could generate speech. We could generate alerts. We could generate 4:13 video, all kinds of things like that. The action might also be to read from or write to a 4:20 database. So this is a possibility as well. We may also execute some 4:26 level of control. Maybe we have actuators because we're wanting to operate on the real 4:32 world. Maybe I have some sort of robotic system or a self-driving car. Once these 4:38 decisions have been made, now I'm gonna operate on the system. So, in the case of a robotic 4:44 car, or a self-driving car, I've taken in information from sensors. I've considered the 4:49 facts. I've considered the goals. I've run it through my reasoning logic, and then I interface 4:56 through an actuator in order to affect the way that the system actually works. Another really 5:01 important part of all of this is a feedback loop. I wanna make sure that it's constantly 5:08 evaluating its own performance. We want to evaluate the outputs of the system 5:15 and make sure that they're achieving, in fact, what we intended. Do they match the goals that we had 5:20 in mind? The term that we use here is reinforcement learning with human feedback. This 5:26 is basically where we give it a thumbs up or thumbs down. Sometimes the system is getting 5:31 its own feedback by trying an action and then seeing: did that get us closer to the goal, or did 5:37 it take us further away from the goal?
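The thumbs-up/thumbs-down feedback loop just described can be sketched as a score that drifts toward the human signal. This is a toy stand-in for reinforcement learning from human feedback, not the actual training procedure; the options and learning rate are invented.

```python
# Preference scores for two hypothetical options, both starting neutral.
scores = {"option_a": 0.5, "option_b": 0.5}

def feedback(option: str, thumbs_up: bool, lr: float = 0.1) -> None:
    # Nudge the option's score toward 1.0 on a thumbs up, toward 0.0 on a thumbs down.
    target = 1.0 if thumbs_up else 0.0
    scores[option] += lr * (target - scores[option])

feedback("option_a", thumbs_up=True)
feedback("option_b", thumbs_up=False)
print(scores)  # option_a drifts above 0.5, option_b drifts below it
```

Over many interactions, the agent's choices tilt toward whatever the feedback rewards, which is the essence of the course-correction the speaker describes.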
So the system can make some of these course corrections on 5:41 its own, or we can override and make the course corrections ourselves. This is the basic anatomy of 5:48 an AI agent. Okay, now let's take this anatomy that we just described in the abstract and look at a real 5:55 example. Let's say I want to book travel reservations for an upcoming trip. What 6:00 do I need to put into the system? What does it need to start with? Well, it needs to know the 6:04 dates of travel: when am I going and when am I coming back? It needs to know the destination: 6:10 where am I going? There might be other things, but we'll take those as inputs. So that's the 6:16 sensory perception part of this. I would probably enter that into a chatbot, or maybe it's 6:22 smart enough to read it off of my calendar. That goes into the reasoning portion. But the 6:27 reasoning portion, again, needs more context. So we're gonna go up here to the knowledge base. 6:32 In this knowledge base I would have already stored some preferences that I have. Maybe there are 6:38 certain airlines that I prefer, maybe certain hotels that I like to stay in. It 6:45 could also be based upon the location: maybe I like a particular hotel chain, but 6:52 it's gonna depend on the particular city which one of those hotels I'm gonna actually 6:58 have it choose. Where is the event that I'm going to be speaking at, for instance, if I'm 7:03 speaking at a conference? Then I'd like the hotel to be close to that. But I also 7:10 run every day, so I want a hotel in a place where I can go for a decent run. That's 7:17 another kind of preference that would be built into the system, and it would be very personal to me. Now, 7:22 some other information that wouldn't be personal to me would be things like maps. The system 7:29 should know where all these different things are located.
It needs to know prices for the different 7:36 things that it might book. It needs to know availability for flights and hotels and things of 7:41 that sort. All of this the knowledge base could look up, and that's going to be important to feed 7:46 into the decision-making logic. Another thing we have to consider: hey, look, I'm traveling on 7:52 business, so I have to follow IBM's business guidelines and comply with the policy. 7:58 They're gonna have some limits or some caps: in a particular city, you can spend this much 8:04 on a hotel, but not more. Or here is a particular preferred travel partner, and we want you 8:11 to book with them, if that's what we're considering. So these would be policy issues that 8:18 are also added into the decision-making process. Once I've gone through all of that, the 8:24 reasoning goes through and looks at all of the things that are here, and it figures out what 8:31 is the best way to satisfy this request. Ultimately it's gonna go out to the action 8:37 portion, which is going to book the reservation. It's going to book this by going off and 8:43 talking to the airline reservation system, the hotel reservation system, and a number of other 8:49 systems. After it's done all those things, it's gonna give me the output: I'm 8:54 gonna have the electronic ticket and the reservation and all of this kind of stuff. So this is it 8:59 acting on the external world and accomplishing the task, ultimately. Okay, so all of 9:05 this is great. But we need to go back and talk about the feedback loop. It's gonna ask 9:12 me to fill out some sort of survey, for instance, after it's all done: how did I do? Did I meet 9:19 your needs or not? And I'm gonna give it a thumbs up or a thumbs down. So here's the 9:23 reinforcement learning with human feedback type of thing.
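The corporate travel policy check in this part of the transcript can be sketched as a simple gate applied before booking. The city names, dollar caps, and airline are all invented for illustration; a real system would load these from the company's actual policy data.

```python
# Hypothetical policy: per-city hotel caps and a preferred-partner list.
POLICY = {
    "hotel_cap": {"New York": 350, "Austin": 200},  # max nightly rate by city
    "default_cap": 250,                             # cap for cities not listed
    "preferred_airlines": {"ExampleAir"},
}

def within_policy(city: str, nightly_rate: float, airline: str) -> bool:
    # A booking passes only if it is under the city's cap AND uses a preferred partner.
    cap = POLICY["hotel_cap"].get(city, POLICY["default_cap"])
    return nightly_rate <= cap and airline in POLICY["preferred_airlines"]

print(within_policy("New York", 300, "ExampleAir"))  # True: under the $350 cap
print(within_policy("Austin", 300, "ExampleAir"))    # False: over the $200 cap
```

The reasoning stage would filter candidate itineraries through a check like this before ranking the survivors by personal preferences.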
It could also 9:30 go back and evaluate some of these things on its own and say, well, I came up with this answer, but 9:34 I'm gonna double-check myself and see how well I matched these things, maybe even try a couple 9:39 of other scenarios on my own as hypotheticals, and it could operate on that. And again, keep tuning 9:46 itself, keep getting better, keep getting smarter, keep getting more personalized and more effective 9:52 in doing what is otherwise a relatively complicated task to accomplish. Now, I hope you not 9:58 only understand how AI agents work, but also how powerful they can be and the amazing potential 10:04 they possess to improve speed and efficiency, freeing you up to do the things you do best and 10:09 leaving the gorpy details to your AI agent.
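The self-evaluation step at the end of the transcript, where the agent tries a couple of alternative scenarios as hypotheticals and course-corrects, can be sketched as re-scoring candidate plans. The scoring formula, budget, and itinerary options here are entirely made up for illustration.

```python
def score(option: dict, budget: float) -> float:
    # Toy objective: reward money left under budget, penalize travel hours.
    return (budget - option["cost"]) - 10 * option["hours"]

# Hypothetical itineraries: the agent's first pick plus one alternative to try.
options = [
    {"name": "direct flight", "cost": 450, "hours": 6},
    {"name": "one stop",      "cost": 300, "hours": 9},
]

chosen = options[0]                                    # initial answer
best = max(options, key=lambda o: score(o, budget=500))
if best is not chosen:
    chosen = best  # course-correct: a hypothetical scored better
print(chosen["name"])
```

Running hypotheticals like this lets the agent catch a better plan before (or after) acting, without waiting for human feedback.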