
# Integrating Multi-Agent RAG with VectorDB

**Source:** [https://www.youtube.com/watch?v=Yq29bZ8Hlrc](https://www.youtube.com/watch?v=Yq29bZ8Hlrc)

**Duration:** 00:32:44

## Summary

- The speaker introduces a multi‑agent approach to improve retrieval‑augmented generation by categorizing queries, pulling relevant context from a VectorDB, and generating natural‑language responses.
- A step‑by‑step demo will clone a GitHub repo, focus on the API layer, and use the existing React/TypeScript UI (built with Express and Carbon Design components) only as a visual front‑end.
- Carbon Design React components are highlighted as a quick way for developers—especially those less experienced in front‑end work—to create polished UI elements.
- After installing UI dependencies, the tutorial copies example environment files into `.env` for both the client and server, allowing customization such as branding the app (e.g., “Agents in Action!”).
- With the UI setup completed, the session will transition back to the server side to demonstrate the actual multi‑agent integration.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=Yq29bZ8Hlrc&t=0s) **Multi‑Agent RAG Integration Demo** - A step‑by‑step walkthrough demonstrating how to use multiple AI agents for query categorization, vector‑DB context retrieval, and response generation—including cloning the repository, setting up the UI directory, and installing dependencies.
- [00:03:05](https://www.youtube.com/watch?v=Yq29bZ8Hlrc&t=185s) **Setting Up Python API Environment** - The speaker walks through creating a virtual environment, installing dependencies (including CrewAI and watsonx.ai), and configuring the .env file with Watsonx.ai connection details obtained from IBM Cloud.
- [00:06:13](https://www.youtube.com/watch?v=Yq29bZ8Hlrc&t=373s) **Launching FastAPI Chatbot with VectorDB** - The speaker walks through starting the FastAPI UI, then using provided scripts to create a ChromaDB vector store from documentation files, enabling backend categorization of queries and generation of chatbot responses.
- [00:09:20](https://www.youtube.com/watch?v=Yq29bZ8Hlrc&t=560s) **Integrating Watsonx.ai LLM with CrewAI** - The speaker demonstrates how to configure a Watsonx.ai language model—specifying model, temperature, token limit, and connection credentials—and embed it into the CrewAI agentic framework for query categorization.
- [00:12:34](https://www.youtube.com/watch?v=Yq29bZ8Hlrc&t=754s) **Designing Multi-Agent Prompt Workflow** - The speaker outlines how they configure an LLM as a collection selector with a specific backstory, control verbosity, toggle delegation, and limit iterations to manage a sequential three‑agent system.
- [00:15:36](https://www.youtube.com/watch?v=Yq29bZ8Hlrc&t=936s) **Defining JSON Output for Categorization Agent** - The speaker outlines how to configure a categorization agent to return a single-field JSON (technical, billing, or account) via a Pydantic model and integrate it into a broader crew workflow.
- [00:18:41](https://www.youtube.com/watch?v=Yq29bZ8Hlrc&t=1121s) **Building and Extending Retrieval Pipeline** - The speaker demonstrates testing a categorization agent, commits the changes, and then outlines adding a new retriever LLM and agent to fetch data from a VectorDB, noting token requirements and tool usage.
- [00:21:57](https://www.youtube.com/watch?v=Yq29bZ8Hlrc&t=1317s) **Adding Retriever Agent to Workflow** - The speaker explains how they integrate a retriever agent, supply it with categorization context, adjust task outputs, and test the updated sequential crew.
- [00:25:00](https://www.youtube.com/watch?v=Yq29bZ8Hlrc&t=1500s) **Creating a Generation Agent with LLMs** - The speaker outlines adding a final generation agent that employs a tool to interpolate query and context into a prompt for user responses, highlighting the flexible pattern of assigning different LLM models (e.g., Watsonx.ai, Mistral, Llama) to each agent.
- [00:28:10](https://www.youtube.com/watch?v=Yq29bZ8Hlrc&t=1690s) **Finalizing RAG Agent Workflow** - The speaker demonstrates adding a final agent, constructing a JSON response with a category field, retrieving context from a vector DB via a retriever, and confirming the successful end‑to‑end RAG output sent back to the UI.
- [00:31:11](https://www.youtube.com/watch?v=Yq29bZ8Hlrc&t=1871s) **Future Enhancements for Agentic RAG Chatbot** - The speaker outlines next steps such as adding agents to redirect unknown queries to web searches and to format replies in HTML, while recapping the current multi‑agent RAG pipeline and encouraging further experimentation with the CrewAI framework.

## Full Transcript

0:00 Have you ever had a tremendous amount of data in your VectorDB, 0:03 and you're using that for retrieval-augmented generation? 0:06 But the context you're getting to give to your LLM to produce those great results is kind of lacking 0:12 because it's pulling in different data that shouldn't really be pulled in with your query? 0:17 Well, today I'm gonna show you how to seamlessly integrate 0:20 multiple AI agents into your application to combat this kind of problem. 0:24 We will walk through a practical example that covers query categorization, context retrieval from a VectorDB, 0:30 and natural language response generation, all using this multi-agent approach. 0:36 This session is designed to give you a clear step-by-step guide on working with agents in your projects. 0:42 Let's dive in and explore how you can leverage these tools to build smarter applications.

0:47 So in the description of the video, you should have a link to this repo. 0:51 The first thing we're gonna do is just clone down the repo to our local machine. 0:58 Okay. 0:59 And let's go into the repo and then into the UI directory. 1:04 So if you look at the structure of the application, we're gonna have an API, which is what we're gonna be working on today. 1:10 And then we have a UI, which we're not gonna work on, but that's what's gonna render what we're doing to your browser. 1:16 And so we're gonna go into the UI, and we're gonna install the dependencies. 1:19 So the first thing is just install the root dependencies, and once that's completed, we're going to run a setup script.

1:33 So, the UI, even though we're not working on it, 1:35 the UI is made with React TypeScript, 1:38 and it has an Express TypeScript server, and we're using something called Carbon components. 1:43 And I just wanna highlight Carbon components for a second. 1:46 So if you search Carbon Design React, it's gonna bring you to the docs page. 1:50 And this is where I get all the components to put into the UI.
1:54 It's super easy, especially for someone who's not a particularly adept front-end developer. 2:00 It just makes it super, super easy and they look good. 2:03 And you know, they have the code, everything you need. 2:05 So my suggestion is, even if we're not working on the UI today, 2:09 go look at Carbon Design, look at what we can do and maybe change the application when you're done with it. 2:14 Alright, so we're gonna wait for the dependencies to install and we'll be right back.

2:19 Okay, the dependencies have been installed. 2:21 One last thing to do within the UI is we're gonna copy the client's 2:25 env example back into the client, and we're going to create a .env. 2:31 And we're doing the same thing for the server. 2:32 Copy that env example and place it right back as .env in the server. 2:38 Something we also could do, if we want to go into the client's .env, 2:41 we've added a way for you to brand the application if you want to make something your own. 2:47 So we have a branding and an application name. 2:48 So for this one, we're going to say Agents in Action! 2:57 Okay. So now we're done with the dependencies, and we're done totally with the UI today.

3:01 So let's go back to the root and then let's head over to the API. 3:05 So the API is written in Python. 3:08 So let's create a virtual environment. 3:12 We'll name that aiagentic. 3:15 And once that's done, let's activate. 3:25 Let's activate the virtual environment. 3:30 And now we're gonna install all the dependencies. 3:34 So this is gonna take a little while. 3:36 We're installing CrewAI. 3:38 We're installing watsonx.ai, a ton of dependencies. 3:41 And so, once this is completed, we'll continue along with the tutorial.

3:45 Okay, so now that the dependencies have been installed, 3:49 we have to just copy the .env.example and paste it into .env in the API. 3:55 Let's take a look at what's in that .env. 3:58 Because we're using watsonx.ai, we need to have the connection strings to connect to watsonx.ai.
4:03 So we're gonna head over to Cloud. 4:05 And if you go into your resource list, just open up your Watson Studio and go into IBM watsonx. 4:14 And it's just gonna log us in. 4:17 And we're gonna head over to Prompt Lab. 4:18 Now, I'm sure there's a better way of getting this information than the way I do it. 4:23 But the way I do it is I just go to Prompt Lab. 4:25 At the top right, we have a View code, and it shows you a cURL command. 4:28 And it has most of the stuff that we need in order to make that configuration with our API. 4:33 So grab from the cURL the base URL and paste it here in the WATSON_URL. 4:41 And we go back, and we grab the project ID. 4:46 Just copy and paste that. 4:53 And then finally, let's go back to cloud.ibm. 4:57 We're gonna go at the very top. 4:58 You're gonna have a Manage and you're gonna wanna go to the Access (IAM). 5:03 And when we get there, on the left-hand side, you're going to see something called API keys. 5:07 Let's create a new one. 5:11 Call it AGENTIC and create. 5:14 So let's just copy that and put it right here. 5:20 And that's it. Now our API is set up.

5:23 So the next thing we wanna do is check out into a new branch. 5:29 Alright, so let's check out our first branch. 5:33 It's gonna be a one-step. 5:37 And what we wanna do is we wanna start up all our services. 5:41 We have three services. 5:42 Remember we have the FastAPI, the React UI and the Express Server. 5:46 So let's first start up the... 5:50 Wait, we have to go into the API directory first, and then we could run our uvicorn command. 5:55 We're just gonna run the uvicorn server with reload, and we're gonna start our FastAPI. 6:01 Also, let's have two more windows, because we're gonna go back 6:06 to the UI, and we're gonna start up the client, and then, we are gonna start up the server. 6:20 And all these commands are in the repo, so you can just copy and paste them.
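
The three watsonx.ai values gathered above (base URL, project ID, API key) all end up in the API's `.env` file. Here is a minimal sketch of a pre-flight check for that file, written only with the standard library. `WATSON_URL` is the name shown in the video; `WATSON_PROJECT_ID` and `WATSON_API_KEY` are hypothetical placeholders for whatever names the repo's `.env.example` actually uses.

```python
# WATSON_URL appears in the video; the other two key names are assumptions.
REQUIRED_KEYS = ("WATSON_URL", "WATSON_PROJECT_ID", "WATSON_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

Calling `missing_keys(open(".env").read())` from the API directory before starting uvicorn would list anything that still needs to be pasted in.
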
6:24 So we're just waiting a second for Uvicorn, for the FastAPI to get all ready, 6:29 and we'll head over to the browser, and we could already see what the UI's gonna look like. 6:36 Beautiful. It's a chatbot. 6:37 You have a chat window and a couple buttons, but what it's gonna do on the backend is gonna be pretty cool, I think. 6:43 So if you go to your API directory, there's gonna be a couple of folders that 6:47 you're gonna be interested in and a questions.txt that I've been using. 6:53 So let me copy and paste this into our chat window. 6:57 And what we're gonna wanna do is when we hit send, 7:00 we're gonna want the backend to categorize that query, 7:03 grab the correct data from the correct collection of our VectorDB, ChromaDB. 7:10 And then we're gonna want to pass that to a customized prompt and return a nice response. 7:16 Currently, it's just gonna say this will be generated by our multi-agent process 7:19 and the category is something cool, but it will be something cool once we set it up.

7:24 So let's go and run the first script that we have. 7:30 So if you look in API scripts, and you look at the process document script. 7:34 This is how we're going to create our ChromaDB VectorDB, right? 7:41 We have a directory called docs, and in that, we have three text files, called account, billing and technical. 7:49 And what I'm trying to show here is that this is, imagine you have a tremendous amount of documentation. 7:54 And some of those, when we query them in a VectorDB, 7:58 the cosine similarity distance is gonna be pretty close for stuff that might not be relevant. 8:02 So we isolate topics, right? 8:04 So we have an account, we have a technical, we have the billing, and I'm just trying to recreate that locally. 8:09 Something very simple. 8:10 So if you look at the script, all it's doing is it's saying okay, what's the file? 8:13 It's gonna loop through all the files in docs. 8:15 What's the file name?
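
To make the cosine similarity point above concrete, here is a small sketch with made-up three-dimensional vectors. Real embeddings have hundreds of dimensions, and every number and label here is invented purely for illustration. It shows how a chunk from the wrong topic can score almost as high as the right one when everything sits in one collection.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only.
query = [0.9, 0.4, 0.1]            # stands in for a login-error question
technical_chunk = [0.8, 0.5, 0.1]  # the chunk we actually want
account_chunk = [0.7, 0.6, 0.2]    # similar wording, different topic

scores = {
    "technical": cosine_similarity(query, technical_chunk),
    "account": cosine_similarity(query, account_chunk),
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Both scores come out above 0.9, so a top-k search over one mixed collection could easily return the off-topic chunk. Restricting the search to the collection for the chosen category removes that candidate entirely, which is the motivation for isolating topics.
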
8:16 Create a new collection with that file name and insert the embeddings for that file into that collection. 8:23 So let's run that script. 8:29 And you'll see. 8:29 First, it's gonna say okay, are there any documents? 8:32 No existing collection found. 8:33 So it's going to create three new ones. 8:35 And we're going to have account, billing and technical.

8:38 Before we could do any of that, we have to first categorize the query. 8:43 So this is where we're gonna create our very first agent. 8:45 It's going to be the categorization agent. 8:52 So the route we're looking at that we're gonna change is called the agentic route. 8:56 And here, you can see in the docstring I have what each of the agents are gonna do. 9:01 The first one is gonna be the query categorization, then we're gonna have 9:03 the context retrieval, and then we're gonna have the response generation. 9:07 And these are all gonna be agents.

9:09 So let's bring in our agent framework, which is CrewAI. 9:13 So we're going to say from crewai import, and the first class we're gonna bring in is Agent. 9:20 Then we're gonna bring in the Task. 9:21 We're gonna bring in the Crew. 9:22 We're gonna bring in a Process. 9:23 And we're gonna bring in LLM. 9:25 So let's go back. 9:27 And you can see in the first docstring. 9:30 It's an LLM-powered agent. 9:32 So this is where we start connecting watsonx.ai to the CrewAI agentic framework.

9:37 So let's create our first LLM. 9:44 And you look at the docs. 9:45 The only thing we're really concerned with from this is just the model. 9:47 We're gonna use a watsonx.ai model, temperature, max tokens, and then all the connection strings. 9:54 So if we look in the server.py, we have a list of available models that you could use from watsonx.ai. 10:01 So I'm gonna use the Granite 3, 8 billion. 10:06 So let's bring that in. 10:09 And we just have to prepend watsonx to the front of it. 10:14 Then we're gonna set the temperature. 10:15 Now this is from trial and error, but 0.7 works well.
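
The document-processing script described a moment ago (loop over the files in docs, one collection per file name) could look roughly like this. It is a sketch under stated assumptions, not the repo's actual script: the fixed-size chunking, the `chroma_db` path, and both function names are hypothetical, and passing raw documents to `collection.add` uses Chroma's default embedding function rather than the watsonx.ai embeddings used in the video.

```python
from pathlib import Path


def plan_collections(docs_dir: str, chunk_size: int = 500) -> dict[str, list[str]]:
    """Map each .txt file name (without extension) to its text chunks."""
    plan: dict[str, list[str]] = {}
    for path in sorted(Path(docs_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # Fixed-size character chunks; the real script may split differently.
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        plan[path.stem] = [c for c in chunks if c.strip()]
    return plan


def load_into_chroma(plan: dict[str, list[str]], db_path: str = "chroma_db") -> None:
    """Create one ChromaDB collection per file and add its chunks."""
    import chromadb  # imported here so plan_collections works without ChromaDB

    client = chromadb.PersistentClient(path=db_path)
    for name, chunks in plan.items():
        if not chunks:
            continue  # nothing to add for an empty file
        collection = client.get_or_create_collection(name=name)
        collection.add(
            documents=chunks,
            ids=[f"{name}-{i}" for i in range(len(chunks))],
        )
```

With the three files from the video, `plan_collections("docs")` would produce the keys `account`, `billing` and `technical`, which then double as the collection names the categorization agent has to choose between.
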
10:19 And the max tokens is 50. 10:21 Because again, all this is doing is just categorizing a query, right? 10:24 We're not doing any massive return or anything like that. 10:28 So the next thing we have to do is we're gonna bring in the connection string for Watson. 10:36 So we're gonna need the API key, the project ID and the URL. 10:39 So let's say URL is going to equal os.environ.get... 10:45 the key. 10:46 Let's just paste that in. 10:48 And then we're going to have the API key, where we get that from our .env. 10:59 And then finally, we're gonna get the project ID. 11:04 We're just gonna, again, just bring it in. 11:11 So let's bring it into the CrewAI LLM class. 11:15 So we're gonna have the base URL is gonna be the URL. 11:18 The API key is gonna be the API key, and the project ID is gonna be the project ID.

11:27 So now we have to create our first agent. 11:30 Let's do it. 11:31 We're gonna name him categorization agent, a very clever name. 11:38 And if we look at the docs again, we can see exactly what we're gonna be doing here. 11:43 So the top three are particularly cool, right, to me. 11:47 You have a role, a goal, and a backstory. 11:50 If you go to the CrewAI docs, they are really good about 11:56 explaining exactly what all those attributes do, but I just wanna read them to you. 12:00 So the role is defining the agent's function and its expertise within the crew. 12:05 Remember, it's a crew of agents. 12:07 The goal gives you the clear defined goal of what it's going to do. 12:11 And then the backstory is great. 12:12 It just provides context and personality to the agent, which I find very, very cool. 12:18 So let's start with the, uh, let's start with the role. 12:25 And so for the categorization agent, the role, the one I've come up with, is Collection Selector. 12:35 Now, if you've worked with LLMs, you know a lot of these, 12:39 and this probably looks like prompt engineering to you because it kind of is.
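
A sketch of the LLM configuration walked through above, assuming a few names the video does not spell out: the exact model ID string is a guess at how the Granite model is listed in server.py, and `WATSON_API_KEY` and `WATSON_PROJECT_ID` are hypothetical variable names (only `WATSON_URL` is shown on screen). The settings are kept in a plain dict so they can be inspected without CrewAI installed.

```python
import os


def categorization_llm_settings(env=os.environ) -> dict:
    """Collect the keyword arguments described in the video for the LLM."""
    return {
        # "watsonx/" is prepended to the model ID; the ID itself is an assumption.
        "model": "watsonx/ibm/granite-3-8b-instruct",
        "temperature": 0.7,  # found by trial and error, per the speaker
        "max_tokens": 50,    # the agent only returns a one-word category
        "base_url": env.get("WATSON_URL"),
        "api_key": env.get("WATSON_API_KEY"),        # hypothetical variable name
        "project_id": env.get("WATSON_PROJECT_ID"),  # hypothetical variable name
    }


def build_categorization_llm():
    """Create the CrewAI LLM; requires the crewai package to be installed."""
    from crewai import LLM  # lazy import keeps the settings inspectable without CrewAI

    return LLM(**categorization_llm_settings())
```

Keeping the settings separate from the `LLM(...)` call also makes it easy to reuse the same connection values for the later agents and change only `max_tokens`.
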
12:43 It's just trial and error. 12:44 This is what worked out for me. 12:45 So I gave it the role of a collection selector because it's selecting the collection. 12:49 The goal is to analyze. 12:53 It's going to analyze the user queries and determine the most relevant ChromaDB collection. 13:05 And then, we're gonna give it a backstory. 13:09 He is an expert in query classification. 13:15 And he routes questions to the correct domain.

13:22 Alright. Finally, we're gonna add a couple of the other things that we need. 13:24 Verbose, I'm going to set to true because we want to see what it's up to in the logs. 13:30 Allow delegation. 13:31 So this is interesting. 13:32 Remember, we're gonna have multiple agents, and they could all have different goals and different expertises. 13:39 And it can make the decision on what it wants to do based on that, right? 13:42 Like, okay, this is not for me, let me send it to another agent. 13:44 It's gonna allow it to delegate. 13:45 But in our case, we don't, we're just going sequentially, where we have three agents. 13:49 We need them to do exactly what we want them to. 13:52 So we turn delegation off. 13:55 And in line with that, there's also something called max iterations. 13:59 So it defaults to 20. 14:01 But in our case, again, like if something is not working, 14:03 because these are pretty simple tasks, if something's not working, it's just gonna try it over and over again. 14:08 We just have to fix the code, at least in my experience of what's happening.

14:12 And finally, we have to give it its brain, right? 14:15 We're gonna give it the categorization LLM. 14:17 This is its reasoning capability. 14:18 This is how it's gonna actually do what it needs to do. 14:21 And that's the categorization LLM that's using Granite. 14:23 So now we have an agent who has a brain and has a role and has a backstory and has a whole life story. 14:29 We have to give it a task. 14:31 We are going to ask the agent to do something.
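
Pulling the agent attributes above into one place, a sketch might look like the following. The role, goal and backstory follow the speaker's wording; the `max_iter` value of 5 is an assumption, since the video only says the default is 20 and implies it gets lowered. The function names are hypothetical.

```python
def categorization_agent_settings(llm) -> dict:
    """Keyword arguments for the first agent, as described in the video."""
    return {
        "role": "Collection Selector",
        "goal": "Analyze user queries and determine the most relevant ChromaDB collection",
        "backstory": "An expert in query classification who routes questions to the correct domain",
        "verbose": True,            # show the agent's steps in the logs
        "allow_delegation": False,  # the crew runs strictly in sequence
        "max_iter": 5,              # assumption: the video does not state the value used
        "llm": llm,                 # the agent's "brain"
    }


def build_categorization_agent(llm):
    """Create the CrewAI Agent; requires the crewai package to be installed."""
    from crewai import Agent  # lazy import so the settings can be checked without CrewAI

    return Agent(**categorization_agent_settings(llm))
```

Turning delegation off and capping iterations both serve the same purpose here: each agent does exactly one simple job, so a failure should surface quickly instead of being retried or handed to another agent.
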
14:33 So let's create our task. 14:42 And again, let's look at the docs. 14:43 So the things we're gonna be concerned with, obviously, is agent. 14:45 We have to assign this task to our categorization agent. 14:49 Description, which is gonna be really just a prompt. 14:53 And then output JSON, which is really important for what we're going to do, 14:56 because we're gonna send this first agent's response directly back to the UI. 15:01 So it has to be formatted in a particular way.

15:04 So I'm just gonna copy and paste the description, because it's a prompt, right? 15:08 Like if anyone's used prompts before, this took me a while to get correct. 15:15 But you can see exactly what I mean, right? 15:17 This is a prompt. 15:18 We're saying, look at the query and determine the best category. 15:21 You must only return one word. 15:23 Because again, we're gonna be using this later down the line as the classification agent. 15:27 And then we give it category definitions, and we're really kind of broad with it because we want to give agency to the agent. 15:33 So we're just saying, okay, this is what a technical query could look like. 15:36 This is what billing... 15:36 We're giving that. 15:37 We're giving the agent agency here. 15:39 And then finally, I just really want to hammer home: please, just only one word from this list.

15:45 And then, we also have an expected output. 15:49 And this is important because we need something explicit. 15:53 So we want a JSON object with category field, and it has to either be technical, billing or account. 15:58 The agent we're assigning to it obviously is the categorization agent. 16:02 And finally, the last thing that I mentioned was the output JSON. 16:06 And I find this really nice. 16:08 So the output JSON takes in a Pydantic model. 16:11 So let me copy and paste the Pydantic model I have. 16:14 And I'll show you what I did. 16:20 Paste it in here. 16:22 So we have a category response.
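
The task description is not shown in full in the transcript, so the following is only a sketch of its general shape: state the job, demand a single word, give broad category definitions, then repeat the one-word constraint. The definition texts and the names `CATEGORIES`, `build_categorization_description` and `EXPECTED_OUTPUT` are invented for illustration.

```python
# Broad, illustrative definitions; the prompt used in the video is not shown verbatim.
CATEGORIES = {
    "technical": "errors, bugs, or problems using the product",
    "billing": "invoices, charges, refunds, or payment methods",
    "account": "login, profile, or account settings",
}


def build_categorization_description(query: str) -> str:
    """Assemble the task description (really just a prompt) for one query."""
    lines = [
        "Look at the user query and determine the best category for it.",
        "You must return only one word.",
        "",
        "Category definitions:",
    ]
    lines += [f"- {name}: {hint}" for name, hint in CATEGORIES.items()]
    lines += [
        "",
        f"Respond with exactly one word from this list: {', '.join(CATEGORIES)}.",
        "",
        f"User query: {query}",
    ]
    return "\n".join(lines)


EXPECTED_OUTPUT = (
    "A JSON object with a single 'category' field whose value is "
    "technical, billing, or account"
)
```

In CrewAI these two strings would be passed as the task's `description` and `expected_output`, alongside the agent and the `output_json` model.
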
16:24 I'm expecting a JSON object with a category field, and the value is going to be either technical, billing or account. 16:30 Now I added this description because I have a feeling that the 16:33 agent is actually looking at the description of these models before it's responding. 16:37 Don't quote me on it, but that's what I think it's doing. 16:41 So I added it, and it worked well. 16:44 So from all I can tell it is working the way it's supposed to be working.

16:47 Alright, so now we have an agent who is powered by our LLM and has a task to follow. 16:53 So let's create the first crew. 16:59 You look at the docs here, we have tasks, we have agents, 17:02 we're gonna use the process, and we're gonna have verbose. 17:06 We want to have some responses. 17:09 So first things first, let's add the agent. 17:17 And right now, we only have one. 17:20 Then we have to add the tasks. 17:26 And that's gonna be the categorization task. 17:28 Remember, this is gonna be a crew. 17:29 There's gonna be a couple of agents here. 17:31 And then we're gonna have verbose. 17:32 We're just gonna set it to true again because we wanna see what it's doing. 17:35 And then finally, we have our process. 17:38 And the process is going to be sequential. 17:41 Because if you look at the docstring, we're going just step by step.

17:44 So now that we have the crew, let's have the crew kick off. 17:53 And we're gonna call the kickoff method from the crew. 17:56 And if you remember from the category response, we're expecting it to be a JSON object with category as a field. 18:03 So what we're going to do is, instead of sending back something cool to the UI, 18:11 we're going to grab the category result and we're going to try to grab that category from the response. 18:20 Make sure nothing broke. 18:29 Looks good. We have the category result. 18:30 Let's test it out. 18:33 So let's copy and paste what we already sent.
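
The video enforces the output shape by passing a Pydantic model to the task's output JSON setting. As a standard-library stand-in for what that model guarantees, here is a sketch that checks a raw result before its category is handed to the UI. `parse_category` and `ALLOWED_CATEGORIES` are hypothetical names, and this is not how CrewAI itself performs the validation.

```python
import json

ALLOWED_CATEGORIES = ("technical", "billing", "account")


def parse_category(raw: str) -> str:
    """Validate a categorization result and return its category."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {raw!r}") from exc
    if not isinstance(data, dict) or set(data) != {"category"}:
        raise ValueError("expected a JSON object with only a 'category' field")
    category = str(data["category"]).strip().lower()
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return category
```

Failing loudly on anything other than the three known categories matters because the category is later used as a collection name; an unexpected value would otherwise surface as a confusing lookup error further down the pipeline.
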
18:38 And hopefully it returns back category something, category technical. 18:41 Perfect. 18:42 So if we look at the logs, we can see exactly what it's doing, right? 18:47 We are using the collection selector, that first agent. 18:51 You can see the task that we're giving it. 18:52 We pass in that user query, the one that we sent from the UI. 18:55 And then we get the final answer as exactly the structure that we were looking for. 18:58 So the UI could ingest it and render it nicely.

19:02 Alright, so now that we have the basic categorization agent in place, let's move on and enhance our pipeline. 19:08 So let's just commit our changes, and let's check out the second-step branch. 19:16 Alright, perfect. Nothing broke. Great. 19:20 Okay, so the next step here, if we go to the docstring, is now to retrieve that data from that VectorDB, right? 19:30 Let's make it. 19:31 So we're gonna do the same process, we're gonna copy and paste the categorization LLM. 19:36 We're gonna create a new LLM, and this is gonna be the retriever LLM. 19:44 It's going to necessitate more tokens. 19:46 Like I said, it's 1,000, but everything else is going to stay the same. 19:49 And then, we're just going to grab two more: an agent and another task. 19:55 So this is the retriever agent and the retriever task. 19:58 I'm going to copy and paste this from our notes, and I'll explain exactly what they're doing.

20:04 There is going to be a significant difference here, and you'll see the error right away, 20:08 which is that it's using a tool, and I'll explain what we're doing there. 20:11 So the retriever agent has a job, right? 20:14 It's going to take that category that it's receiving from the categorization agent, 20:19 and it's going to pass it to a function that is going to query our VectorDB. 20:26 And so that function is going to be that tool. 20:29 So let's create our first tool. 20:33 We're going to name it the query collection tool. 20:38 Let's define it. 20:45 It's going to take. 20:47 What does it take?
20:49 It's only taking the category, and it's gonna take the query to embed. 20:55 So, query, string, and it's gonna return a dictionary. 21:03 Perfect. In the docstring, this is going to be the tool to query ChromaDB 21:09 based on category and return relevant documents. 21:19 Now if you've ever worked with RAG, if you've ever worked with 21:22 VectorDB, the functionality of this is gonna be very familiar, right? 21:26 So let me just copy and paste what that actual tool is gonna do. 21:31 We're using the watsonx.ai embeddings. 21:33 Don't worry, like if you don't have that, you could use your own embeddings model if you have it locally. 21:37 My computer is not capable of it at the moment. 21:41 The interesting thing here, though, is this part. 21:44 We're grabbing the category that was returned by the categorization task, 21:48 and we're using that to query the VectorDB, which is 21:52 fascinating because you're just saying this is what you do and the LLM is doing it, the agents are doing it. 21:58 So that is very, very cool to me.

21:59 So now that we have that tool, you can see what the 22:03 retriever agent is doing, and we give it the task, you know, we're passing in that query from the route. 22:09 We have an expected output. 22:11 We're not worried too much about this because we're not ever gonna send back the context 22:14 to the UI, so we're not really enforcing that output JSON. 22:19 But the only other thing I want to mention here is that we had to add a context. 22:24 And context is: we're giving access to that categorization task, like what its output was. 22:31 So, you know, that's how we're getting that category and that's how we're telling the agent, 22:34 look at this category, you have the query, pass it, use this function, and then call it and return back the context. 22:41 So for us now, all we're gonna do is we're going to add the new agent to our crew. 22:48 Welcome. And we're gonna add the new task.
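
A rough sketch of the query collection function described above: it takes the category and the query and returns a dictionary of matching documents. The `client` and `embed` parameters are assumptions added so the sketch is self-contained. In the video the function reaches ChromaDB and the watsonx.ai embeddings directly and is registered with CrewAI as a tool; here any object with ChromaDB's `get_collection` and `query` methods, and any function that turns text into a vector, will do.

```python
from typing import Any, Callable


def query_collection(
    client: Any,
    embed: Callable[[str], list[float]],
    category: str,
    query: str,
    n_results: int = 3,
) -> dict:
    """Query the collection named after the category and return matching documents."""
    collection = client.get_collection(name=category)
    result = collection.query(query_embeddings=[embed(query)], n_results=n_results)
    # Chroma returns one list of documents per query embedding; we sent exactly one.
    return {"category": category, "documents": result["documents"][0]}
```

The important line is the first one: the collection name is not hard-coded but comes from the categorization step, which is what lets the retriever search only the relevant topic.
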
22:54 Process is still gonna be sequential. 22:57 This time, let's just remove the category, like actually grabbing it, because we're not gonna be returning that anymore. 23:03 So we'll just say, bye for now. 23:07 But what we are gonna do is we're gonna print out that category result. 23:19 Okay. 23:22 Make sure everything comes up. 23:23 Good. 23:24 We're gonna print out that category result and we're gonna see exactly what happens when we 23:27 send over that request, that query from here. 23:32 Hopefully, we'll watch the new agent do exactly what we want it to do.

23:39 Okay. Let's get there. 23:39 Okay, so we're okay. 23:40 It already has the correct category. 23:41 So now that it's gonna take the collection... 23:44 Look at that. 23:45 We got the result from the RAG. 23:47 It used that category and passed it to our ChromaDB collection. 23:50 So we got that collection and then we queried it. 23:52 And now it returns back all the context for that query, basically by itself. 23:57 We just told it. 23:58 Yeah, and so you could see almost exactly what we're gonna send to the final agent, right? 24:03 We're gonna send it the category because we want to return it to a UI. 24:07 We're sending it the query. 24:08 And now we have the context from our VectorDB to augment, retrieval, augment and generate our response. 24:19 I think that is particularly fascinating.

24:23 So that's just a way that you could use tools with agents. 24:26 And that was really just our retriever agent, right? 24:28 Like we're able to use that function, and tool in this case means we're using a function. 24:34 We're giving an agent tools to use a function and that is very, very cool to me. 24:38 So we're done with the retriever agent and we're gonna move on to the generation agent. 24:44 And this is the final step of the application. 24:46 So let's just commit our changes. 24:51 And let's check out the final step. 24:58 Let's go back to our API.
25:00 And if we look back at our docstring, we know what the final step is. 25:05 We're going to... We're going to create an agent that creates a nice response for the user. 25:13 Basically, everything that we just did, we're gonna do one more time. 25:16 And I really like this pattern, right? 25:21 Like I'm creating their own LLMs for each of the agents. 25:24 I find that to be very, very nice, because we could set different, 25:29 for us, we didn't really set any, the only thing we're changing is the max tokens. 25:32 Obviously, we want the response to have more leeway. 25:34 But other than that, we could make it drastically different. 25:38 Each LLM that we power can have a different model if we're using watsonx.ai. 25:42 We could use Mistral. 25:43 We can use Llama. 25:45 We could use whatever we want.

25:46 So let's add the final task and the final agent, which is gonna be our generation agent. 25:56 And once again, we're gonna be using a tool, and I'll explain why in a second. 26:01 So again, we give it a role, we give it a backstory. 26:05 We have a... 26:09 an LLM. 26:10 Let me just make sure I named it correctly. 26:16 Oh, yeah, there's a generation one, not the response. 26:18 Let me just update that. 26:21 Okay. 26:24 But we're missing one last tool. Okay. 26:27 And this tool, what I'm gonna show you is how I found the prompt for this. 26:32 So this tool is going to interpolate that query and that context into a nice prompt. 26:38 And where I got the prompt is if you go to your projects, you can create this accelerator, 26:43 just like look up watsonx.ai RAG, and it will give you this accelerator that you could just create. 26:48 And within there, they have prompt templates written by the 26:50 people who train the models, you know, or work with it a tremendous amount.
26:55 So this prompt template is just, I'm going to take this, because they wrote it 26:59 better than I, and I'm really not a particularly good prompt engineer, to be totally honest. 27:03 So I just copy and paste this, and I wanna then interpolate 27:07 the context that we received from the ChromaDB into the context and the question from the query, right? 27:12 So what we're gonna do is we're gonna create another tool, 27:15 and I'm just gonna copy and paste the tool and the prompt. 27:22 And we give access to the generation agent. 27:25 So this generation response tool, the generate response tool, you can see exactly what I'm doing. 27:28 It's grabbing the context. 27:29 It's grabbing the query and it's sending it to this prompt.

27:33 And finally, there's one last thing we have to do, which is create a Pydantic model for the output. 27:40 Because now we're sending back the entire thing to the UI. 27:42 I really want to enforce that it's just gonna be a JSON. 27:45 I really want that category, and I really want that, I believe I call it, response. 27:50 I'll figure it out. 27:50 Let me look at what actually I called it. 27:52 But I need both of those to be there in order for it not to, you know, blow up on response. 27:57 So let me copy this model. 27:59 Let's add it to the top over here. 28:07 Okay, and this is going to be... 28:28 So let me just copy this model and paste it right here. 28:32 It's gonna be the final response. 28:33 This is gonna be the JSON object that we're looking for that has a category field and has a response. 28:39 And we're gonna send this back straight through directly to our UI. 28:46 So that's why we have this in the generation task. 28:50 That's why we're trying to say, okay, this is what we want, 28:53 this is what we want it to look like, and this is what you need to return.

28:57 So let's add our final agent to the crew. 29:05 Let's give him his final task. 29:11 Okay. We have the crew kickoff.
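
The generate-response tool and the final output shape described above can be sketched as follows. The template text is a placeholder written for this example; the video copies its template from a watsonx.ai RAG accelerator, and that text is not reproduced in the transcript. The `response` field name follows what the speaker recalls calling it, and the function names are hypothetical.

```python
import json

# Illustrative template only; the video uses one copied from a watsonx.ai RAG accelerator.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say so.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_generation_prompt(context: list[str], query: str) -> str:
    """Interpolate retrieved documents and the user query into the template."""
    return PROMPT_TEMPLATE.format(context="\n\n".join(context), question=query)


def final_response(category: str, response: str) -> str:
    """Serialize the two fields the UI expects to receive."""
    return json.dumps({"category": category, "response": response})
```

In the video the two-field shape is enforced by a second Pydantic model on the generation task, so that a missing `category` or `response` does not break the UI when the result is sent straight through.
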
29:13Let's just call this crew result now, because we no longer need to send back that hard-coded response. 29:21Get rid of this, since we're not sending it anymore. 29:29Okay. Make sure nothing broke. 29:32Perfect. And let's see if we get a nice response. 29:45Okay. Alright. 29:45It grabbed the correct category, okay. 29:48It sent that category to the retriever, who returned all of the context from the RAG, from the VectorDB. 29:56Now she's going to send it to the final agent, who's going to interpolate that into 30:01that prompt we cribbed from Watson Studio. 30:06Let me see it. 30:10Perfect, okay, yeah, so you see it, we have everything set up. 30:14It has the context, and look at that. 30:19Huh? That worked! 30:21I'm not surprised. 30:22It worked before. 30:23I built this. 30:23But still, it's always kind of surprising. 30:26It's an amazing technology. 30:28So you see, we have this RAG response. 30:32Let's actually double-check to make sure everything looks right. 30:34So it's referencing error 01. 30:38So let's look at our docs. 30:39And let's make sure that we have the correct stuff. 30:44Error one. 30:44Session expired. 30:45Clear your browser. 30:46And let's see what that says. 30:48Perfect. 30:49Yeah, so it worked exactly the way we wanted it to. 30:52Something I really like is that it's able to give me back the different steps, like the responses in the pipeline. 30:58I'm able to pass them along between the agents. 31:00So I'm able to categorize the query. 31:03I'm able to show a really, really nice message and a good, accurate response. 31:09And it's all done with these agents. 31:10I think it's very cool. 31:12Umm, yeah. 31:13And so obviously we can enhance this, we could refactor it, 31:16we could change the parameters that we're using for the LLM to make it do different things. 31:20We could use a totally different LLM, totally different models, whatever we want.
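The idea of giving each agent its own LLM settings can be sketched as a small lookup table. Everything here is an assumption for illustration: the agent names, the placeholder model identifiers, and the token limits are not taken from the video, which only states that the generation agent gets a larger max-tokens value than the others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    model_id: str    # placeholder string, not a real watsonx.ai model ID
    max_tokens: int  # upper bound on generated tokens for this agent


# Illustrative values only: the generator gets the most room to answer.
AGENT_LLMS = {
    "categorizer": LLMSettings(model_id="model-a", max_tokens=50),
    "retriever": LLMSettings(model_id="model-a", max_tokens=1000),
    "generator": LLMSettings(model_id="model-b", max_tokens=2000),
}


def settings_for(agent_name: str) -> LLMSettings:
    """Look up the settings for one agent; raises KeyError if unknown."""
    return AGENT_LLMS[agent_name]


if __name__ == "__main__":
    for name, settings in AGENT_LLMS.items():
        print(name, settings.model_id, settings.max_tokens)
```

Keeping the settings per agent means a model or parameter can be swapped for one step of the pipeline without touching the others.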
31:25There are two new agents that I really want to make, 31:28and I'm probably gonna do it later. One is to route queries to the web if they're not part of the ChromaDB collections, 31:35if it's able to categorize the query and say, okay, this is out of the blue. 31:38And also I wanna format that response when we get it back to the 31:44UI, maybe format it as HTML and have another agent do that, right? 31:49Look at this, put it into a nice HTML package, and post it as a response. 31:55Awesome. 31:56We've built a pretty sophisticated multi-agent pipeline here. 32:00So let's just recap. 32:01We built the backend to an agentic RAG chatbot that is able to identify the query's category, 32:06target the correct ChromaDB collection, interpolate the query and the context 32:11into a custom prompt, and generate a natural language response. 32:16So with this application and this process, we would love for you to explore additional use cases, 32:20customize the UI, experiment with the CrewAI framework, and build something really cool. 32:26Maybe add a route that makes a web search if the query is just totally out of bounds. 32:31Maybe create an agent whose only job is to format the response in a particular way. 32:35We would love to see anything you do with it. 32:38Dive into the code, have fun, build something cool, refactor it, make it better, just be creative.
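As a starting point for the suggested web-search route, the routing decision itself can be sketched in a few lines. This is a hypothetical sketch: the category names are invented, `choose_route` is not part of the project, and no actual web search is performed.

```python
# Assumed category names; the real ones depend on the ChromaDB collections.
KNOWN_CATEGORIES = {"technical", "billing", "account"}


def choose_route(category: str) -> str:
    """Return 'vector_db' for a known category, otherwise 'web_search'."""
    normalized = category.strip().lower()
    return "vector_db" if normalized in KNOWN_CATEGORIES else "web_search"


if __name__ == "__main__":
    print(choose_route("Billing"))
    print(choose_route("weather"))
```

A new agent or tool would then act on the `web_search` result; that part is left open here, just as it is in the video.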