Agentic Retrieval Augmented Generation with LangChain
Key Points
- The tutorial introduces **agentic Retrieval Augmented Generation (RAG)**, using IBM’s Granite 3.08b‑Instruct model as the reasoning engine, but any LLM can be swapped in.
- After installing required packages and loading API credentials from a .env file, a **prompt template** is created to let the LLM receive multiple questions and generate responses.
- A basic query (“What sport is played at the US Open?”) is answered correctly, while a newer query (“Where was the 2024 U.S. Open?”) fails because the model’s training data predates the event, highlighting the need for external knowledge sources.
- A **knowledge base** is built by listing relevant URLs, loading their content with LangChain’s web loader, chunk‑splitting the text, embedding it via IBM’s Slate model through watsonx.ai, and storing the vectors in an open‑source Chroma DB vector store with a retriever tool.
- A **structured chat prompt** combines system, human, and tool sections, instructing the agent to display its thought process, the tools invoked (e.g., the RAG retriever), and the final answer, enabling multi‑tool reasoning for complex queries.
Sections
- Agentic Retrieval Augmented Generation Tutorial - A walkthrough of setting up an agentic RAG pipeline—installing packages, configuring credentials, using IBM's Granite model as a reasoning engine, testing its limits, and building a web‑source knowledge base with LangChain to enable up‑to‑date answers.
- Building a RAG‑Enabled Agent Prompt - A step‑by‑step walkthrough of creating a structured chat prompt (system, human, and tool sections), adding LangChain conversation memory, configuring the agent executor, and demonstrating how the agent uses a RAG tool to answer queries.
- Agent Memory Updates & Tool Discretion - The agent successfully used a rogue tool to fetch and store information, but intelligently avoided unnecessary tool calls when it already possessed the needed knowledge, such as knowing France's capital.
Full Transcript
# Agentic Retrieval Augmented Generation with LangChain **Source:** [https://www.youtube.com/watch?v=Y1PaM3edYoI](https://www.youtube.com/watch?v=Y1PaM3edYoI) **Duration:** 00:06:42 ## Summary - The tutorial introduces **agentic Retrieval Augmented Generation (RAG)**, using IBM’s Granite 3.08b‑Instruct model as the reasoning engine, but any LLM can be swapped in. - After installing required packages and loading API credentials from a .env file, a **prompt template** is created to let the LLM receive multiple questions and generate responses. - A basic query (“What sport is played at the US Open?”) is answered correctly, while a newer query (“Where was the 2024 U.S. Open?”) fails because the model’s training data predates the event, highlighting the need for external knowledge sources. - A **knowledge base** is built by listing relevant URLs, loading their content with LangChain’s web loader, chunk‑splitting the text, embedding it via IBM’s Slate model through watsonx.ai, and storing the vectors in an open‑source Chroma DB vector store with a retriever tool. - A **structured chat prompt** combines system, human, and tool sections, instructing the agent to display its thought process, the tools invoked (e.g., the RAG retriever), and the final answer, enabling multi‑tool reasoning for complex queries. ## Sections - [00:00:00](https://www.youtube.com/watch?v=Y1PaM3edYoI&t=0s) **Agentic Retrieval Augmented Generation Tutorial** - A walkthrough of setting up an agentic RAG pipeline—installing packages, configuring credentials, using IBM's Granite model as a reasoning engine, testing its limits, and building a web‑source knowledge base with LangChain to enable up‑to‑date answers. - [00:03:06](https://www.youtube.com/watch?v=Y1PaM3edYoI&t=186s) **Building a RAG‑Enabled Agent Prompt** - A step‑by‑step walkthrough of creating a structured chat prompt (system, human, and tool sections), adding LangChain conversation memory, configuring the agent executor, and demonstrating how the agent uses a RAG tool to answer queries. - [00:06:16](https://www.youtube.com/watch?v=Y1PaM3edYoI&t=376s) **Agent Memory Updates & Tool Discretion** - The agent successfully used a rogue tool to fetch and store information, but intelligently avoided unnecessary tool calls when it already possessed the needed knowledge, such as knowing France's capital. ## Full Transcript
Hi, I'm Anna and this is how to perform agentic retrieval augmented generation.
In other words, agentic RAG.
We'll need a few packages for this tutorial.
Make sure to install the following libraries.
Next, we'll import the following packages.
Then we input our API key and project ID.
We've set up our credentials using a .env file.
For this tutorial we're using IBM's granite 3.08b instruct model, but you're free to use any AI model of your choice.
The purpose of these models is to serve as the reasoning engine that decides
which actions to take.
We'll set up a prompt template
to ask multiple questions,
and now we can set up a chain with our prompt and our alarm.
This allows the LLM to produce a response.
Let's test our agent response to a basic question
like what sport is played at the US open?
Our agent responded correctly.
Let's try something a little harder.
Like where was the 2024 U.S. Open?
The LLM is unable to answer this question.
The training data used for this model is from before the 2024 U.S.
Open happened, and without the appropriate tools,
the agent doesn't have access to this information.
The first step in creating our knowledge base is listing the URLs.
We will be getting content from.
Here are a list of URLs summarizing IBM's involvement in the 2024 U.S. Open. Let's put them in a list called URLs.
Next, we’ll load the URLs as documents using LangChain's web based loader.
We'll also print a sample document to see how it loaded.
Looks good.
In order to split the data in these documents into chunks that can
then be processed by the LLM, we can use a text splitter.
The embedding model that we are using is
an IBM slate model through the watsonx.ai embedding service.
Let's initialize it.
In order to store or embedded documents.
We will use Chroma DB, an open source vector store
to access information in the vector store.
We must set up a retriever.
Let's define the get IBM US Open context function and tool.
Our agent will be using the tools description helps the agent know when to call it.
This tool can be used for routing questions to our vector store.
If they're related to IBM's involvement in the 2024 US Open.
Next, let's set up a new prompt template to ask multiple questions.
This template is more complex.
It's known as a structured chat prompt and can be used
for creating agents to have multiple tools available.
Our structured chat prompt will be made up of a system prompt,
a human prompt, and our RAG tool.
First, we'll set up the system prompt.
This prompt tells the agent to print its thought process,
the tools that were used, and the final output.
In the following code, we're establishing the human prompt.
This prompt tells the agent to display the user input, followed by
the intermediate steps taken by the agent as part of the agent scratchpad.
Next, we'll establish the order of our newly
defined prompts in the prompt template.
Now let's finalize our prompt template by adding the tool names, descriptions,
and arguments using a partial prompt template.
An important feature of AI agents is their memory.
Agents are able to store past conversations
and past findings in their memory to improve the accuracy
and relevance of their responses going forward.
In our case, we'll be using LangChain conversation buffer memory
as a means of memory storage.
And now we can set up a chain with our agents
scratchpad memory prompt and the LLM.
The agent executor class is used to execute the agent.
We're now able to ask the agent questions.
Remember how the agent was previously unable
to provide us with the information related to our queries?
Now that the agent has its RAG tool available to use,
let's try asking the same questions again.
Let's start with where was the 2020 for US Open?
Great.
The agent used its available RAG tool to return the location of the 2024 U.S. Open.
We even get to see the exact document that the agent is retrieving its information from.
Now, let's try something harder.
This time, our query will be about IBM's involvement in the 2024 US Open.
Again, the agent was able to successfully retrieve
the relevant information related to our question.
Additionally, the agent is successfully updating its knowledge base
as it learns new information and experiences new interactions
as seen by the history output.
Now, let's test if the agent can determine when to calling
isn't necessary to answer the user query.
We can test this by asking the wrong
agent a question that is not about the US Open.
Like what is the capital of France?
As seen in the agent executor chain,
the agent recognized that it had the information in its knowledge base.
To answer this question without using any of its tools.
And that's it.
In this tutorial, we created a RAG agent using LangChain in Python with watsonx.ai.
The LLM we worked with was an IBM granite 3.08 the Instruct model.
The AI agent was successfully able to retrieve relevant
information via a rogue tool, update each memory
with each interaction and output responses.
It's also important to note the agent's ability to discern
whether tool calling is appropriate for each specific task.
For instance, when the agent already had the relevant
information needed to answer a question about the capital of France,
it didn't use any tool calling for question answering.