Learning Library


LangChain Retrieval-Augmented Generation Demo

Key Points

  • Erica introduces a Retrieval Augmented Generation (RAG) workflow using LangChain to give large language models up‑to‑date information that they weren’t trained on.
  • She demonstrates the problem with a recent IBM‑UFC partnership announcement that an IBM Granite model couldn’t answer because its training data only goes up to 2021.
  • The RAG solution involves (1) creating a knowledge base from current IBM.com pages, (2) using a retriever to fetch relevant documents, (3) feeding those documents to the LLM, and (4) prompting the LLM with the retrieved context.
  • The tutorial shows how to set up the required Watsonx credentials, install necessary Python packages, and store the API key in a `.env` file for the notebook.
  • Finally, she builds a vector store from a dictionary of 25 IBM URLs—including the UFC article—and uses it to retrieve top results that the LLM can incorporate into its answer.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=cDn7bf84LsM](https://www.youtube.com/watch?v=cDn7bf84LsM)
**Duration:** 00:07:59

## Sections

- [00:00:00](https://www.youtube.com/watch?v=cDn7bf84LsM&t=0s) **LangChain RAG Tutorial Overview** - Erica demonstrates a Python LangChain workflow that adds a knowledge base, retriever, and prompt to enable retrieval‑augmented generation for up‑to‑date answers, using an IBM‑UFC announcement as the example.
- [00:03:06](https://www.youtube.com/watch?v=cDn7bf84LsM&t=186s) **Preparing Documents for Vector Store** - The speaker demonstrates mapping URLs, loading and cleaning articles with LangChain, chunking the text, embedding it using IBM's Slate model, and saving the vectors in a local Chroma database.
- [00:06:12](https://www.youtube.com/watch?v=cDn7bf84LsM&t=372s) **Setting Up RAG for IBM Knowledge Base** - The speaker explains configuring a prompt template, helper function, and retrieval‑augmented generation chain to answer queries about the IBM‑UFC partnership and IBM's watsonx.data and watsonx.ai services.
0:00Hi, my name is Erica and I'm going to show you how to use LangChain for a simple RAG example in Python. 0:07Large language models, LLMs, can be great for answering lots of questions, 0:11but sometimes the models don't have the most up to date information 0:15and can't answer some questions about recent events. 0:19For example, I was reading this recent announcement about the UFC and IBM partnership 0:24on IBM.com and wanted to ask an LLM about it. 0:29But when I asked the IBM Granite model to tell me about the UFC announcement from November 14th, 2024, 0:36it didn't know what I was talking about and mentioned it was trained on a limited data set up to only 2021. 0:42How do I give this LLM the most up to date information so it can answer my question? 0:48The answer is RAG, retrieval augmented generation. 0:52Let me show you how it works. 0:54Typically we have our user asking the question to the LLM, which generates a response. 1:00But as you just saw, the LLM didn't have the right information, the context, to answer my question. 1:07So we need to add something in the middle between the question and the LLM. 1:13First, we'll add a knowledge base to include the content we want the LLM to read. 1:17In this case, it'll be the most up to date content from IBM.com pages about some IBM products and announcements. 1:26Second, we'll set up a retriever to fetch the content from the knowledge base. 1:32Third, we'll set up the LLM to be fed the content. 1:37Fourth, we'll establish a prompt with instructions to be able to ask the LLM questions. 1:43The top search results from search and retrieval will also be gathered here. 1:49Once we've completed these four steps, we can start asking our questions about the content in our knowledge base. 1:55Our query is searched for in our knowledge base vector store. 1:58The top results are returned as context for the LLM. 2:02And finally, the LLM generates a response.
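The four-step flow described above can be sketched in plain Python, independent of any framework. The keyword-overlap retriever and stub LLM below are hypothetical stand-ins for illustration, not the watsonx components used in the video:

```python
# Minimal, framework-free sketch of the RAG flow: knowledge base (1),
# retriever (2), LLM (3), and prompt with retrieved context (4).
# The scoring and the "LLM" are toy stand-ins, not the real components.

knowledge_base = [
    "IBM and UFC announced a partnership on November 14, 2024.",
    "watsonx.data connects to various data sources and manages metadata.",
    "watsonx.ai lets users build, deploy and manage AI applications.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Score each document by word overlap with the query; return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(knowledge_base,
                    key=lambda doc: len(q_words & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def stub_llm(prompt: str) -> str:
    """Stand-in for a real model: echoes the prompt it was given."""
    return "Answer based on: " + prompt

def rag_answer(question: str) -> str:
    context = "\n".join(retrieve(question))                   # steps 1-2
    prompt = f"Context:\n{context}\n\nQuestion: {question}"   # step 4
    return stub_llm(prompt)                                   # step 3

print(rag_answer("Tell me about the UFC announcement from November 14, 2024"))
```

A real pipeline swaps in an embedding-based vector store for `retrieve` and a generative model for `stub_llm`, which is exactly what the rest of the tutorial builds.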
2:05I'll walk through all these steps again in the Jupyter Notebook, linked in the description to this video. 2:11Before we can begin, we need to fetch an API key and project ID for our notebook. 2:16You can get these credentials by following the steps in the video linked in the description below. 2:22We also have a few libraries to use for this tutorial. 2:26If you don't have these packages installed yet, you can solve this with a quick pip install, 2:34and here we can import the packages. 2:38Next, 2:38save your watsonx project ID and watsonx API key in a separate .env file. 2:45Make sure it's in the same directory as this notebook. 2:48I have my credentials saved already, so I'll import those over 2:51from my .env file and save them in a dictionary called credentials. 2:56Okay, now we can get started with the workflow. 3:00First, we'll gather the information from some IBM.com URLs to create a knowledge base as a vector store. 3:11Let's establish a URLs dictionary. 3:14It's a Python dictionary that helps us map the 25 URLs from which we will be getting the content. 3:20You can see at the top here, I have the article about the UFC and IBM partnership I asked about before. 3:27Let's also set up a name for our collection, 3:30Ask IBM 2024. 3:33Next, let's load our documents using the LangChain web based loader for the list of URLs we have. 3:40Loaders load in data from a source and return a list of documents. 3:45We'll print the page content of a sample document at the end to see how it's been loaded. 3:51It can take a little while for it to finish loading, 3:54and here's a sample document. Based on the sample document, 3:57it looks like there's a lot of whitespace and newline characters that we can get rid of. 4:03Let's clean that up with this code. 4:08Let's see how our sample document looks now after we've cleaned it up. 4:14Great. We've removed the whitespace successfully. 4:18Before we vectorize our content,
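The cleanup step isn't spelled out in the transcript; a common approach (an assumption, not necessarily the notebook's exact code) is to collapse runs of whitespace and newlines with a regular expression:

```python
import re

def clean_text(text: str) -> str:
    """Collapse runs of whitespace/newlines left over from HTML extraction."""
    return re.sub(r"\s+", " ", text).strip()

raw = "IBM and UFC\n\n\n   announce   a\npartnership.\n\n"
print(clean_text(raw))  # → "IBM and UFC announce a partnership."
```

Applied to LangChain documents, this would look like `doc.page_content = clean_text(doc.page_content)` for each loaded document.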
4:20We need to split it up into smaller, more manageable pieces known as chunks. LangChain's recursive character text splitter 4:28takes a large text and splits it based on a specified chunk size, meaning the number of characters. 4:34In our case, we're going to go with a chunk size of 512. 4:38Next, we need to instantiate an embedding model to vectorize our content. 4:42In our case, we'll use IBM's Slate model, 4:45and to finish off this step, 4:47let's load our content into a local instance of the vector database using Chroma. 4:52We'll call it vector store. 4:53The documents in the vector store will be made up of the docs we just chunked and they'll be embedded using the IBM Slate model. 5:01For step two, we'll set up our vector store as a retriever. 5:05The retrieved information from the vector store, the content from the URLs, 5:10serves as additional context that the LLM will use to generate a response later in step four. 5:18Code wise, all we need to do is set up our vector store as a retriever. 5:23For step three, we'll set up our generative LLM. 5:26The generative model will use the retrieved information from step two to produce a relevant response to our questions. 5:33First, we'll establish which LLM we're going to use to generate the response. 5:38For this tutorial, we'll use an IBM Granite model. 5:44Next we'll set up the model parameters. 5:46The model parameters available, and what they mean, can be found in the description of this video. 5:52And finally, in this step, we instantiate the LLM using watsonx. 5:56In step four, we'll set up our prompt, which will combine our instructions, 6:01the search results from step two, and our question to provide context to the LLM we just instantiated in step three. 6:09First, let's set up instructions for the LLM. 6:13We'll call it template because we'll also set up our prompt using a prompt template, and our instructions.
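Chunking can be illustrated without LangChain. The simple splitter below mimics the basic behavior of a fixed-size character splitter (LangChain's `RecursiveCharacterTextSplitter` additionally tries to break on natural separators like paragraphs and sentences, so its chunk boundaries differ):

```python
def split_text(text: str, chunk_size: int = 512, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks with optional overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 1100
chunks = split_text(doc, chunk_size=512)
print(len(chunks), [len(c) for c in chunks])  # 3 [512, 512, 76]
```

In the notebook this step maps to something like `RecursiveCharacterTextSplitter(chunk_size=512).split_documents(docs)`, with the resulting chunks embedded by the Slate model and stored via `Chroma.from_documents(...)`; the exact calls may differ from what is shown on screen.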
6:21Let's also set up a helper function to format our docs to differentiate between individual page content. 6:27Finally, as part of this step, we can set up a RAG chain with our search results from our retriever, 6:33our prompt, our helper function, and our LLM. 6:37Finally, in steps five and six, we can ask the LLM questions about our knowledge base. 6:43The generative model will process the augmented context along with the user's question to produce a response. 6:50First, let's ask our initial question. 6:52Tell me about the UFC announcement from November 14th, 2024. 6:58On November 14th, 2024, IBM and UFC announced a groundbreaking partnership, 7:03and it looks like the model was able to answer the question this time, 7:07since it received the context from the UFC article we fed it. 7:12Next, let's ask about watsonx.data 7:16What is watsonx.data? 7:19watsonx.data is a service offered by IBM that enables users 7:23to connect to various data sources and manage metadata for creating data products. 7:28Looks good. 7:30And finally, let's ask about watsonx.ai 7:34What does watsonx.ai do? 7:38watsonx.ai is a comprehensive AI platform that enables users to build, deploy and manage AI applications. 7:46It was also able to respond to our watsonx.ai question. 7:50Feel free to experiment with even more questions about the IBM offerings 7:54and technologies discussed in the 25 articles you loaded into the knowledge base.
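The helper function mentioned in the transcript typically just joins each retrieved document's page content with blank lines. A minimal version, assuming documents expose a `page_content` attribute as LangChain's `Document` objects do (the `Document` class here is a stand-in):

```python
from dataclasses import dataclass

@dataclass
class Document:
    """Stand-in for LangChain's Document class."""
    page_content: str

def format_docs(docs: list[Document]) -> str:
    """Join retrieved documents into one context string for the prompt."""
    return "\n\n".join(doc.page_content for doc in docs)

docs = [Document("First article chunk."), Document("Second article chunk.")]
print(format_docs(docs))
```

In the notebook, a helper like this is typically composed into an LCEL chain along the lines of `{"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm`, so each query is retrieved, formatted, and injected into the prompt before generation; the exact chain may differ from what is shown in the video.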