Build a Retrieval‑Augmented Chat App
Key Points
- The video demonstrates building a chat app that uses Retrieval‑Augmented Generation to answer questions based on your own data, which is a low‑cost way to apply LLMs in a business context.
- Streamlit is used for the UI, with chat input and message components, and a session‑state variable is created to store and display the full conversation history.
- LangChain provides the core LLM integration; an IBM Watsonx LLM (Llama‑2‑70B‑Chat) is accessed via an API key and project ID, with decoding parameters set for the model.
- The app sends user prompts to the LLM, captures the assistant’s responses, and renders both sides of the dialogue as markdown within the Streamlit chat interface.
- A PDF loader is added in a later step, converting the document into chunks, embedding them, and indexing them in a LangChain vector store so the model can retrieve and use custom data during chat.
**Source:** [https://www.youtube.com/watch?v=XctooiH0moI](https://www.youtube.com/watch?v=XctooiH0moI)
**Duration:** 00:02:45

Sections
- [00:00:00](https://www.youtube.com/watch?v=XctooiH0moI&t=0s) **Building a Retrieval-Augmented Chat App** — The speaker demonstrates how to create a Streamlit-based interface using LangChain that employs retrieval-augmented generation to let users chat with their own data, covering dependency setup, UI components, and handling message history through session state.

Full Transcript
In this video I'm going to show you how to build a large language model app to chat with your own data. This is arguably the cheapest and most efficient way to get started with LLMs for your own business. But before we do that, I want to back up a little: the technique that makes this work is called retrieval-augmented generation, a fancy way of saying we chuck chunks of your data into a prompt and get the LLM to answer based on that context.

The first thing that we need to do is build an app to chat to. There's a bunch of dependencies that I need to import. They're mainly from LangChain, but there's a little Streamlit and watsonx thrown in for good measure. I'll explain these as I use them, so don't stress for now. Streamlit has a great set of chat components, so I'm going to make the most of them: add in a chat input component to hold the prompt, and then display the user message using the chat message component via markdown. This means I can now see the messages showing up in the app, but it's only displaying the last message posted, not all of them. Easy fix: create a Streamlit session state variable, I'll call it messages, and append all of the user prompts into it. While I'm at it, I'll save the role type, in this case user, into the dictionary, and then I can test it out. Hmm, but the history doesn't show up. Well, turns out I haven't printed out the historical messages yet. Loop through all the messages in the session state messages variable and use the chat message component to display them.
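The pattern here — a list of role/content dictionaries kept in session state and replayed on every rerun — can be sketched in plain Python, no Streamlit required (the `session_state` dict and helper names below are illustrative stand-ins, not the real Streamlit API):

```python
# Sketch of the Streamlit chat-history pattern in plain Python.
# Session state is modelled as a dict that survives "reruns";
# each message is stored as a {"role": ..., "content": ...} dict.

session_state = {}

def handle_prompt(prompt):
    # Create the history list on first use, like initialising st.session_state.
    session_state.setdefault("messages", [])
    # Append the user message together with its role, as in the video.
    session_state["messages"].append({"role": "user", "content": prompt})

def render_history():
    # Loop through all stored messages and "display" each one,
    # standing in for st.chat_message(...).markdown(...).
    return [f'{m["role"]}: {m["content"]}' for m in session_state["messages"]]

handle_prompt("hello")
handle_prompt("what is RAG?")
print(render_history())
```

Because the whole history is replayed on every run, saving the app (and rerunning) is what makes earlier prompts reappear.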
And wait, did I save the app? Of course not, I'd never make a mistake like that. Let's just try that again and, look at this, historical prompts being passed through. Whoop-dee-doo, Nick, where's the LLM? Well, let's do it. I'm going to use the LangChain interface to watsonx.ai. Why? Well, it uses state-of-the-art large language models, doesn't use your data to train, and it's built for business, but that's just scratching the surface. To do that, I'll create a credentials dictionary with an API key and the ML service URL. You can create an API key from the IBM Cloud IAM menu; URLs for the different regions are shown on the screen right now. Then the LLM: I'm using Llama-2-70B-Chat because I'm pretty fond of those furry buggers. Pass through some decoding parameters and specify the project ID from watsonx. Now send the prompt through to the LLM and, boom.
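A sketch of the two dictionaries involved, credentials and decoding parameters. The key names follow common watsonx examples but should be checked against the library version you use; the API key, project ID, and parameter values here are placeholders, not taken from the video:

```python
# Hedged sketch: credentials and decoding parameters for a watsonx LLM.
# All values are illustrative placeholders, not real secrets or settings.

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",  # region-specific ML service URL
    "apikey": "YOUR_IBM_CLOUD_API_KEY",          # created in the IBM Cloud IAM menu
}

# Decoding parameters control how the model generates text.
params = {
    "decoding_method": "sample",  # or "greedy" for deterministic output
    "max_new_tokens": 200,        # cap on the length of each response
    "temperature": 0.5,           # randomness of sampling
}

project_id = "YOUR_WATSONX_PROJECT_ID"  # ties requests to a watsonx project

# These would then be passed to LangChain's watsonx LLM wrapper along with
# the model name (Llama-2-70B-Chat in the video) before sending prompts.
print(sorted(params))
```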
Wait, it looks like it's running, but I need to show the LLM responses as well. Easy enough with the Streamlit chat message component. Note the chat role for the LLM response is assistant rather than user, which helps to differentiate the responses. I'll render the response as markdown and save the message to the session state as well; that way the history is displayed in the app. And now it works. Yeah, yeah, that's great, but where does the custom data come into play?
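Putting the pieces together, the full request/response loop can be modelled in plain Python, with a stub standing in for the watsonx call (`stub_llm` is a placeholder, not the real API):

```python
# Plain-Python model of the chat loop: user prompt in, assistant reply out,
# both saved to session state with their role so the history renders correctly.

session_state = {"messages": []}

def stub_llm(prompt):
    # Placeholder for the real LLM call in the video.
    return f"(model reply to: {prompt})"

def chat_turn(prompt):
    # Save the user side of the turn under the "user" role.
    session_state["messages"].append({"role": "user", "content": prompt})
    # Get the model response and save it under the "assistant" role,
    # which is what lets the UI style the two sides differently.
    response = stub_llm(prompt)
    session_state["messages"].append({"role": "assistant", "content": response})
    return response

chat_turn("What is retrieval-augmented generation?")
print([m["role"] for m in session_state["messages"]])
```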
Entering phase three: I'll add in a load-PDF function and specify the name of the PDF here, then pass that to the LangChain VectorstoreIndexCreator and choose the embeddings function to use. This basically chunks up the PDF and loads it into a vector database, Chroma DB in this case. Wrapping it in the st.cache_resource function means that Streamlit doesn't need to reload it each time, which makes it a whole heap faster.
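The caching trick can be illustrated with the standard library: like st.cache_resource, `functools.lru_cache` runs the expensive loader once and hands back the same result on every later call (`load_index` here is a stand-in for the real PDF-loading and indexing work):

```python
import functools

calls = {"count": 0}

@functools.lru_cache(maxsize=None)
def load_index(pdf_name):
    # Stand-in for the expensive work: loading a PDF, chunking it,
    # embedding the chunks, and building the vector index.
    calls["count"] += 1
    return f"index-for-{pdf_name}"

# The first call does the work; repeated calls reuse the cached result,
# which is what st.cache_resource gives you across Streamlit reruns.
load_index("report.pdf")
load_index("report.pdf")
print(calls["count"])  # the loader body ran only once
```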
Anyway, I can then pass that index to the LangChain RetrievalQA chain and swap out the base LLM for the Q&A chain using chain.run, and we can now chat with our PDF, in this case a PDF to do with generative AI. Meta, I know.
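Stripped of the libraries, the retrieval-augmented step the chain performs looks roughly like this. Word-overlap scoring stands in for real embeddings, which is a big simplification; the document text and helper names are made up for the example:

```python
# Minimal sketch of retrieval-augmented generation with no dependencies.
# Real systems use embeddings and a vector store (Chroma DB in the video)
# instead of the naive word-overlap scoring used here.

def chunk(text, size=12):
    # Split the document into fixed-size word chunks.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(question, chunks):
    # Score each chunk by how many question words it shares; take the best.
    q = set(question.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

def build_prompt(question, context):
    # Chuck the retrieved chunk into the prompt, as the video puts it,
    # so the LLM answers based on that context.
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

doc = ("Generative AI models produce new content. "
       "Retrieval augmented generation grounds model answers in your own documents. "
       "Streamlit builds simple web interfaces in Python.")
context = retrieve("What is retrieval augmented generation?", chunk(doc))
print(build_prompt("What is retrieval augmented generation?", context))
```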