
Librarian Analogy Explains Retrieval-Augmented Generation

Key Points

  • The journalist‑librarian analogy illustrates Retrieval‑Augmented Generation (RAG), where a language model (the journalist) relies on an expert data source (the librarian) to fetch relevant information.
  • In business contexts, the “user” can be a person, bot, or application posing queries that combine general language understanding with domain‑specific data, such as “What was revenue in Q1 for customers in the Northeast?”
  • Because detailed, time‑varying business facts aren’t encoded in a pre‑trained LLM, they must be retrieved from external sources like databases, PDFs, or other applications.
  • A vector database stores both structured and unstructured data as embeddings—mathematical vector representations—that are efficiently searchable by similarity.
  • The RAG workflow queries the vector store for relevant embeddings, feeds that retrieved context to the LLM, and then generates an answer that combines the model’s reasoning with up‑to‑date, domain‑specific information.
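The retrieval-and-augmentation loop in the points above can be sketched end to end in a few lines. This is a minimal illustration, not a real implementation: the word-count "embedding", the sample documents, and the revenue figures are all invented, and the final LLM call is stubbed out. A production system would use a trained embedding model, a managed vector database, and an actual model API.

```python
import math

# Toy embedding over a tiny fixed vocabulary. A real system would use a
# trained embedding model producing dense vectors.
VOCAB = ["revenue", "q1", "northeast", "customers", "hiring", "southwest"]

def embed(text):
    words = [w.strip(".,?").lower() for w in text.split()]
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    # Similarity between two vectors; higher means more related.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Step 1: documents are embedded and stored (the "vector database").
DOCS = [
    "Q1 revenue for Northeast customers was 4.2M",   # figures are made up
    "Hiring plan for the engineering org",
    "Q1 revenue for the Southwest region was 3.1M",
]
INDEX = [(doc, embed(doc)) for doc in DOCS]

def retrieve(question, k=2):
    # Step 2: rank stored documents by similarity to the question.
    q = embed(question)
    ranked = sorted(INDEX, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def answer(question):
    # Step 3: augment the prompt with retrieved context before the LLM call
    # (stubbed here; a real system would send this prompt to a model API).
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = answer("What was revenue in Q1 for customers in the Northeast?")
```

Step 2 is the part a vector database performs at scale, typically with approximate nearest-neighbor search rather than the exhaustive sort shown here.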

Full Transcript

**Source:** [https://www.youtube.com/watch?v=qppV3n3YlF8](https://www.youtube.com/watch?v=qppV3n3YlF8)
**Duration:** 00:07:53

## Sections

- [00:00:00](https://www.youtube.com/watch?v=qppV3n3YlF8&t=0s) **Journalist‑Librarian Analogy for RAG** - The speaker likens a journalist consulting a librarian for relevant books to Retrieval‑Augmented Generation, illustrating how a user queries a system that pulls information from a vector database to answer specific business questions.

## Full Transcript

[0:00] So imagine you're a journalist and you want to write an article on a specific topic. You have a pretty good general idea about the topic, but you'd like to do some more research, so you go to your local library. Now, this library has thousands of books on multiple different topics, but how do you know, as the journalist, which books are relevant to your topic? Well, you go to the librarian. The librarian is the expert on which books in the library contain which information. So our journalist queries the librarian to retrieve books on certain topics, and the librarian produces those books and provides them back to the journalist. Now, the librarian isn't the expert on writing the article, and the journalist isn't the expert on finding the most up-to-date and relevant information, but with the combination of the two, we can get the job done.

[1:03] Love this. It sounds a lot like the process of RAG, or retrieval-augmented generation, where large language models call on vector databases to provide key sources of data and information to answer a question.

[1:18] Hmm, I'm not seeing the connection. Can you help me understand a little better?

[1:21] Sure. So we have a user; in your scenario, it's that journalist. And they have a question. So what types of questions would you want to ask?

[1:39] Right, maybe we can make this more of a business context.

[1:43] Yeah, so let's say this is a business analyst, and they want to ask, "What was revenue in Q1 from customers in the Northeast region?" So that's your prompt.

[1:56] Okay, a couple of questions on that. The user: does it have to be a person, or could it be something else too?

[2:02] Yeah, so this doesn't necessarily have to be a person. It could be a bot, or it could even be another application. And take the question we're talking about: "What was our revenue in Q1 from the
Northeast?" The first part of that question, "What was our revenue," is pretty easy for a general LLM to understand. But the second part, "in Q1 from customers in the Northeast," is not something LLMs are trained on. It's very specific to our business, and it changes over time, so we have to treat those parts separately.

[2:35] So how do we manage that part of the request?

[2:40] Exactly. You'll potentially need multiple different sources of data to answer a specific question, whether that's a PDF, another business application, or maybe some images. Whatever the question is, we need the appropriate data in order to provide the answer back.

[3:02] What technology allows us to aggregate that data and use it for our LLM?

[3:07] Yeah, so we can take this data and put it into what we call a vector database. A vector database holds mathematical representations of structured and unstructured data, similar to what we might see in an array.

[3:24] Gotcha. And these arrays are better suited, or easier to understand, for machine learning or generative AI models versus just that underlying unstructured data?

[3:33] Exactly. We query our vector database, and we get back an embedding that includes the relevant data for what we're prompting.

[3:44] And then we include it back into the original prompt, right?

[3:46] Yeah, exactly. That feeds back into the prompt, and once we're at this point, we move over to the other side of the equation, which is the large language model.

[3:57] Gotcha. So that prompt, which now includes the vector embeddings, is fed into the large language model, which then produces the output with the answer to our original question, with sourced, up-to-date, and accurate data.

[4:14] Exactly, and that's a crucial aspect of it. As new data comes into this vector
database, or things get updated (back to your question about performance in Q1), those embeddings are updated too. So when that question is asked a second time, we have more relevant data to provide back to the LLM, which then generates the output and the answer.

[4:39] Okay, very cool. So, Sean, this sounds a lot like my original analogy with the librarian and our journalist: the journalist trusts that the information in the library is accurate and correct. Now, one of the challenges I see when I'm talking to enterprise customers is that they're concerned about deploying this kind of technology into customer-facing, business-critical applications. If they're building applications that take customer orders or process refunds, they're worried that these kinds of technologies can produce hallucinations or inaccurate results, or perpetuate some kind of bias. What are some things that can be done to help mitigate these concerns?

[5:23] That brings up a great point. The data that comes in on this side, but also on this side, is incredibly important to the output we get when we make that prompt and get the answer back. It really is true: garbage in, garbage out. So we need to make sure we have good data coming into the vector database, and we need to make sure that data is clean, governed, and managed properly.

[5:44] Gotcha. So what I'm hearing is that things like governance and data management are of course crucial on the vector database side: making sure that the actual information flowing through into the model, such as the business results in the sample prompt we talked about, is governed and clean. But also, crucially, on the large language model side, we need to make sure that we're not using a large
language model that takes a black-box approach, meaning a model where you don't actually know what underlying data went into training it. You don't know if there's any intellectual property in there, you don't know if there are inaccuracies in there, and you don't know if there are pieces of data that will end up perpetuating bias in your output results. So as a business that's trying to manage and uphold its brand reputation, it's absolutely critical to take an approach that uses LLMs that are transparent in how they were trained, so we can be certain there aren't any inaccuracies, or any data that's not supposed to be in there.

[7:05] Yeah, exactly. It's incredibly important, especially as a brand, that we get the right answers. And back to our original question about revenue in Q1: we don't want that answer to be impacted by bad data when the prompt goes to one of our LLMs.

[7:25] Exactly. So, very powerful technology, but it makes me think back to the library: our journalist and librarian both trust the data and the books that are in the library. We have to have that same kind of confidence when we're building out these types of generative AI use cases for business as well.

[7:41] Exactly. So governance and AI, but also data and data management, are incredibly important to this process. We need all three in order to get the best result.
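The update behavior described in the transcript, where new data is re-embedded so a repeated question retrieves fresher context, can be sketched as a minimal in-memory store with an upsert operation. The class name, document id, and revenue figures are hypothetical, and the word-count "embedding" stands in for a trained embedding model; real vector databases expose comparable upsert APIs.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: word counts stand in for a trained embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Minimal in-memory vector store keyed by document id, so re-ingesting
    a document replaces (upserts) its embedding instead of duplicating it."""

    def __init__(self):
        self.docs = {}  # doc_id -> (text, embedding)

    def upsert(self, doc_id, text):
        self.docs[doc_id] = (text, embed(text))

    def query(self, question, k=1):
        q = embed(question)
        ranked = sorted(self.docs.values(),
                        key=lambda item: cosine(q, item[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

store = VectorStore()
store.upsert("q1-northeast", "Q1 Northeast revenue was 3.9M (preliminary)")
# New data arrives: the same document id is re-embedded in place, so the
# next time the question is asked, the fresher figure is retrieved.
store.upsert("q1-northeast", "Q1 Northeast revenue was 4.2M (final)")
context = store.query("What was revenue in Q1 for the Northeast?")
```

Keying by document id is the design choice that matters here: without it, re-ingesting updated data would leave stale embeddings competing with fresh ones at query time.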