
Top Three Retrieval Strategies in RAG

Key Points

  • Retrieval‑augmented generation (RAG) hinges on the retrieval component, whose choice dramatically affects the factuality and relevance of an LLM’s answers.
  • Sparse retrieval (e.g., TF‑IDF, BM25) is a classic, fast, and scalable keyword‑based method that excels when exact wording matters but struggles with synonyms and contextual meaning.
  • Dense (semantic) retrieval maps queries and documents into high‑dimensional embeddings, enabling meaning‑based matching via vector similarity; it relies on embedding models like Sentence‑Transformers and approximate‑nearest‑neighbor search.
  • While each approach has trade‑offs, many teams now favor hybrid or newer strategies that combine the speed of sparse methods with the flexibility of dense embeddings to handle diverse queries and multimodal data.

Full Transcript

# Top Three Retrieval Strategies in RAG

**Source:** [https://www.youtube.com/watch?v=r0Dciuq0knU](https://www.youtube.com/watch?v=r0Dciuq0knU)
**Duration:** 00:08:16

## Sections

- [00:00:00](https://www.youtube.com/watch?v=r0Dciuq0knU&t=0s) **Retrieval Strategies in RAG** - The speaker likens the multiple variations of a name to the diverse retrieval methods in Retrieval-Augmented Generation, emphasizing that the choice of retrieval strategy, such as sparse keyword-based search, is crucial for delivering factual, relevant answers from an LLM.
- [00:03:11](https://www.youtube.com/watch?v=r0Dciuq0knU&t=191s) **Understanding Dense and Hybrid Retrieval** - The passage outlines modern information retrieval techniques, contrasting traditional BM25-based sparse search with newer dense semantic retrieval using embeddings, and introducing hybrid retrieval as the emerging state-of-the-art approach.
- [00:07:07](https://www.youtube.com/watch?v=r0Dciuq0knU&t=427s) **Hybrid Retrieval for Specialized Domains** - The speaker emphasizes that hybrid retrieval, merging fast sparse methods with context-aware dense techniques, is the preferred RAG strategy for jargon-heavy fields such as legal, technical, and medical, and is now readily supported by platforms like Elasticsearch, Milvus, Weaviate, and DataStax Astra DB.

## Full Transcript
I've heard about five different variations of my first name Joseph: Joe, Joseph with an F, José from mis amigos en Guadalajara, and then there's that one guy in grad school who called me Giuseppe. Meanwhile, I don't think I've ever had anything similar with my last name, just Washington. The retrieval in retrieval-augmented generation, or RAG, is kind of like that. We all agree on the augmented generation part of the name, but retrieval comes in multiple flavors, and the retrieval strategy you choose can make or break your AI agentic system. And this is key in any generic RAG system, where a user comes to your application with a query, the application itself is connected to your LLM, and you want to provide that LLM access to different knowledge sources.

RAG works by fetching relevant chunks from your knowledge base and feeding them into the LLM. The quality of that retrieval method, though, determines how factual and relevant the answers will be. Some methods are lightning fast, while others are more flexible when it comes to synonyms, context, and data that spans different modalities. So let's count down the top three retrieval strategies and end with the one that most teams are betting on today.

Okay, starting with number three: sparse retrieval. This is a foundational, classic method of retrieval. It's fairly old, about 50 years old, relying on keyword search. Sparse retrieval uses methods like the well-known TF-IDF, as well as BM25. It counts how often query terms appear in your document and then scores the documents accordingly. Its pros are that it's simple, fast, and scalable, but it doesn't handle synonyms or context very well. Still, in some cases, BM25 can outperform more expensive deep learning models on domain-specific terms. Question: when should you use it?
Any situation where exact wording matters: short, well-defined queries, code, search logs, or legal clauses are all examples. And it doesn't require embeddings, so it's cost-effective and it scales really well. You're probably already using open-source examples like Elasticsearch and Apache Lucene, both built on BM25, and even Milvus now supports BM25 in addition to vector embeddings.

Now on to number two: dense retrieval, aka the semantic workhorse. This technology is about 5 to 10 years old, so fairly recent. In dense retrieval, both queries and documents are mapped into a high-dimensional vector space, and results are found based on semantic similarity, i.e., the meanings of the words instead of exact matches. So this depends on embedding models. An embedding model, like the open-source Sentence-Transformers models, takes text and converts it into a vector of numbers. Texts with similar meaning land close together in that vector space, where similarity is calculated using algorithms like approximate nearest neighbor or k-nearest neighbor. Open-source examples include Faiss from Meta, or JVector, an open-source, high-performance Java library that speeds up dense retrieval in enterprise RAG systems. Dense retrieval makes natural language queries shine. It's perfect for chatbots, customer service, and research over unstructured knowledge bases where people might phrase things in many different ways. It's powerful and context-aware, but it can miss rare or jargon-heavy terms. It's also not good with short, few-word queries.

On to number one: hybrid retrieval, aka the current state of the art. This one is the new kid on the block. It's only about 2 to 3 years old in practical deployments, and it combines the best of both worlds: vector plus keyword search.
The semantic matching handles synonyms and concepts, while the keyword matching ensures that rare but critical terms don't get lost. Benchmarks show hybrid retrieval consistently outperforming dense-only retrieval, boosting both precision and recall. So how does it work? The query runs both ways in parallel: once as a vector embedding against your embedded knowledge set, and again as a keyword search. It then uses a fusion algorithm to merge results based on scores from both. The most common fusion algorithm is a weighted sum, which picks a balance between, for example, 70% dense and 30% sparse. Another very popular method is reciprocal rank fusion, or RRF, which doesn't use raw scores but instead merges based on the ranked positions from each retriever.

It works across use cases, but especially in domains with specialized jargon, such as the legal, technical, or medical fields. Hybrid is number one because it balances speed, precision, and recall. That's why it has become the default choice for serious RAG deployments, and also why offerings like Elasticsearch, Milvus, Weaviate, and DataStax Astra DB have all made it easy to experiment with hybrid retrieval. For some of you, this may feel like a Taylor Swift Eras tour, but with retrieval strategies, and with the eras spanning the last 50 years. If you're a data scientist or a developer, I encourage you to embrace the hybrid retrieval era. Because sparse retrieval is fast and exact, and dense retrieval is context-aware and flexible, but hybrid retrieval gives you the best of both worlds, and that is why it's top of the list.
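The BM25 scoring the transcript describes (count how often query terms appear, weight rare terms more heavily, score each document) can be sketched in plain Python. This is a toy illustration, not the video's material: the corpus, the `bm25_scores` helper, and the parameters `k1=1.5` and `b=0.75` are assumptions; production systems would use Lucene, Elasticsearch, or a library such as `rank_bm25`.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query using BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    # Document frequency: in how many documents does each query term appear?
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequencies within this document
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            # Rare terms get a higher inverse-document-frequency weight.
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation, normalized by document length.
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

docs = ["the cat sat on the mat".split(),
        "dogs and cats living together".split(),
        "the quick brown fox".split()]
print(bm25_scores(["cat", "mat"], docs))  # only the first doc matches
```

Note the exact-match limitation the transcript warns about: "cats" in the second document contributes nothing to the query term "cat", which is precisely the synonym gap that dense retrieval addresses.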
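The two fusion algorithms the transcript names, weighted-sum fusion and reciprocal rank fusion (RRF), can both be sketched in a few lines. The document IDs, the example rankings, and the helper names here are made up for illustration; `k=60` is a conventional RRF constant, and a real weighted sum would normalize the raw scores from each retriever before mixing them.

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: merge ranked lists by position, not raw score."""
    scores = {}
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking):
            # A document ranked near the top of any list gets a larger share.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + pos + 1)
    return sorted(scores, key=scores.get, reverse=True)

def weighted_sum(dense_scores, sparse_scores, alpha=0.7):
    """Weighted-sum fusion, e.g. 70% dense / 30% sparse, on normalized scores."""
    merged = {}
    for doc_id, s in dense_scores.items():
        merged[doc_id] = alpha * s
    for doc_id, s in sparse_scores.items():
        merged[doc_id] = merged.get(doc_id, 0.0) + (1 - alpha) * s
    return sorted(merged, key=merged.get, reverse=True)

dense_ranking  = ["d2", "d1", "d3"]   # from the vector search
sparse_ranking = ["d1", "d4", "d2"]   # from the BM25 search
print(rrf([dense_ranking, sparse_ranking]))  # → ['d1', 'd2', 'd4', 'd3']
```

Notice how `d1`, which appears high in both lists, wins under RRF even though neither retriever ranked it first on its own, which is why rank-based fusion is robust when the two retrievers' raw scores live on incompatible scales.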