Learning Library

← Back to Library

Improving AI Accuracy with Retrieval Augmentation

13m • Unknown Channel • ai-ml • tutorial • intermediate • Watch on YouTube ↗

Key Points

The speakers illustrate how AI can confidently give absurd, incorrect advice—like using industrial glue to keep pizza toppings in place—highlighting the risk of blindly trusting AI outputs.
They note that AI errors differ from human mistakes, often producing confident hallucinations that can mislead users when important decisions rely on AI advice.
To improve AI accuracy, they introduce Retrieval‑Augmented Generation (RAG), which supplements a language model’s knowledge with up‑to‑date, trusted information retrieved from external sources.
By querying a vector database for relevant documents and appending that context to the user’s prompt, RAG helps the model generate more reliable, factually correct responses.
The overall message is that integrating external, curated data via RAG can significantly reduce AI hallucinations and increase confidence in AI‑driven decisions.

Sections

Full Transcript

# Improving AI Accuracy with Retrieval Augmentation **Source:** [https://www.youtube.com/watch?v=pNbU1vGkIK4](https://www.youtube.com/watch?v=pNbU1vGkIK4) **Duration:** 00:13:59 ## Summary - The speakers illustrate how AI can confidently give absurd, incorrect advice—like using industrial glue to keep pizza toppings in place—highlighting the risk of blindly trusting AI outputs. - They note that AI errors differ from human mistakes, often producing confident hallucinations that can mislead users when important decisions rely on AI advice. - To improve AI accuracy, they introduce Retrieval‑Augmented Generation (RAG), which supplements a language model’s knowledge with up‑to‑date, trusted information retrieved from external sources. - By querying a vector database for relevant documents and appending that context to the user’s prompt, RAG helps the model generate more reliable, factually correct responses. - The overall message is that integrating external, curated data via RAG can significantly reduce AI hallucinations and increase confidence in AI‑driven decisions. ## Sections - [00:00:00](https://www.youtube.com/watch?v=pNbU1vGkIK4&t=0s) **When AI Gives Bad Advice** - A dialogue spotlights an AI's absurd recommendation to glue pizza toppings, underscores the confidence‑driven errors AI can make, and introduces retrieval‑augmented generation (RAG) as a technique to boost answer accuracy. - [00:03:09](https://www.youtube.com/watch?v=pNbU1vGkIK4&t=189s) **Choosing the Right AI Model** - The speaker explains that model size and domain‑specific training influence hallucination risk, noting that large, general models excel on broad questions, whereas smaller, specialized models provide more reliable answers within their expertise. - [00:06:13](https://www.youtube.com/watch?v=pNbU1vGkIK4&t=373s) **Chain‑of‑Thought and LLM Chaining** - The passage outlines zero‑shot and few‑shot chain‑of‑thought prompting techniques for improved reasoning, then introduces LLM chaining—using multiple models to reach a consensus—to enhance overall AI accuracy. - [00:09:17](https://www.youtube.com/watch?v=pNbU1vGkIK4&t=557s) **Mixture of Experts Routing** - The speaker explains how a gating network directs each query to specialized sub‑models (experts) such as math, law, or language, combines their outputs, and achieves higher accuracy and fewer errors compared to a single model. - [00:12:22](https://www.youtube.com/watch?v=pNbU1vGkIK4&t=742s) **Tailoring Model Settings for Use Cases** - The speaker explains how prompt type, temperature, system prompts, and reinforcement learning with human feedback work together to balance factual accuracy and creative variety in AI responses. ## Full Transcript

0:00Jeff, pepperonis keep falling off my pizza. 0:04Martin, that sounds like a personal problem to me, 0:06but don't worry, I have a solution. 0:08Just use this industrial strength glue and your problem will be solved. 0:12That sounds awful. 0:14Well that's what my AI chatbot recommended and we know AI is never wrong, right? 0:19Well actually, I am going to disagree you there. 0:23In fact, AI can make mistakes of this sort which are substantially different than those that a human would make. 0:29Right, AI can come up with things you and I would immediately dismiss and do it with a level of confidence that is absolutely stunning. 0:36Right, and if you ask it to try again, it'll often apologize but then come up with something completely different 0:42which kind of makes you wonder which parts to trust and which ones not to. 0:46So if we're going to depend on AI to advise us on important decisions, we need it to be correct. 0:53What can we do to improve the accuracy of AI? 0:55That is a good question. So let's take a look at some techniques that can do just that. 1:01So the first technique we're going to talk about is called RAG. 1:05I've got you covered here Martin. 1:08Yeah maybe not that rag. 1:11It's actually an acronym, retrieval augmented generation. 1:15Okay, so we're not gonna be using this after all. 1:17Not today. 1:18So this is all about adding in additional information into a large language model to help it be able to answer a question more accurately. 1:26So if you think about a large-language model today, it's trained on a certain data set of information. 1:30It knows what it knows from that training, 1:33but what if we have a user who comes in and asks a question that requires some information that's not in that training data set. 1:40So maybe it's something that's newer than, you know, it came out after the LLM was trained, or it's just something that wasn't trained on. 1:48The problem there is the LLM will still have a good guess as to how to answer your query, 1:54but it's quite likely it's going to create a hallucination and actually be wrong with low accuracy. 2:00That's how we end up with pizzas and glue. 2:02That's exactly it. 2:03So what we need to do is to introduce a trusted data source into this before the LLM sees the query. 2:11So in this case, we have a trusted data source, probably a vector database, 2:15and we can use a retriever, that's what the R really stands for in RAG, 2:20to query that vector database and to retrieve documents that will be relevant to that particular user's query, 2:28and then it can populate that into the query, before the large language model actually sees it. 2:33So in its context window now, it has the user's initial prompt, 2:38plus we have embellished that prompt with some additional relevant information in order for the LLM to be able to answer the question. 2:44You've augmented the query. 2:46Exactly, that's the A. 2:48Ah, there we go. 2:49That's why they put that in there. 2:50Yeah, and across the output of this is hopefully gonna be more accurate. 2:55That's the G, and this should now have a much better chance of being the right answer because we've given the LLM the additional information that it needs. 3:04Okay, Martin, another thing you can do to help improve AI accuracy is make sure you've got the right model. 3:10And the model size makes a difference and what it's trained on makes a different. 3:14It's all about picking the right tool. 3:16In this case, it's about picking the right models that's fit for purpose. 3:20For instance, a large model that knows many different domains will be less likely to hallucinate 3:25if the question is broad versus a smaller model with more specific information. 3:32Will be less likely to hallucinate if the question is within its area of expertise. 3:36Okay, so if you ask your medical doctor how to get rid of a virus on your computer, he or she may not know and might just try to guess. 3:46Yeah, but if you asked them a about how to treat a biological virus, they would have a much better answer. 3:51You certainly hope so. 3:53I would hope so, so for instance, let's take a look at an example with AI models. 3:57We would have this large model, it would be trained on lots of different domains of information, 4:01it might in fact know about medical information, it might know about law, it's trained in art, in sports, in technology, lots of things like that, 4:10versus a small model, smaller model, 4:13that is specifically trained, maybe just to know about cybersecurity knowledge. 4:18So if I were to ask a cybersecurity question of this larger model, 4:23then the likelihood that we're gonna get a good answer is a little bit lower because it has more area to hallucinate across. 4:31But if I ask this model that is specifically trained in that particular domain a question, probably I'm gonna get a really good answer. 4:39That makes sense, but what if I ask a more general question that's not necessarily cyber related? 4:46Then I would suspect that actually there's the chance that this is not going to give you such a good answer, 4:50whereas the general purpose model is more likely to be able to give a correct and accurate answer this time. 4:57Exactly, so you want to choose the right model for the right purpose. 5:01Alright, next up is COT, that's Chain of Thought Prompting. 5:07Now this involves asking the LLM to explicitly generate an intermediate reasoning steps before giving a final answer. 5:14And that can help reduce mistakes in problems where logical consistency is needed like a math problem. 5:20How do you feel about math problems, Jeff? 5:22I live for them. 5:23Okay, let's give you one. 5:25So consider we've got a factory that produces three times as many red widgets as it produces blue widgets. 5:31So if the factory produces 240 widgets, how many blue widgets are produced? 5:37Ok, this is easy, I'll play the useful idiot, it's 80. 5:42Uh, that is not the right answer. 5:45Uh... 5:46Okay so I guess i'm gonna have to show my work. 5:49We'll go ahead and put in an equation and we'll solve for B, 5:53where B is the number of blue widgets and if I solve for that I end up with okay sixty, final answer. And that is the correct answer. 6:03And what you've done here is you've gone through some reasoning steps to get to the answer, rather than picking the answer that you felt was intuitively correct. 6:11And that's how LLMs work as well. 6:13Sometimes the intuitive answer, the quick answer, is not the right one. 6:17So there's a few different ways to perform chain of thought processing. 6:21One way is to do it through something called zero-shot chain of thought processing, and this just simply adds a trigger phrase to the prompt. 6:29It might be something like, let's think step by step. 6:32And that will induce the model to produce a reasoning chain. 6:36There's also few-shot chain of thought prompting. 6:40Now this includes examples of questions in the prompt or examples of math problems in the prompts, and it includes the step-by-step solutions as well. 6:49So the model can learn from how you've done it and apply it to the next query. 6:54Then more recently, reasoning models have this chain of thought kind of built into them. 7:01So chain of thought is great for generating more accurate responses specifically when reasoning is needed, 7:09like this but it does little to improve accuracy of knowledge based answers. 7:15So now I know why my math teacher always made me show my work. 7:18Exactly. 7:19Okay, Martin, another technique that can improve AI accuracy is something called LLM chaining, 7:25and with LLM chaining, we're basically gonna get a consensus opinion. 7:29So we're gonna rely on more than just one intelligence. 7:32So let's put a few different, let's start with three LLMs. 7:36And instead of just relying on a single one of those LLMs, I'm gonna actually come in with a prompt to one of those. 7:43So I put my prompt into the first, and then I'm going to do this thing: 7:48R and R. 7:49Doesn't that sound great? 7:51Rest and relaxation. 7:52Yeah, no, slacker, I need you to focus on the problem. 7:55No, it's revise and reflect. 7:58So we're going to reflect the information that comes into this one and then send it on. 8:03Then it's gonna take that and revise it with its own understanding and reflect that on, 8:09and keep doing that until now we have had each one of these weigh in on this particular question. 8:14So I'm not relying on one, I've got the collective wisdom of all three in this example. 8:19Okay, so kind of a wisdom of the crowd things where you're just bringing in different experts and getting them to weigh in on there. 8:24Exactly, exactly right. 8:26And there's even a variation on this architecture, if you want to think of it this way. 8:30If I had one sort of supervisor, decider, 8:33then maybe instead of having it run through all three of them, 8:36the prompt comes in this way and it goes and asks each one of them individually 8:41and gets a response back and decides based upon those responses what to come out with as its answer. 8:48Yes, so the supervisor here is effectively telling the LLM to be its own critic 8:52because it's taking the responses and then critiquing them and deciding which ones to pick from each of these models to come up with the answer. 9:00Exactly. 9:00Another way of sort of crowdsourcing the answer, but in this case think of it sort of like a phone a friend. 9:07You've got three friends, you're going to ask them all the same question, and you're gonna decide which one of them you want to actually go with. 9:14Okay, any other ideas for improving AI accuracy? 9:17I've got another one. 9:19MoE, that's mixture of experts. 9:22Now this improves model accuracy by gathering together multiple opinions, 9:25a bit like LLM chaining, 9:27but this time rather than using multiple LLMs models, a mixture of expert is made up of a specialized set of sub-models, 9:34and each sub-model is considered an expert in a given specialization. 9:39So let's consider maybe for these sub-models here. 9:43And they're each good at different things. 9:45Maybe math is one of them, and law perhaps, and maybe one's really good at putting together language. 9:52Another one is trained on technology, for example. 9:56So when a user submits a prompt, we use something called a gating network, 10:02and that kind of acts like a router here to determine which expert should handle that input, 10:07and then the outputs are all combined to produce a response. 10:12So we might have a query that comes into the gating network here, and it turns out that query needs to use a little bit of math, which is going to help with some problem solving. 10:21Once we've got that, maybe we want to format a nicely written response. 10:26So we would use language there to format the grammatically correct response. 10:31Then we've divided this problem between many experts to be able to handle a wider range of patterns and complexities. 10:38Than maybe a standard model would be able to do. 10:41We've essentially reduced the errors here through specialization. 10:45And this sounds a lot like the LLM chaining that I referred to before, but there's one key difference, 10:51and that is with LLM chaining, we've got different LLMs and we're feeding it through. 10:57In this case, we have just one big LLM. 11:00So we're routing this within a single LLM. 11:03It's sort of like using, instead of three brains, we're using one brain, but we're just routing the question to a different lobe. 11:10That's exactly right. 11:13Another thing you can do, Martin, is to make your AI more accurate, is change the temperature setting of the model. 11:20Ah, so this is a thermometer, huh? 11:22It is a thermometer and while here we're showing what ambient room temperature might be, 11:27in an AI model the settings would look different. 11:30Right, so we might have a low temperature be 0.0, 0.5, maybe 1.0 and above. 11:37Exactly and this would be the more deterministic version and this would be them or creative version, 11:41and this would be the more creative version, so the higher up you go on this scale the more creative the answers are. 11:49So if I have more creative answers and more deterministic answers, what does that actually mean? 11:54So, think about the deterministic is going to be more factual, it's going to be more consistent, more predictable. 12:00It's going to be in some ways more reliable and times that that is exactly what we want. 12:06In other cases on the creative though, we might want something that's less predictable, something that is more varied, 12:12and in fact, good examples of this might be, am I asking a science question? 12:18Well, then I really want to be down here in the determinist version. 12:22I want the facts, 12:23but if I'm asking it to write lyrics for a song, okay, facts then might be pretty boring. 12:30So art or music, more creative activities, I might want something that's less predictable and more varied. 12:38Okay, so we've really got to pick the temperature setting based upon the type of prompt we're sending into it. 12:43Exactly, the use case will matter. 12:46All right, a couple of quick other ones that we should cover. 12:48System prompt is one. 12:49So this is a message that is included in every prompt, kind of secretly added in by the model. 12:55And it can determine how the model actually works. 12:58So you can put things in there that say, for example, you want to make sure that it provides accurate answers. 13:05You're gonna have different levels of success of that but that is something you can try. 13:10And you could put guardrails in so that you're able to guard against things like prompt injection attacks and that sort. 13:16Another technique is reinforcement learning with human feedback. 13:20In this case, basically the person is looking at the responses and saying, yes I agree, no I don't agree. 13:26So you give it thumbs up and thumbs down and we're kind of rewarding or disincentivizing it from doing those same answers again in the future. 13:35And that way we tune the model to get, again, a little more action. 13:39So those are our methods that we think are kind of primary ways to be able to improve AI accuracy. 13:45None of them are perfect and sometimes it takes a combination of all of them, 13:48but we would love to hear from you what are your thoughts on these methods or there are other methods that you would recommend as well? 13:54let us know and maybe we can talk about those in a future video 13:58Absolutely