Learning Library


Three Methods to Boost LLM Answers

12m • Unknown Channel • ai-ml • tutorial • intermediate

Key Points

Asking a large language model “who is Martin Keen?” yields wildly different answers because each model has distinct training data and knowledge cut‑off dates.
Model answers can be improved in three ways: (1) Retrieval‑Augmented Generation (RAG) that fetches up‑to‑date external data, (2) fine‑tuning the model on domain‑specific transcripts, and (3) better prompt engineering to clarify the exact individual you’re asking about.
RAG works in three stages—retrieval of relevant documents, augmentation of the original prompt with the retrieved content, and generation of a response using this enriched context.
Unlike simple keyword search, RAG converts both the query and the documents into vector embeddings—numeric representations of meaning—so it can surface semantically similar information even when wording differs.
This approach is especially useful for corporate knowledge bases, allowing the model to answer questions (e.g., “What was our revenue growth last quarter?”) by pulling relevant internal PDFs, spreadsheets, or wiki pages and integrating them into its reply.
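
The three RAG stages described above can be sketched in a few lines of Python. This is a minimal illustration, not how a production system works: the `WORD_VECTORS` table is a hypothetical hand-made stand-in for a learned embedding model, the two sample documents are made up, and the generation stage is left out because it would require calling an actual language model.

```python
import math
import re

# Hypothetical hand-made "embedding" table (an assumption for illustration).
# Each word maps to a tiny concept vector: (finance, time period, people).
# A real system would use a learned embedding model with many more dimensions.
WORD_VECTORS = {
    "revenue": (1.0, 0.0, 0.0),
    "growth": (0.8, 0.1, 0.0),
    "sales": (0.9, 0.0, 0.0),
    "performance": (0.6, 0.0, 0.2),
    "quarter": (0.1, 1.0, 0.0),
    "quarterly": (0.1, 1.0, 0.0),
    "fourth": (0.0, 0.7, 0.0),
    "vacation": (0.0, 0.1, 1.0),
    "policy": (0.0, 0.0, 0.8),
    "employees": (0.0, 0.0, 1.0),
}


def embed(text):
    """Sum the vectors of known words and normalize to unit length."""
    total = [0.0, 0.0, 0.0]
    for word in re.findall(r"[a-z]+", text.lower()):
        for i, value in enumerate(WORD_VECTORS.get(word, (0.0, 0.0, 0.0))):
            total[i] += value
    norm = math.sqrt(sum(v * v for v in total))
    return [v / norm for v in total] if norm else total


def cosine(a, b):
    """For unit-length vectors, cosine similarity is the dot product."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, documents, k=1):
    """Stage 1: rank documents by similarity to the query embedding."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def augment(query, passages):
    """Stage 2: add the retrieved passages to the original question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Use only this context:\n{context}\n\nQuestion: {query}"


# Made-up sample corpus standing in for internal PDFs, spreadsheets, or wikis.
documents = [
    "Fourth quarter performance summary with quarterly sales figures.",
    "Vacation policy for new employees.",
]
query = "What was our revenue growth last quarter?"
top = retrieve(query, documents)
prompt = augment(query, top)
# Stage 3 (generation) would send `prompt` to a language model; omitted here.
print(prompt)
```

The first document never mentions "revenue" or "growth", yet it ranks highest because its words sit close to the query in the toy concept space. That is the behavior the embedding step is meant to provide.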

Sections

Full Transcript

# Three Methods to Boost LLM Answers

**Source:** [https://www.youtube.com/watch?v=zYGDpG-pTho](https://www.youtube.com/watch?v=zYGDpG-pTho)
**Duration:** 00:12:57

## Sections

- [00:00:00](https://www.youtube.com/watch?v=zYGDpG-pTho&t=0s) **Improving Personal Queries with LLMs** - The speaker explains how large language models give inconsistent answers about an individual due to differing training data and outlines three methods (retrieval-augmented generation, fine-tuning on specialized data, and crafting clearer prompts) to obtain more accurate, up-to-date information.
- [00:03:17](https://www.youtube.com/watch?v=zYGDpG-pTho&t=197s) **Semantic Retrieval with Vector Embeddings** - The speaker explains how RAG converts text into numerical vectors to locate semantically similar documents, injects that context into a language model for accurate, up-to-date answers, and highlights the added latency and processing costs.
- [00:06:28](https://www.youtube.com/watch?v=zYGDpG-pTho&t=388s) **Fine-Tuning: Benefits and Trade-offs** - The speaker explains that supervised fine-tuning embeds domain-specific expertise and speeds up inference compared to RAG, yet it demands thousands of high-quality examples, substantial GPU compute, and ongoing maintenance.
- [00:09:38](https://www.youtube.com/watch?v=zYGDpG-pTho&t=578s) **Prompt Engineering Unlocks Model Potential** - The speaker explains how well-crafted prompts can activate a model's existing knowledge to improve results without retraining, highlighting immediate benefits and inherent limitations.
- [00:12:46](https://www.youtube.com/watch?v=zYGDpG-pTho&t=766s) **Choosing Methods Beyond Vanity Search** - The speaker emphasizes selecting approaches that suit you, noting how practices have progressed far beyond simple Google vanity searches.

## Full Transcript

0:00 Remember how back in the day people would Google themselves? You type your name into a search engine and you see what it knows about you. Well, the modern equivalent of that is to do the same thing with a chatbot. So when I ask a large language model, "Who is Martin Keen?", the response varies greatly depending upon which model I'm asking, because different models have different training data sets and different knowledge cutoff dates. So what a given model knows about me, well, it differs greatly.

0:32 But how could we improve the model's answer? Well, there are three ways. So let's start with a model here, and we're gonna see how we can improve its responses.

0:44 Well, the first thing it could do is it could go out and perform a search, a search for new data that either wasn't in its training data set or only became available after the model finished training, and then it could incorporate those results from the search back into its answer. That is called RAG, or Retrieval Augmented Generation. That's one method.

1:12 Or we could pick a specialized model, a model that's been trained on, let's say, transcripts of these videos. That would be an example of something called fine-tuning.

1:29 Or we could ask the model a query that better specifies what we're looking for. So maybe the LLM already knows plenty about the Martin Keens of the world, but let's tell the model that we're referring to the Martin Keen who works at IBM, rather than the Martin Keen that founded Keen Shoes. That is an example of prompt engineering.

1:55 Three ways to get better outputs out of large language models, each with their pluses and minuses.

2:03 Let's start with RAG. So let's break it down. First there's retrieval. So retrieval of external up-to-date information. Then there's augmentation.
2:14 That's augmentation of the original prompt with the retrieved information added in. And then finally there's generation. That's generation of a response based on all of this enriched context.

2:27 So we can think of it like this. We start with a query, and the query comes in to a large language model. Now, what RAG is gonna do is it's first going to go searching through a corpus of information. So we have this corpus here full of some sort of data. Now, perhaps that's your organization's documents. So it might be spreadsheets, PDFs, internal wikis, you know, stuff like that.

3:01 But unlike a typical search engine that just matches keywords, RAG converts both your question, the query, and all of the documents into something called vector embeddings. So these are all converted into vectors, essentially turning words and phrases into long lists of numbers that capture their meaning.

3:27 So when you ask a query like, "What was our company's revenue growth last quarter?", RAG will find documents that are mathematically similar in meaning to your question, even if they don't use the exact same words. So it might find documents mentioning fourth quarter performance or quarterly sales. Those don't contain the keyword "revenue growth", but they are semantically similar.

3:54 Now, once RAG finds the relevant information, it adds this information back into your original query before passing it to the language model. So instead of the model just kind of guessing based on its training data, it can now generate a response that incorporates your actual facts and figures.

4:15 So this makes RAG particularly valuable when you are looking for information that is up to date, and it's also very valuable when you need to add in information that is domain specific as well. But there are some costs to this. Let's go with the red pen.

4:40 So one cost, that would be the performance cost of doing all of this, because you have this retrieval step here, and that adds latency to each query compared to a simple prompt to a model. There are also costs related to just kind of the processing of this as well. So if we think about what we're having to do here, we've got documents that need to be converted into vector embeddings, and we need to store these vector embeddings in a database. All of this adds to processing costs and to infrastructure costs to make this solution work.

5:17 All right, next up, fine-tuning. So remember how we discussed getting better answers about me by training a model specifically on, let's say, my video transcripts? Well, that is fine-tuning in action.

5:30 So what we do with fine-tuning is we take a model, but specifically an existing model, and that existing model has broad knowledge. And then we're gonna give it additional specialized training on a focused data set. So this is now specialized to what we want to develop particular expertise on.

5:58 Now, during fine-tuning, we're updating the model's internal parameters through additional training. So the model starts out with some weights here, like this, and those weights were optimized during its initial pre-training. And as we fine-tune, we're making small adjustments here to the model's weights using this specialized data set. So this is being incorporated.

6:29 Now, this process typically uses supervised learning, where we provide input-output pairs that demonstrate the kind of responses we want. So for example, if we're fine-tuning for technical support, we might provide thousands of examples of customer queries, and those would be paired with correct technical responses. The model adjusts its weights through backpropagation to minimize the difference between its predicted outputs and the target responses.
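
To make the weight-adjustment idea concrete, here is a toy sketch under heavy assumptions: a one-weight, one-bias linear model stands in for a pretrained LLM, the four input-output pairs are hypothetical, and the gradients are written out by hand rather than computed by backpropagation through a deep network. Real fine-tuning typically updates millions or billions of parameters against a token-level loss, but the loop has the same shape: start from existing weights, compare predictions with targets, and nudge the weights to shrink the difference.

```python
# Toy stand-in for a pretrained model: one weight and one bias (assumed values).
pretrained = {"w": 1.0, "b": 0.0}

# Hypothetical specialized data set of (input, target output) pairs.
# The targets follow target = 1.2 * x + 0.5, which the model must adapt to.
pairs = [(0.0, 0.5), (1.0, 1.7), (2.0, 2.9), (3.0, 4.1)]


def predict(params, x):
    """Forward pass of the toy model."""
    return params["w"] * x + params["b"]


def mse(params, data):
    """Mean squared difference between predictions and targets."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune(params, data, learning_rate=0.05, epochs=200):
    """Gradient descent on the loss, starting from the pretrained weights."""
    params = dict(params)  # copy so the pretrained weights stay untouched
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(params, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(params, x) - y) for x, y in data) / n
        # Small adjustment to each weight in the direction that lowers loss.
        params["w"] -= learning_rate * grad_w
        params["b"] -= learning_rate * grad_b
    return params


tuned = fine_tune(pretrained, pairs)
loss_before = mse(pretrained, pairs)
loss_after = mse(tuned, pairs)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.6f}")
print(f"weights before: {pretrained}, after: {tuned}")
```

Note that updating the model means rerunning this loop: there is no equivalent of simply dropping a new document into a knowledge base.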
6:58 So we're not just teaching the model new facts here, we're actually modifying how it processes information. The model is learning to recognize domain-specific patterns.

7:11 So fine-tuning shows its strength when you particularly need a model that has very deep domain expertise. That's what we can really add in with fine-tuning. And also, it's much faster, specifically at inference time. So when we are putting the queries in, it's faster than RAG because it doesn't need to search through external data, and because the knowledge is kind of baked into the model's weights, you don't need to maintain a separate vector database.

7:43 But there are some downsides as well. Well, there are certainly issues here with the training complexity of all of this. You're going to need thousands of high-quality training examples. There are also issues with computational cost. The computational cost for training this model can be substantial and is going to require a whole bunch of GPUs. And there are also challenges related to maintenance, because unlike RAG, where you can easily add new documents to your knowledge base at any point, updating a fine-tuned model requires another round of training. And then, perhaps most importantly of all, there is a risk of something called catastrophic forgetting. Now, that's when the model loses some of its general capabilities while it's busy learning these specialized ones.

8:44 So finally, let's explore prompt engineering. Now, specifying Martin Keen who works at IBM versus Martin Keen who founded Keen Shoes, that's prompt engineering, but at its most basic. Prompt engineering goes far beyond simple clarification.

9:01 So let's think about when we input a prompt. The model receives this prompt and it processes it through a series of layers, and these layers are essentially attention mechanisms, and each one focuses on different aspects of your prompt text that came in.
9:25 And by including specific elements in your prompt, so examples or context or how you want the format to look, you're directing the model's attention to relevant patterns it learned during training. So for example, telling a model to think about this step by step, that activates patterns it learnt from training data where methodical reasoning led to accurate results. So a well-engineered prompt can transform a model's output without any additional training or data retrieval.

9:59 So take an example of a prompt. Let's say we say, "Is this code secure?" Not a very good prompt. An engineered prompt, it might read a bit more like this. It's much more detailed. Now, we haven't changed the model, we haven't added new data, we've just better activated its existing capabilities.

10:23 Now, I think the benefits to this are pretty obvious. One is that we don't need to change any of our back-end infrastructure here, because there are no infrastructure changes at all in order to prompt better. It's all on the user. There's also the benefit that by doing this, you get to see immediate responses and immediate results to what you do. We don't have to add in new training data or any kind of data processing.

10:53 But of course there are some limitations to this as well. Prompt engineering is as much an art as it is a science. So there is certainly a good amount of trial and error in this sort of process to find effective prompts, and you're also limited in what you can do here. You're limited to existing knowledge because you're not able to actually add anything else in here. No additional amount of prompt engineering is going to teach it truly new information. You're not going to update anything that's outdated in the model.

11:33 So we've talked about now RAG as being one option, and we talked about fine-tuning as being another one.
11:44 And now, just now, we've talked about prompt engineering as well. And I've really talked about those as three different distinct things here, but they're actually commonly used in combination. We might use all three together.

12:05 So consider a legal AI system. RAG, that could retrieve specific cases and recent court decisions. The prompt engineering part, that could make sure that we follow proper legal document formats by asking for it. And then fine-tuning, that can help the model master firm-specific policies.

12:24 I mean, basically, we can think of it like this. We can think that prompt engineering offers flexibility and immediate results, but it can't extend knowledge. RAG, that can extend knowledge, it provides up-to-date information, but with computational overhead. And then fine-tuning, that enables deep domain expertise, but it requires significant resources and maintenance.

12:47 Basically, it comes down to picking the methods that work for you. You know, we've sure come a long way from vanity searching on Google.
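
The transcript does not capture the engineered prompt that was shown on screen for the "Is this code secure?" example, so the following is only a hypothetical sketch of what such a prompt might contain. The `build_prompt` helper, the reviewer role, the focus areas, and the output format are all assumptions chosen to illustrate the elements the speaker lists: context, a step-by-step instruction, and an explicit format.

```python
def build_prompt(code, role, focus_areas, output_format, step_by_step=True):
    """Assemble a detailed prompt from context, instructions, and format.

    Hypothetical helper for illustration; not from the video.
    """
    parts = [
        f"You are {role}.",
        "Review the following code for security problems.",
        "Focus on: " + ", ".join(focus_areas) + ".",
    ]
    if step_by_step:
        # The kind of instruction the speaker says triggers methodical reasoning.
        parts.append("Think about this step by step before answering.")
    parts.append(f"Format your answer as: {output_format}")
    parts.append("Code:\n" + code)
    return "\n".join(parts)


# The basic prompt from the video, for comparison.
basic_prompt = "Is this code secure?"

# Made-up one-line snippet to review.
snippet = "result = eval(user_input)"

engineered_prompt = build_prompt(
    code=snippet,
    role="a senior application security reviewer",
    focus_areas=["injection", "input validation", "secrets handling"],
    output_format="a numbered list of findings, each with a severity and a fix",
)
print(engineered_prompt)
```

Nothing about the model or its data changes here; only the text sent to it does, which is why this approach needs no infrastructure but also cannot supply knowledge the model lacks.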