Prompt Engineering and Retrieval-Augmented Generation

Key Points

  • Prompt engineering has become a hot job market, with many openings for specialists who craft effective queries for large language models (LLMs).
  • It involves designing precise prompts to guide LLMs and minimize “hallucinations,” where models generate inaccurate or false information due to conflicting training data.
  • One key strategy is Retrieval‑Augmented Generation (RAG), which couples a retriever that fetches domain‑specific knowledge with the LLM generator to produce context‑aware answers.
  • The retriever can be as simple as a database or vector search, allowing the model to incorporate proprietary or industry‑specific information it otherwise wouldn’t know.
  • An illustrative use case is in finance: using RAG, a model can accurately answer questions about a company’s earnings for a given year by pulling the relevant data from a corporate knowledge base instead of relying on its generic training.
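The retrieval step described in these key points can be sketched in a few lines. This is a minimal illustration, not any specific library's API: the knowledge-base contents, the word-overlap scoring (a stand-in for a real vector search), and the prompt template are all assumptions made up for the example.

```python
# Minimal sketch of the RAG pattern: a retriever pulls domain facts into
# the prompt so the generator answers from trusted data rather than from
# its generic training. All data and names below are illustrative.

KNOWLEDGE_BASE = [
    "Total company earnings for 2022 were $5.4 billion.",
    "Software revenue for 2022 was $2.1 billion.",
    "Consulting revenue for 2022 was $1.3 billion.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query
    (a toy stand-in for a vector-database similarity search)."""
    q_words = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Content-ground the model: prepend retrieved context so the
    generator answers from the knowledge base."""
    context = "\n".join(retrieve(query))
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

print(build_prompt("What were the total earnings of the company in 2022?"))
```

In a production system the retriever would embed the query and documents and search a vector index, but the shape of the pipeline (retrieve, then ground the prompt, then generate) is the same.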

Full Transcript

**Source:** [https://www.youtube.com/watch?v=1c9iyoVIwDs](https://www.youtube.com/watch?v=1c9iyoVIwDs)
**Duration:** 00:12:43

Sections:

  • [00:00:00](https://www.youtube.com/watch?v=1c9iyoVIwDs&t=0s) **Prompt Engineering Overview & RAG** - The speakers introduce the surge in prompt-engineer roles, define prompt engineering as crafting effective queries to avoid LLM hallucinations, outline common LLM applications, and preview retrieval-augmented generation as a key strategy.
Dan: So Suj, have you looked at your LinkedIn profile lately and noticed there are a ton of job openings for prompt engineers?

Suj: Absolutely, and that's why today we're going to do a deep dive on what that is.

Dan: But first, to give a little context, let's talk about what large language models are used to do, as a review. Of course everyone is familiar with chatbots; we see those all the time. They're also used for summaries, for example, and another common use case is information retrieval. Those are three different cases, but for our viewers, could you explain how that applies to prompt engineering?

Suj: Sure. Prompt engineering is vital in communicating effectively with large language models. What does it mean? It is designing, coming up with, the proper questions to get the responses you're looking for from the large language model, because you want to avoid hallucination.

Dan: Right, hallucinations are where you get essentially false results out of a large language model.

Suj: And that's because large language models are predominantly trained on internet data, and there can be conflicting data, conflicting information, and so on.

Dan: Great, okay, I got that. We're going to look at this from four different approaches, so let's get straight to it. The first approach is RAG, or retrieval-augmented generation. We've had videos about this already on the channel, so I have a basic understanding of it: you take domain-specific knowledge and add it to your model. But how does that actually work behind the scenes?

Suj: Absolutely. The large language models, as you know, are trained on internet data. They are not aware of your domain-specific knowledge base content at all. So when you are querying the large language model, you want to bring awareness of your knowledge base to it.

Dan: When you say knowledge base here, you're referring to something that might be specific to my industry, specific to my company, which is then applied to the model?

Suj: Absolutely.

Dan: And how does that work?

Suj: To bring this awareness to the large language model, we have to have two components. One is the retriever component, which brings the context of your domain knowledge base to the generator part of the large language model. When they work together and you ask questions, the model now responds based on the domain specificity of your content.

Dan: Okay, I think I got it. Now, this retriever, that could really be as simple as a database search, right?

Suj: Exactly, it can be a vector database.

Dan: Okay, I got that. But could you first give me a quick example of how you've seen that applied in an industry?

Suj: Absolutely. Let's take the example of financial information for a company. If you were to directly ask the large language model a question about the total earnings of a company for a specific year, it's going to go through its learning and the internet data and come up with a number that may not be accurate. For the annual earnings it could come back with, say, $19.5 billion, which may be totally incorrect. Whereas if you want accurate responses, you bring its attention to the domain knowledge base and ask the same question. Then the large language model is going to refer to your knowledge base for the answer, and this time it will be accurate: say, $5.4 billion.

Dan: I see, because this is a trusted source that it can then integrate with this larger model.

Suj: Correct.

Dan: Okay, so now we're on to the second approach to prompt engineering, CoT, or chain of thought. I sometimes think of this as the old saying, "explain it to me like I'm an eight-year-old," but could you give me a more practical explanation of what that really means?

Suj: Absolutely. The large language models, like an eight-year-old, also need guidance on how to arrive at those responses. Before I jump to the chain-of-thought approach, I want to recommend something. Anytime you are working with large language models, consider two things. Number one is the RAG approach, content grounding: content-ground your large language model. And then take the approach of prompting, guiding the model through the prompts to get the responses that you need. CoT belongs in that category.

Dan: As do these other three?

Suj: Absolutely. So let's talk about chain of thought. Chain of thought is all about taking a bigger task of arriving at a response, breaking it down into multiple sections, and then combining the results of all those sections to come up with the final answer. So instead of asking a large language model, "What are the total earnings of a company in 2022?", which will just give you a blur of a number like $5.4 billion, you can ask the large language model to give you the total earnings of the company in 2022 for software, for hardware, and for consulting, for example.

Dan: I see, so you're asking it to be more precise, with the idea that you'll get individual results that it will ultimately combine. So, making up some numbers from your example: if I had five, two, and three, the final answer will be 5 + 2 + 3. That will be the output, but the large language model is now arriving at this number through reasoning and through explainability. Were these three separate queries, essentially three separate problems?

Suj: The way I tell the large language model is, I give the problem and I explain how I will break it down. For example, I say: "What are the total earnings of a company? If the total earnings of the company for software are five, for hardware two, and for consulting three, then the total earnings are 5 plus 2 plus 3."

Dan: Let me see if I can net that out to make sure I got it. In RAG we were talking about improving results based on domain knowledge, but then, to improve on the results that generates, we apply this technique, the explain-it-to-an-eight-year-old technique, which makes the result even better.

Suj: Mhm.

Dan: Okay, that was chain of thought, which as I understand it is a few-shot prompting technique, where you basically provide some examples to improve the end result. And I think ReAct is kind of the same genre, but a little bit different. Could you explain the difference?

Suj: Absolutely. ReAct is also a few-shot prompting technique, but it is different from chain of thought. In chain of thought you are breaking down the steps of arriving at the response, reasoning through the steps. ReAct goes one step further: it is not only reasoning, but also acting based on what else is necessary to arrive at the response.

Dan: So this data, though, is coming from different sources. We weren't talking about that in the latter case, with CoT.

Suj: They are. For example, you have a situation where you have your domain content in your private database, your knowledge base, but you are asking a prompt whose question demands responses that are not already available in your knowledge base. Then the ReAct approach has the ability to actually go into a public knowledge base, gather the information, and arrive at the response. So the action part of ReAct is its ability to go to external resources to gain additional information to arrive at responses.

Dan: I got it, I got it. But there's one thing that's confused me just a teeny bit: that looks awfully similar to RAG, but they're not the same. Where's the difference?

Suj: The difference is, they both are using the private databases, the knowledge bases. But with large language models, I want you to think about two steps. One is content grounding; that's what RAG is doing, making your large language model aware of your domain content. Where ReAct is different is that it has the ability to go to public resources, public content and knowledge bases, to bring additional information to complete the task.

Dan: Okay. Before we wrap, can you give me an example of ReAct?

Suj: Absolutely. Let's go back to the financial example. In the previous patterns we were looking at the total earnings of a company for a specific year. Now suppose you come back with a prompt asking for the total earnings for 2010 and 2022. Your 2022 information is here in your private database, your knowledge base, but the 2010 information is not there.

Dan: For example, it's over here in the public one.

Suj: Exactly. So in the ReAct approach the large language model now goes to the external resources to get that information for 2010, then brings both of them together and does the observation.

Dan: I see, so that's going to produce a result that takes this into consideration, whereas before it might have produced essentially a hallucination.

Suj: A hallucination, and a couple more things. ReAct gives you the results in a three-step process. When you are asking the prompts in ReAct mode, you have to split that prompt into three steps. One is the thought: what are you looking for? The second is the action: what are you getting, and from where? And the third, finally, is the observation, which is the summary of the action that has taken place. So, for example, thought one will be "retrieve the total earnings for 2022"; action one will actually go to the knowledge base to retrieve the 2022 value; and observation one will be that 2022 value. Thought two is "retrieve the value for 2010 from an external knowledge base"; action two retrieves it; and observation two will have that value. And part three will be comparing them to arrive at the answer for you.

Dan: I think I've got it. That's great; we only have one more to go. If you really want to impress your colleagues, you want to learn about this next one, which is directional stimulus prompting, or DSP. It's different from the other ones, and how so?

Suj: DSP is a fun one, and a brand new one that I want to introduce to the audience. It's about making the large language model give specific information, giving it a direction to extract specific information from the task. For example, you ask a question like, "What are the annual earnings of a company?", but you don't want the final number; you want specific details about the earnings for, say, software or consulting. So you give a hint and say "software and consulting," and the large language model will first get the earnings and then extract the specific values for software and consulting from that.

Dan: This kind of reminds me of the game where you're trying to get someone to draw a picture. What do you do? You provide a hint, and in effect this gives you a better result in the same fashion.

Suj: Absolutely. It is a very simple technique, but it works very, very well when you are looking for specific values from the task. So try it out.

Dan: Well, thanks, Suj. I now understand what DSP is, but could you net out how you combine these different techniques?

Suj: You should always start with RAG to bring focus to your domain content, but you can also combine CoT and ReAct, and you can combine RAG and DSP, to get that cumulative effect.

Dan: Excellent. Okay, well, thank you very much. I hope you come back for another episode.

Suj: Absolutely. Thank you, Dan.

Dan: Thank you for watching. Before you leave, please click subscribe and like.
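The prompt patterns discussed in the episode can be sketched as plain templates. These are illustrative assumptions in my own wording, not prescribed formats; the earnings numbers reuse the 5 + 2 + 3 example from the conversation.

```python
# Illustrative prompt templates for the three prompting techniques above.
# The wording and step labels are example assumptions, not a fixed API.

# Chain of Thought: a few-shot example that demonstrates the breakdown
# the model should imitate before giving the final answer.
cot_prompt = (
    "Q: What are the total earnings of the company?\n"
    "If earnings are 5 for software, 2 for hardware, and 3 for consulting,\n"
    "then total earnings are 5 + 2 + 3 = 10.\n"
    "Q: What are the total earnings of the company in 2022?\n"
)

# ReAct: the response is structured as thought / action / observation
# steps, where an action may consult an external knowledge source.
react_trace = [
    ("Thought", "Retrieve the total earnings for 2022."),
    ("Action", "Look up 2022 earnings in the private knowledge base."),
    ("Observation", "2022 earnings value retrieved."),
    ("Thought", "Retrieve the total earnings for 2010."),
    ("Action", "Look up 2010 earnings in a public knowledge base."),
    ("Observation", "2010 earnings value retrieved."),
    ("Thought", "Compare the two values to arrive at the answer."),
]

# Directional Stimulus Prompting: append a short hint that steers the
# model toward the specific details you want extracted.
dsp_prompt = (
    "What are the annual earnings of the company? "
    "Hint: software, consulting"
)

for step, text in react_trace:
    print(f"{step}: {text}")
```

Each template would be sent to the model as part of the user prompt; as the episode notes, they compose, so a RAG-grounded context block can precede any of them.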