Retrieval-Augmented Fine Tuning Explained
Key Points
- Retrieval‑augmented fine‑tuning (RAFT) merges the strengths of traditional retrieval‑augmented generation (RAG) and fine‑tuning to better handle domain‑specific data.
- Developed by UC Berkeley researchers, RAFT fine‑tunes a model to learn how to locate and use external documents during inference, improving RAG performance in specialized settings.
- The method is likened to an open‑book exam where the student has also studied the material: unlike pure fine‑tuning (closed‑book) or pure RAG (untrained open‑book), RAFT equips the model with both memorized knowledge and effective retrieval skills.
- RAFT's training data consist of triples (a query, a set of documents, and the correct answer), so the model learns to "fish" for information and generate accurate responses.
- By teaching the model how to retrieve and synthesize external content, RAFT provides a durable, scalable solution for enterprise‑level LLM applications.
Sections
- Hybrid Retrieval‑Augmented Fine‑Tuning - The passage explains how RAFT merges inference‑time document retrieval with training‑time knowledge embedding to boost LLM performance on specialized tasks, using an open‑ versus closed‑book exam analogy.
- Teaching Models to Fish - The speaker outlines the RAFT training method, which pairs queries with mixed sets of relevant (core) and irrelevant (tangent) documents to train a model to retrieve, filter out off‑topic information, and generate answers using chain‑of‑thought reasoning.
- Chain-of-Thought Guidance Enhances Model Transparency - The speaker explains that using chain‑of‑thought reasoning with explicit document citations improves a model’s scalability, robustness, and traceability for enterprise applications.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=rqyczEvh3D4](https://www.youtube.com/watch?v=rqyczEvh3D4) · **Duration:** 00:06:53
**Timestamps:** [00:00:00](https://www.youtube.com/watch?v=rqyczEvh3D4&t=0s) Hybrid Retrieval‑Augmented Fine‑Tuning · [00:03:11](https://www.youtube.com/watch?v=rqyczEvh3D4&t=191s) Teaching Models to Fish · [00:06:19](https://www.youtube.com/watch?v=rqyczEvh3D4&t=379s) Chain-of-Thought Guidance Enhances Model Transparency
When building gen AI applications, retrieval-augmented generation (RAG) is often contrasted with fine-tuning as two separate techniques for incorporating domain-specific data into LLM output.
Retrieval-augmented fine-tuning (RAFT) is a hybrid approach that combines the best of both worlds and addresses many of the challenges surrounding LLM performance in specialized settings.
Originally developed by researchers at UC Berkeley, RAFT uses a unique fine-tuning technique to improve RAG performance in specific domain contexts.
Now, with traditional RAG, we provide context to the model during inference by using a retriever to search for relevant documents in a vector database, which we then append to the prompt that we send to our LLM.
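That inference-time flow can be sketched in a few lines. This is a toy illustration of the RAG pipeline just described, not part of RAFT itself: the bag-of-words "embedding", cosine ranking, and prompt template are all simplified assumptions standing in for a real dense encoder and vector database.

```python
# Toy sketch of the RAG flow: "embed" the query, rank documents in a small
# in-memory store by similarity, and append the top hits to the LLM prompt.
# A real system would use a dense embedding model and a vector database.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Append retrieved context to the prompt sent to the LLM."""
    context = "\n".join(f"[Doc {i + 1}] {d}"
                        for i, d in enumerate(retrieve(query, docs)))
    return f"Answer using the context below.\n{context}\n\nQuestion: {query}"
```

With RAG alone, everything rests on `retrieve` pulling the right documents, which is exactly the weakness the open-book-exam analogy below gets at.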
With fine-tuning, we provide context to the model during training time by using a large labeled dataset to bake specific knowledge into a pre-trained LLM.
So how can we combine both of these techniques to create retrieval-augmented fine tuning?
Let's use an analogy.
Let's say that using an LLM on enterprise-specific tasks is like studying for an exam.
Suppose that fine-tuning is like studying for a closed book exam.
Since you can't use your notes, you have to memorize all the materials in advance.
And if you study all the wrong stuff, you probably won't do so well since you don't have access to new information.
In the same way, with fine-tuning, the model has to rely completely on the knowledge it learned during training in order to answer the user's question.
Now, RAG would be like taking an open book exam that you did not study for.
Because you knew you could use the book on exam day, you chose to skip all the lectures and not read the textbook.
So on test day, even though you have all the materials in front of you, there's still no guarantee that you'll actually be able to know where to find all the information.
In the same way with RAG, the performance of the model is largely dictated by how well the retriever can pull relevant documents from the database.
Now, RAFT is like taking an open book exam that you did study for.
This is the win-win situation, where you paid attention in all the lectures, read all the materials, and get to use the book on the test.
So RAFT is similar in that it teaches the model how to use RAG, or how to use external documents to generate an answer.
It's like the saying that goes: give a man a fish, and you feed him for a day; but teach a man to fish, and you feed him for a lifetime.
In the same way, RAFT essentially teaches the model how to fish, that is, how to look for information and generate an answer, versus just giving it a fish by handing it the answer.
To explain this more, let's dive into the implementation.
Since RAFT is a training technique, we need training data.
Each data point will consist of three things: a query, a set of documents, and an answer. Let's look at an example.
Let's say our query is: "How much parental leave does IBM offer?"
To generate an answer, we can search through two types of documents, core documents and tangent documents.
Core documents contain information that's relevant to the user query.
In our example, these could be documents on, say, paid leave or benefit eligibility.
Tangent documents, on the other hand, contain information that's irrelevant or off-topic to the user's query.
These could be documents on retirement accounts or internal code documentation.
From here, we create two types of document sets.
Set one contains both core and tangent documents, and set two contains just tangent documents.
The reason why we include both is to simulate a real RAG use case where the retriever may or may not pull any relevant documents from the database.
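The data-point construction above can be sketched as follows. This is a minimal illustration under stated assumptions: the field names, the `p_core` probability, and the number of tangent documents per example are choices I made for the sketch, not values prescribed by the transcript.

```python
# Sketch of RAFT-style training-data construction: each example is a
# (query, documents, answer) triple, where the document set is either
# set one (core + tangent docs) or set two (tangent docs only).
import random
from dataclasses import dataclass

@dataclass
class RaftExample:
    query: str
    documents: list[str]
    answer: str
    has_core: bool  # True for set one, False for set two

def make_example(query, answer, core_docs, tangent_docs,
                 p_core=0.8, n_tangent=3, rng=random):
    """With probability p_core build a set-one example (core + tangent
    documents); otherwise build a set-two example (tangent only)."""
    has_core = rng.random() < p_core
    docs = rng.sample(tangent_docs, k=min(n_tangent, len(tangent_docs)))
    if has_core:
        docs = docs + core_docs
    rng.shuffle(docs)  # don't let the core doc's position give it away
    return RaftExample(query, docs, answer, has_core)
```

Mixing both set types mimics a real retriever, which sometimes pulls the relevant documents and sometimes does not.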
Finally, to generate our answer, we use chain-of-thought reasoning to teach the model how to filter out tangent documents and focus on and process the core ones step by step in order to generate a correct answer.
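A chain-of-thought training target might be formatted like this. The quote-delimiting markers are in the spirit of the citation style the RAFT authors use, but the exact template here is an assumption for illustration:

```python
# Sketch of a chain-of-thought training target that quotes the core
# document it drew on, so reasoning and citation are trained together.
def format_cot_target(doc_id: int, quote: str, reasoning: str, answer: str) -> str:
    return (
        f"##Reason: Doc {doc_id} states ##begin_quote## {quote} ##end_quote## "
        f"{reasoning}\n"
        f"##Answer: {answer}"
    )
```

Explicit quotes like this are what later makes the model's answers traceable back to specific documents.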
We can use this framework to create a larger training dataset for training the model via supervised fine-tuning.
Now, because this framework is so adaptable, we can use a wide variety of different models and fine-tuning techniques to actually implement this in practice.
And with that, our model is now ready to ace the exam.
So there are three aspects of this training process that I want to highlight that are key to making this whole thing work.
One, the inclusion of tangent documents helps to teach the model how to pick out relevant documents from irrelevant ones, thus helping to increase accuracy on domain-specific questions.
Secondly, the creation of document sets that don't include any relevant documents at all, AKA set two, helps to teach the model when to rely on its intrinsic knowledge or to say "I don't know," versus forcing an incorrect answer out of irrelevant RAG documents.
This helps to minimize hallucinations.
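The set-two behavior just described amounts to a simple decision rule when writing training targets. This is a hypothetical helper sketching that rule; the function name and fields are my own, not from the transcript:

```python
# Sketch: how a training target might differ for set-two (tangent-only)
# examples, teaching the model to abstain rather than hallucinate.
def make_target(grounded_answer: str, has_core: bool,
                known_intrinsically: bool) -> str:
    if has_core:
        return grounded_answer   # set one: answer from the core documents
    if known_intrinsically:
        return grounded_answer   # set two: rely on baked-in knowledge
    return "I don't know."       # set two: abstain instead of forcing an answer
```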
And third, guiding the model using chain-of-thought reasoning helps to minimize overfitting and increases transparency and traceability by encouraging the model to quote the specific documents from which it got the answer.
So as you can see, RAFT creates a model that's both highly scalable and highly robust for enterprise tasks.
So whether you found this video because you're studying for that closed book exam or you're just curious about AI, I hope you learned something and enjoyed the video.
Thanks for watching.