Deploy Scalable RAG in Three Steps

2m • ai-ml • tutorial • intermediate

Key Points

The speaker argues that retrieval‑augmented generation (RAG) offers the best return on investment for enterprise LLM use, but scaling it means managing vector stores, embeddings, authenticated APIs, and high‑volume data pipelines, which goes well beyond a simple notebook.
The speaker demonstrates a three‑step setup using IBM watsonx Flows: install the CLI, authenticate with the environment domain and admin key, then ingest and chunk data to create a deployable RAG flow.
watsonx Flows automates core tasks—tokenization, vector retrieval, guardrails, and even hallucination‑metric calculation—so developers can build or modify a full RAG pipeline simply by editing flow steps.
After deploying the flow, an API endpoint is generated that can be integrated into applications, returning query completions along with groundedness and hallucination warnings for enterprise‑grade reliability.
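
The third step above involves chunking the ingested data. As a rough illustration of what chunking parameters usually control, the sketch below splits text into fixed-size character windows that overlap. The function name `chunk_text` and the `chunk_size` and `overlap` parameters are assumptions made for this example; the video does not show how watsonx Flows chunks data internally.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`.

    Illustrative only: these names and defaults are hypothetical and are not
    taken from the watsonx Flows CLI.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Toy input, not real report data.
for i, chunk in enumerate(chunk_text("abcdefghij", chunk_size=4, overlap=2)):
    print(i, chunk)
```

Overlap trades storage for context: a sentence cut at a chunk boundary still appears whole in the neighbouring chunk, which tends to help retrieval.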

Sections

00:00:00 Scaling RAG with watsonx Flows - The speaker outlines the complexities of deploying Retrieval‑Augmented Generation at scale and demonstrates a three‑step setup using watsonx Flows to provision vector databases, manage embeddings, create authenticated APIs, and enforce guardrails automatically.
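
To make the vector database and embedding steps concrete, here is a minimal sketch of what vector retrieval does: score stored chunks against a query embedding and return the closest ones. It assumes cosine similarity as the distance metric and uses made-up two-dimensional vectors. The names `cosine_similarity` and `top_k` are hypothetical, and this is not how watsonx Flows is implemented.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "vector store": the texts and embeddings are invented for illustration.
store = [
    ("chunk A", [1.0, 0.0]),
    ("chunk B", [0.0, 1.0]),
    ("chunk C", [0.8, 0.6]),
]

for score, text in top_k([1.0, 0.0], store, k=2):
    print(f"{text}: {score:.2f}")
```

A real embedding model produces vectors with hundreds or thousands of dimensions, and a vector database replaces the linear scan with an index, but the ranking idea is the same.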

Full Transcript

Source: https://www.youtube.com/watch?v=LpKGm1jJXv4
Duration: 00:02:38

0:00 You're into LLMs, so you probably heard about RAG, right? Well, I'm going to throw it out there: it's the best way to get bang for buck when using LLMs for a business. But when you're doing it at scale, there's more to think about than just a Jupyter notebook. And unlike when you're debugging with the service desk, "works on my laptop" won't work here.

0:18 Standing up vector databases, managing embeddings, creating authenticated APIs, and hooking up your LLM takes work, and even more so when you're dealing with big data volumes or tons of users. So what if I told you you could get RAG up and running for business in three steps? What if it handled all the hard stuff, like tokenization and retrieval, but also guardrails? And what if it calculated hallucination metrics for you automatically?

0:41 I'm going to show you how to do it in three steps, and it begins with every developer's favorite bit: installing stuff. I'm going to be using the watsonx Flows engine. The first goal is to be able to run wxflows --version on my MacBook. If I get back a version number, we're good. To do this, I just need to download the installer from here and install it using this command. Don't let the commands get to you; just do it the same way I use Excel every day: copy, paste, and pray. Now if I run wxflows --version, I get a version number back. Side note: I can also run wxflows --help to see all of the commands available.

1:12 Right now, watsonx Flows and I are like strangers, like meeting your coworker on the weekend. So goal two: I need to authenticate. I need to get wxflows to recognize me when I run the "whoami" command. Run wxflows login to kick off the authentication process; it prompts for the environment domain and admin key. These are all available from this link. Once done, if I run wxflows whoami again, I get back the domain, environment, admin key, and API key. I'm in.

1:37 Final goal: upload the data and deploy a flow. Once I'm done with this step, I'll have an API endpoint that I can use. First, run wxflows init --interactive. This is going to take me through a wizard to chunk up my data. It prompts for the data location (in this case, I've got IBM's annual report in Markdown format) as well as some chunking parameters. Once that's done, I get back three new files.

1:58 This is the kicker with watsonx Flows: I can build an entire RAG or LLM flow just by changing the steps in the flow. Need a prompt template? Easy! Want hallucination metrics calculated? Add in the hallucination score step. Need distance metrics? raginfo has that. I can load the data into the vector store by writing wxflows collection deploy, choose the RAG flow by uncommenting the flow I want in the TOML file, and deploy it by running wxflows deploy. This will return an API endpoint.

2:26 I can plug the environment details into my application, and I've now got an enterprise RAG application up and running. When I query it, we can see the completion and groundedness warnings, as well as the hallucination metrics and source documents.
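
The video does not show the application code that calls the deployed endpoint, so the following is only a sketch of how a client might prepare a query and decide whether an answer needs human review. Everything specific here is an assumption: the endpoint URL, the `apikey` authorization scheme, the JSON request shape, and the response field names `completion`, `hallucination_score`, and `sources` are all hypothetical placeholders. Check the actual endpoint's schema before relying on any of them.

```python
import json
import urllib.request


def build_query_request(endpoint: str, api_key: str, question: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to a deployed RAG endpoint.

    The body shape and the Authorization scheme are assumptions for illustration.
    """
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"apikey {api_key}",  # assumed scheme
        },
        method="POST",
    )


def review_answer(result: dict, max_hallucination: float = 0.5) -> dict:
    """Flag an answer for review when its hallucination score is missing or too high.

    Assumes a hypothetical response dict in which a higher score is worse.
    """
    score = result.get("hallucination_score")
    needs_review = score is None or score > max_hallucination
    return {
        "completion": result.get("completion", ""),
        "sources": result.get("sources", []),
        "needs_review": needs_review,
    }


# Placeholder values; nothing is sent over the network in this sketch.
request = build_query_request("https://example.invalid/rag", "MY_API_KEY", "What changed this year?")
print(request.get_method(), request.full_url)

# A made-up response, used only to show the review logic.
print(review_answer({"completion": "...", "hallucination_score": 0.9, "sources": ["doc-1"]}))
```

Sending the request would be a call to `urllib.request.urlopen(request)`. Keeping request construction separate from the network call makes the review threshold easy to test on its own.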