Deploy Scalable RAG in Three Steps

Key Points

  • The speaker pitches retrieval-augmented generation (RAG) as the best bang for the buck when using LLMs in a business, but scaling it means managing vector stores, embeddings, authenticated APIs, and high-volume data pipelines, which is more than a Jupyter notebook can handle.
  • The speaker demonstrates a three-step setup using IBM watsonx Flows: install the wxflows CLI, authenticate with the environment, domain, and admin key, then ingest and chunk data to create a deployable RAG flow.
  • watsonx Flows handles core tasks (tokenization, vector retrieval, guardrails, and hallucination-metric calculation), so developers can build or modify a full RAG pipeline by editing the steps in a flow.
  • After deploying the flow, an API endpoint is generated that can be integrated into applications, returning query completions along with groundedness and hallucination warnings for enterprise‑grade reliability.
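
To make the "ingest and chunk" step in the second point more concrete, here is a minimal Python sketch of chunking with a size and an overlap parameter. It is not how watsonx Flows chunks data (the video does not show that); the function name `chunk_text` and the parameters `chunk_size` and `overlap` are hypothetical and only illustrate what chunking parameters typically control.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Illustrative only: chunk_size and overlap are assumed parameter names,
    not the options the wxflows wizard actually prompts for.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some text twice in the vector store.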

Full Transcript

**Source:** [https://www.youtube.com/watch?v=LpKGm1jJXv4](https://www.youtube.com/watch?v=LpKGm1jJXv4)
**Duration:** 00:02:38

## Sections

- [00:00:00](https://www.youtube.com/watch?v=LpKGm1jJXv4&t=0s) **Scaling RAG with watsonx Flows** - The speaker outlines the complexities of deploying retrieval-augmented generation at scale and demonstrates a three-step setup using watsonx Flows to provision vector databases, manage embeddings, create authenticated APIs, and enforce guardrails automatically.

## Full Transcript

0:00 You're into LLMs, so you've probably heard about RAG, right?
0:02 Well, I'm going to throw it out there:
0:04 it's the best way to get bang for your buck when using LLMs for a business.
0:07 But when you're doing it at scale, there's more to think about
0:11 than just a Jupyter notebook.
0:12 And unlike when you're debugging with the service desk, "works on my laptop"
0:16 won't work here.
0:18 Standing up vector databases, managing embeddings, creating
0:20 authenticated APIs, and hooking up your LLM takes work,
0:24 and even more so when you're dealing with big data volumes or tons of users.
0:28 So what if I told you you could get RAG up and running for business in three steps?
0:32 What if it handled all the hard stuff, like tokenization and retrieval,
0:36 but also guardrails?
0:37 And what if it calculated hallucination metrics for you automatically?
0:41 I'm going to show you how to do it in three steps.
0:43 And it begins with every developer's favorite bit: installing stuff.
0:47 I'm going to be using the watsonx Flows engine.
0:49 The first goal is to be able to run wxflows --version on my MacBook.
0:53 If I get back a version number,
0:55 we're good. To do this, I just need to download
0:57 the installer from here and install it using this command.
0:59 Don't let the commands get to you; just do it the same way I use Excel every day:
1:03 copy, paste, and pray.
1:05 Now if I run wxflows --version, I get a version number back.
1:08 Side note: I can also run
1:10 wxflows --help to see all of the commands available.
1:12 Right now, watsonx Flows and I are like strangers,
1:15 like meeting your coworker on the weekend.
1:17 So, goal two: I need to authenticate.
1:20 I need to get wxflows to recognize me when I run the "whoami" command.
1:23 Run wxflows login to kick off the authentication process;
1:27 it prompts for the environment, domain, and admin key.
1:29 These are all available from this link.
1:31 Once done, if I run wxflows whoami again,
1:34 I get back the domain, environment, admin key, and API key.
1:37 I'm in. Final goal:
1:39 upload the data and deploy a flow.
1:41 Once I'm done with this step, I'll have an API endpoint that I can use.
1:44 First, run wxflows init --interactive.
1:47 This is going to take me through a wizard to chunk up my data.
1:50 It prompts for the data location.
1:51 In this case, I've got IBM's annual report in Markdown format,
1:54 as well as some chunking parameters.
1:56 Once that's done, I get back three new files.
1:58 This is the kicker with watsonx Flows:
2:00 I can build an entire RAG or LLM flow just by changing the steps in the flow.
2:05 Need a prompt template?
2:06 Easy! Want hallucination metrics calculated?
2:09 Add in the hallucination score step.
2:11 Need distance metrics? raginfo has that.
2:13 I can load the data into the vector store by running
2:15 wxflows collection deploy, choose the RAG flow by uncommenting the flow I want in the TOML file,
2:21 and deploy it by running wxflows deploy.
2:24 This will return an API endpoint.
2:26 I can plug the environment details into my application,
2:29 and I've now got an enterprise RAG application up and running.
2:32 When I query it, we can see the completion and groundedness warnings,
2:35 as well as the hallucination metrics and source documents.
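
The last step, plugging the endpoint and environment details into an application, might look roughly like the Python sketch below. The video does not show the request format, so everything specific here is an assumption: the placeholder `ENDPOINT` URL, the `apikey` authorization scheme, and the `{"question": ...}` payload are hypothetical and should be replaced with whatever the deployed flow actually expects.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the endpoint printed by `wxflows deploy`
# and the API key reported by `wxflows whoami`.
ENDPOINT = "https://example.invalid/my-rag-flow"
API_KEY = "REPLACE_ME"


def build_query_request(
    question: str,
    endpoint: str = ENDPOINT,
    api_key: str = API_KEY,
) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request for a question.

    The JSON payload shape and the Authorization header format are
    assumptions made for illustration, not a documented contract.
    """
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"apikey {api_key}",
        },
        method="POST",
    )
```

Once real values are filled in, the request can be sent with `urllib.request.urlopen(build_query_request("..."))` and the JSON response inspected for the completion, groundedness warnings, hallucination metrics, and source documents mentioned in the transcript. Keeping request construction separate from sending makes the assumed format easy to check and adjust without network access.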