# Semantic Layer + LLM for Scalable Queries

**Source:** [https://www.youtube.com/watch?v=sfyNLcHHDOM](https://www.youtube.com/watch?v=sfyNLcHHDOM)
**Duration:** 00:05:33

## Key Points

- The speaker highlights the difficulty of reliably answering complex business questions (e.g., “impact of customer satisfaction on sales”) from large, multi‑table databases.
- The desired solution must be **scalable**, **accurate**, and **consistent**, delivering the same answer to identical or similar queries.
- A naïve approach that pulls whole tables into a large language model fails at scale due to token limits and lack of business‑specific context.
- To address scalability, the workflow inserts an intermediate step where the LLM generates targeted SQL queries that retrieve only the necessary data before summarization.
- Consistency and correctness are further ensured by adding a **semantic layer** that maps database schemas to business terminology and includes custom calculations, enabling the LLM to interpret data in the organization’s context.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=sfyNLcHHDOM&t=0s) **Semantic Layer + LLM for Consistent Data Queries** - The speaker outlines a structured, scalable method that combines a semantic layer with a large language model to reliably translate complex business questions—like the impact of customer satisfaction on sales—into accurate, repeatable database queries.
- [00:03:04](https://www.youtube.com/watch?v=sfyNLcHHDOM&t=184s) **Semantic Layer with Metric Definitions** - The speaker outlines using an LLM to generate focused SQL queries and then improving accuracy by applying a semantic layer that encodes business vocabularies and calculations, complemented by metric definitions that capture KPIs and limit the tables and columns needed for each query.

## Full Transcript
Imagine you are working with lots of data in a new database:
hundreds of tables, each with several columns, and you are asked the question,
what was the impact of customer satisfaction on sales last month?
It's a complex question; even though you have access to the data, it's very difficult to pull out the right data,
distill it down to this one question, and be able to answer it reliably every single time.
So in this video, I will walk you through a structured approach
where we combine a semantic layer with a large language model,
so you can answer this kind of question consistently and reliably.
Starting with the data source that you have,
let's say it's a database or a data warehouse,
we have our user who is asking these questions, and over there we'll put the answer.
But before we talk about the approach, let's first set some goals for it.
We have three goals:
we want the approach to be scalable,
and we also want it to be accurate and consistent.
Consistent here means everybody asking the same question, or a similar question,
needs to get the same answer.
A very simple approach would be to pull everything from the database to answer the question.
We would describe it like this: we have the tables and columns,
and we select some of these tables to answer the question.
Then we pass that selection to a query layer
that will pull the data
based on the selected set of tables.
Once we have the data pulled out from the database, we pass it to a summarize task,
which will use a large language model to answer the question,
and based on that, we'll have the answer.
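The naive pipeline described above can be sketched as follows. This is a minimal illustration, not the speaker's actual implementation: `call_llm` is a hypothetical stand-in for a real LLM API call, and the table names are made up.

```python
import sqlite3


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return f"[LLM answer based on a prompt of {len(prompt)} characters]"


def naive_answer(db_path: str, question: str, tables: list[str]) -> str:
    """Naive pipeline: dump the selected tables wholesale into the prompt,
    then ask the LLM to answer the question from the raw rows."""
    conn = sqlite3.connect(db_path)
    dumps = []
    for table in tables:
        rows = conn.execute(f"SELECT * FROM {table}").fetchall()
        dumps.append(f"{table}: {rows}")
    conn.close()
    prompt = "Data:\n" + "\n".join(dumps) + f"\nQuestion: {question}"
    return call_llm(prompt)
```

Note that the prompt grows linearly with the number of rows, which is exactly why this breaks down at scale.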
That approach works nicely if you have a very generic and small set of data.
It will break if the data is more than a few
thousand rows, because you will go beyond the token limit of the large language model.
And at the same time, even if the data is small,
the model doesn't understand the business definitions, the
vocabulary, or any of the calculations that are specific to your business.
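To make the token-limit failure concrete, here is a rough sanity check one might run before stuffing raw data into a prompt. The 4-characters-per-token heuristic and the context limit are assumptions for illustration; real tokenizers and model limits vary.

```python
def rough_token_count(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return len(text) // 4


def fits_context(prompt: str, context_limit: int = 128_000) -> bool:
    """Check whether a prompt plausibly fits a model's context window."""
    return rough_token_count(prompt) <= context_limit
```

A dump of a few hundred thousand rows easily produces tens of millions of characters, so `fits_context` fails long before the data covers a realistic warehouse.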
So let's first solve the scalability issue, which is our first goal.
We'll introduce a SQL-generation step in the middle.
In this step, instead of loading everything from the database, we will rely on
the large language model, using the set of tables, to generate SQL.
So this task will call the LLM with that set of tables
and get back a refined SQL query that scopes down to
just what we need to answer the question.
Then we pass that through the
query and summarize steps and get the answer.
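A sketch of this intermediate step: only the schema (table and column names) is sent to the LLM, which returns a query; the query layer then fetches just the aggregated rows. `generate_sql` here is a hypothetical stub that hard-codes the query a real LLM call might return for a monthly-sales question.

```python
import sqlite3


def generate_sql(question: str, schema: dict[str, list[str]]) -> str:
    """Hypothetical stub: a real implementation would send the question
    plus the schema (not the data) to an LLM and get SQL back."""
    return (
        "SELECT strftime('%Y-%m', order_date) AS month, SUM(amount) "
        "FROM sales GROUP BY month"
    )


def scoped_answer(db_path: str, question: str,
                  schema: dict[str, list[str]]) -> str:
    """Scalable pipeline: LLM-generated SQL scopes the data down first."""
    sql = generate_sql(question, schema)   # only schema goes to the LLM
    conn = sqlite3.connect(db_path)
    rows = conn.execute(sql).fetchall()    # small, pre-aggregated result
    conn.close()
    # This prompt would go to the LLM summarize step.
    return f"Question: {question}\nQuery result: {rows}\nSummarize the answer."
```

Because only the aggregated result reaches the model, the prompt size is bounded regardless of how many rows the source tables hold.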
That's great.
It will scale up to any amount of data you have,
but we didn't solve the other goals yet.
Now let's focus on the accuracy and the consistency;
for those, we will introduce the semantic layer.
Think of the semantic layer as an artifact you create
to describe your tables and columns using your business definitions and vocabulary.
It will also include any calculations, like how you calculate the cost of goods,
or how you calculate sales over a given period of time.
Using that, you will be able to teach the model how to answer questions well in a way that's relevant to your business.
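One minimal way to represent such an artifact is a plain data structure that maps raw names to business language, plus a helper that flattens it into prompt text. All table, column, and metric names below (`fct_orders`, `amt_net`, `cost_of_goods`, etc.) are invented for illustration; real semantic layers are usually richer and often defined in YAML.

```python
# Hypothetical semantic-layer artifact: raw schema -> business vocabulary.
SEMANTIC_LAYER = {
    "tables": {
        "fct_orders": {
            "description": "One row per customer order",
            "columns": {
                "amt_net": "Net sales amount after discounts, in USD",
                "csat": "Customer satisfaction survey score (1-5)",
            },
        },
    },
    "calculations": {
        "cost_of_goods": "SUM(unit_cost * quantity)",
        "monthly_sales": "SUM(amt_net) grouped by month of order_date",
    },
}


def render_semantic_context(layer: dict) -> str:
    """Flatten the semantic layer into text an LLM prompt can include,
    so the model sees business meanings instead of cryptic column names."""
    lines = []
    for name, table in layer["tables"].items():
        lines.append(f"Table {name}: {table['description']}")
        for col, desc in table["columns"].items():
            lines.append(f"  {col}: {desc}")
    for metric, formula in layer["calculations"].items():
        lines.append(f"Calculation {metric}: {formula}")
    return "\n".join(lines)
```

Prepending this rendered context to the SQL-generation prompt is what lets the model translate "cost of goods" into the right formula for this particular business.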
But that's not enough.
Semantic layers don't really capture KPIs,
like how you track revenue against its target or its projected values,
or whether, when you track sales, you calculate sales over a month, a quarter, or a year.
So that's where we introduce the metric definition.
Alongside the benefits I just mentioned about the definitions,
it will also scope down the number of columns and tables we are talking about,
because the metric definition will only include those that are relevant to the KPI you are interested in.
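A metric definition can be sketched the same way: each KPI names its grain and the exact tables and columns it needs, so the prompt can be scoped to just those. The metric and column names here are hypothetical, continuing the made-up schema from the earlier sketch.

```python
# Hypothetical metric definitions: each KPI declares exactly what it needs.
METRICS = {
    "monthly_sales": {
        "description": "Total net sales per calendar month vs. target",
        "tables": ["fct_orders"],
        "columns": ["order_date", "amt_net"],
        "grain": "month",
    },
    "csat_trend": {
        "description": "Average customer satisfaction over time",
        "tables": ["fct_orders", "dim_surveys"],
        "columns": ["order_date", "csat"],
        "grain": "month",
    },
}


def scope_schema(metric_names: list[str], metrics: dict = METRICS):
    """Return only the tables and columns the requested KPIs need,
    instead of the full schema of hundreds of tables."""
    tables, columns = set(), set()
    for name in metric_names:
        m = metrics[name]
        tables.update(m["tables"])
        columns.update(m["columns"])
    return sorted(tables), sorted(columns)
```

For a question about monthly sales, the LLM then sees one table and two columns rather than the whole warehouse, which improves both accuracy and prompt size.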
So by using the semantic layer and the metric definition,
we can pass exactly what's required to answer the question to the large language model and get the relevant query,
one that is now a lot more relevant to our question and our business, and get the answer.
At the same time, the metric definitions can be published to a centralized metric definition catalog;
that catalog will make them available to all of your users in the business, and hence it will add to the consistency.
So now we have achieved the scalability, the accuracy, and the consistency by combining these techniques together.
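Putting the pieces together, the end-to-end flow might look like the sketch below. Again, `call_llm` and `run_query` are hypothetical stubs standing in for a real LLM API and the query layer; the metric entry echoes the invented `monthly_sales` KPI from the earlier sketches.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return f"[LLM output for prompt of {len(prompt)} chars]"


def run_query(sql: str) -> list:
    """Hypothetical stand-in for the query layer executing SQL."""
    return [("2024-01", 1200.0)]


def answer_question(question: str) -> str:
    """End-to-end: metric catalog -> scoped context -> SQL -> summary."""
    # 1. Look up the relevant KPI in the centralized metric catalog
    #    (hard-coded here; a real system would match the question to a metric).
    metric = {
        "name": "monthly_sales",
        "tables": ["fct_orders"],
        "columns": ["order_date", "amt_net"],
        "definition": "SUM(amt_net) grouped by month",
    }
    # 2. Build a prompt containing only the scoped semantic context.
    context = (
        f"Metric {metric['name']}: {metric['definition']}\n"
        f"Tables: {', '.join(metric['tables'])}\n"
        f"Columns: {', '.join(metric['columns'])}"
    )
    # 3. LLM generates SQL from the scoped context, not from raw data.
    sql = call_llm(f"{context}\nWrite SQL for: {question}")
    # 4. Query layer pulls only the needed rows; LLM summarizes them.
    rows = run_query(sql)
    return call_llm(f"Question: {question}\nResult: {rows}\nSummarize.")
```

Because every user's question resolves through the same catalog entry and definitions, identical questions produce the same query and therefore the same answer.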