Learning Library

← Back to Library

Diagnosing and Optimizing Slow SQL Queries

Key Points

  • Slow queries become a critical bottleneck as data volumes grow, so developers, data scientists, engineers, and DBAs must continuously tune SQL for performance and cost control.
  • The first step in fixing a sluggish query is proper diagnosis using the SQL EXPLAIN command to view the detailed execution plan.
  • A common red flag is a large disparity between rows scanned and rows returned, indicating unnecessary data scanning that inflates runtime.
  • Excessive sorting in the plan signals high RAM consumption and should be addressed through index or query redesign.
  • Full‑table scans are another major inefficiency; eliminating them with appropriate indexing or predicate refinement is a top priority before tackling more complex optimizations.

Sections

Full Transcript

# Diagnosing and Optimizing Slow SQL Queries **Source:** [https://www.youtube.com/watch?v=watwW4Hwyyw](https://www.youtube.com/watch?v=watwW4Hwyyw) **Duration:** 00:15:51 ## Summary - Slow queries become a critical bottleneck as data volumes grow, so developers, data scientists, engineers, and DBAs must continuously tune SQL for performance and cost control. - The first step in fixing a sluggish query is proper diagnosis using the SQL EXPLAIN command to view the detailed execution plan. - A common red flag is a large disparity between rows scanned and rows returned, indicating unnecessary data scanning that inflates runtime. - Excessive sorting in the plan signals high RAM consumption and should be addressed through index or query redesign. - Full‑table scans are another major inefficiency; eliminating them with appropriate indexing or predicate refinement is a top priority before tackling more complex optimizations. ## Sections - [00:00:00](https://www.youtube.com/watch?v=watwW4Hwyyw&t=0s) **Standardizing Slow Query Diagnosis** - The speaker outlines a step‑by‑step, priority‑driven process—starting with using EXPLAIN—to accurately diagnose and optimize sluggish SQL queries for better performance and cost control. - [00:03:06](https://www.youtube.com/watch?v=watwW4Hwyyw&t=186s) **Analyzing Query Plans for Optimization** - It outlines how examining query execution plans—showing costs, resource usage, and row counts—lets you pinpoint inefficiencies, beginning with syntax tweaks before moving on to larger architectural changes. - [00:06:17](https://www.youtube.com/watch?v=watwW4Hwyyw&t=377s) **Query Optimization and Index Benefits** - The speaker details how filtering with WHERE clauses reduces scanned rows and then explains that adding an index pre‑sorts data, enabling the database to locate records quickly instead of scanning each row. - [00:09:20](https://www.youtube.com/watch?v=watwW4Hwyyw&t=560s) **Strategic Indexing and Partitioning Guidance** - The speaker advises adding indexes only on columns used in WHERE clauses, balancing performance gains against write overhead, and suggests considering table partitioning for massive datasets when indexing alone isn’t enough. - [00:12:28](https://www.youtube.com/watch?v=watwW4Hwyyw&t=748s) **When to Use Partitioning and Redesign** - The speaker explains that after optimizing queries and indexes, adding partitions can boost performance but demands careful planning and team consensus, and if issues remain, a complete data‑structure redesign may be required. - [00:15:35](https://www.youtube.com/watch?v=watwW4Hwyyw&t=935s) **Optimizing Queries for AI Foundations** - The speaker emphasizes mastering techniques to ensure query performance, low runtime costs, and a solid base for AI development. ## Full Transcript
0:00Slow running queries can be a major bottleneck for any data-driven organization. 0:05And especially as data sets get larger to support AI and automation, this is gonna become increasingly important. 0:12So whether you're a developer, a data scientist, a data engineer, or even a database administrator, 0:18understanding how to tune queries to make sure that they're optimized for 0:22performance is essential to make that we're controlling our runtime costs 0:27and ensuring that we can deliver real-time data insights. 0:31So that's what we're going to go through today, 0:33a method of going through a standardized process to diagnose and optimize our queries to ensure best performance. 0:41So let's walk through a scenario. 0:43You've written a query and you got all the data to return exactly how you want it, 0:48you deployed it, that's great, but unfortunately it's running slowly, 0:52maybe you've gotten feedback from users or members of your team. 0:55So let's go through the steps we need to take to fix that. 0:59And for the sake of time and this video, 1:01we're going to focus on things that are actually within our control and within the scope of the application. 1:06And as we go through fixes, we're going to make sure we basically prioritize them 1:10based on easiest to fix to most complex to actually change. 1:14So first things first, before we even try and solve any problems 1:19or add any optimizations, we want to make that we're diagnosing the problem correctly. 1:25And this is done through a method called explain. 1:28Explain is SQL context. 1:31So you'll see this literally as a explain command that is written. 1:36So you say explain this query, 1:38and from there, you can actually then get a detailed query plan of exactly how your database is going to run that query. 1:47So with our explain method, we can identify a few key red flags. 1:52Probably the thing that's going to come up the most 1:55is showing that your scanned amount of rows 1:58does not equal or is very, very distant from the amount of rows actually returned. 2:05So we can actually see that the amount scanned does not the equal the return. 2:12And this basically means, let's say you're looking for only a couple hundred rows, 2:16Maybe they're still querying through many many millions. 2:19That's what's going to result in that long run time or the inefficiency. 2:23So seeing that in a way is good because that means you have room to improve. 2:28The other method that we really want to look for is basically sorts 2:34and basically with that we're going to see that when we're actually pulling data out in a query 2:40that in order to sort it you have to keep all that memory in RAM. So that's going be very consuming. 2:47That's another red flag that we can tackle with some optimizations. 2:52But if you see that in your query plan, that's another area of optimization. 2:56And then again, the next thing really is a full table scan. 3:00A full table span means that essentially the entire table is being scanned for your query. 3:07So we wanna make sure that we see those can be completely eliminated through some kind of optimization, 3:15and again, I know I was talking about SQL syntax with Explain. 3:21No matter your database, whether it's no SQL or SQL, you're gonna be able to look at a query plan. 3:27And a query planning based on the syntax and the 3:30details within your particular database is always gonna tell you three things. 3:34It's gonna tell all these great functions that it does and then on top of that, for each one, 3:39it's going to break it down by the amount of time it takes, 3:42by the amounts of resources it consumes, is frequently depicted as cost. 3:46and that will be RAM or CPU, and then it'll actually then depict, 3:50hopefully, the number of rows or transactions 3:52that were included in that operation, and you can quickly start 3:56to see where those areas of improvements can really be applied. 4:01So now we can go ahead and start looking at how we can optimize our queries. 4:05And as we go through the structured methodology, 4:08we're going to actually start with things that are probably the most in our control. 4:12And then the later optimizations are going to be more methods that 4:16would probably be a larger team effort or a bigger change to your overall architecture. 4:22So the first thing we're always really going to look at in query optimization 4:27is the query itself, because about 80% of the time, that's why it's running slowly. 4:31So, we're always going to look for... 4:34making sure that our syntax is as optimized as possible. 4:39So one of the things that we want to make sure we do as we're writing a query is that we're filtering as soon as we can. 4:46Because what is a query really? 4:48Is that it's a method of sorting through a database so you have all your different rows. 4:53And most databases, this is quite a bit. 4:56And any kind of query where you're querying just a little bit of information. 5:01you wanna make sure that you're limiting the amount of information that's actually being scanned. 5:06How this is done is really by adding a WHERE clause so that you are only scanning a smaller portion of the data. 5:12So you're not going from one all the way to the bottom, 5:15you're actually just gonna be querying a smaller proportion of the the data, 5:18so making sure you're taking every chance you can 5:21to optimize and filter out any data that isn't relevant to this query is gonna help it run faster. 5:28The other things to look at is always your joins. 5:31Joins are notoriously a problem, where we can find improvements for optimization. 5:37And then we also want to look at once you have your WHERE clauses, how are you actually writing those? 5:44Are they distinct? 5:45We want to make sure we're looking for a set list of things that are clearly defined. 5:50So something to look out for would be an IN clause. 5:53And this basically means that we're looking for something 5:57where something is in a list, and you want to make sure that that list is relatively short, 6:02because once that list becomes very long, you could start to see query inefficiencies. 6:07So by taking a look at your query, you can make those changes, and then you should be rerunning your explain 6:12so that you can actually quantify and see the improvements that these are making. 6:17By adding that WHERE clause and by filtering more, you might see your sorts and your table stands totally be eliminated. 6:23and you should see the value between the amount scanned and returned 6:27get smaller and smaller, so they start to become virtually the same number. 6:31So once we fully optimized our query, the next step is to 6:36decide if we want to add an index for our database to better work with this query. 6:43So what an index is going to do is it's basically going to help the database optimize the sort. 6:49for a particular column so that it can better navigate through when it's doing some kind of lookup or search. 6:56So first, we're going to back up and really think about what an index is. 7:00So in a database, data is basically stored in a completely random order. 7:06The database doesn't know the order. 7:08It's probably the order that it originally was inserted in. 7:11But that's basically how everything is laid out. 7:16And from there, let's say we're looking at a list of countries. 7:20Let's say we want to look up USA. 7:23What we're going to do is we're first going to look in that first row. 7:25Say, is this USA? 7:26Maybe it is, maybe it isn't. 7:29If then we would look at the second row, then the third, then the fourth. 7:33That'll go all the way through until it hits every single row. 7:37And that's a pretty inefficient query, right? 7:39Because it has to check every single record to see if it matches that. 7:43So what an index actually does is it presorts that information. 7:48So that essentially the database knows where things are. 7:53So when it actually doesn't find it, it can help and use that information to drive a more intelligent search. 8:00So let's say we're gonna build an index on country name. 8:03So same use case, we're going to have the data still gonna be listed out. 8:09And the difference here is now it's gonna be sorted in an order that the database can recognize. 8:14And since it's a string, it's going to be from A to Z. 8:18It's gonna an alphabetical order, 8:20and If this was a number, you know, you can make it in ascending or descending order, whatever you want, 8:26but, now when we look up USA. 8:30It's actually going to start in the middle. 8:32And it's going to say, is this USA? 8:33Probably not. 8:34We know USA is probably towards the bottom here. 8:37And because it won't find it, it now knows that it doesn't even need to search this top portion. 8:43It's just going to go about midway through. 8:46It's going look again. 8:47And then it's gonna look again, and it's basically going to hone in on where the value is based on the value that it actually gets. 8:55So this is a much more intelligent search. 8:58With this, it's probably going to find the value of USA within a few lookups, maybe a few dozen. 9:04Whereas with this method, it is going to have to look through every single row pretty unintelligently. 9:10So that's what an index does. 9:13So the reason we don't do this first is because the first thing you have to do here is optimize your query. 9:18Because you can't just put an index on every column. 9:20You're only going to put an an index on the columns that you're searching with. 9:24So these are the columns that are called in your WHERE clauses. 9:27your order buys, your group buys, anything that's your lookup. 9:31And that's where indexes really should be applied. 9:34They're not a silver bullet. 9:35We can't always just keep building indexes because there is an overhead cost associated with it. 9:41Because now every time you write to your database this order has to be maintained. 9:46So there is some backend operations in your database that happen. 9:49So that's why there are some arbitrary rules, like you shouldn't have more than three indexes. 9:54Now, if you have evidence with your explain that by adding an index, 9:58it drastically improves the performance of a query, go ahead and add it. 10:02But this, again, is just part of regular maintenance in a database. 10:06Indexes, just like queries, should be regularly tuned and look for new areas of optimization. 10:13So really, with just these tools, you should see drastic improvement in your query performance. 10:19The next measure you can consider for optimizing your query is by building 10:24partitions on your table. 10:26Now, partitions are much more so a database optimization. 10:33So it's something that probably has to be taken as more of a team effort, a larger conversation. 10:39But when working with a lot of data, sometimes that's what you need to do. 10:43Because even if you've tuned perfectly with your query and your index, 10:47sometimes you just can't sugarcoat querying a billion rows. 10:51You need more optimization 10:52and you need more filters. 10:54So partitioning actually takes care of this more so at the storage level. 10:58So we all, again, like we have our tables. 11:01And in our tables, we have all of our data and all of information. 11:06And so in that table, we actually are going to break it up horizontally is the way to think about it. 11:13So that there are portions of the data that are kept together, 11:17and we're actually going to completely segment them at the store level. 11:21so that the queries can just target one segment without even needing to look into the others. 11:27That's really what a partition is. 11:29So a common example with this is time series. 11:33Let's say you have a use case where there are millions of transactions a day. 11:37Maybe it's a internet of things use case with lots of sensors that are going into a database. 11:43That's gonna result in a lot of different, you know, rows being created or records being generated. 11:49So how can we break that up? 11:50You can essentially partition it by day or even by hour to make sure that you're then breaking up your data into easy pieces. 11:59So let's say like this is one bucket. 12:02That can be your first hour. 12:05You have another bucket. 12:07That's gonna be your second hour. 12:09You know, and this would go on. 12:11And this automatically happens as the data is being inserted, this is a setting you've put in your table at your database design. 12:17So now, if your queries are going to run something like, show me all the records that have happened in the last hour, 12:23you can just search this query, or basically make sure that you're just searching this bucket. 12:29And from there, you're going to know that you've found it, and you never need to even look in any of these other buckets at all. 12:35And this works the same way. 12:37If you ever need to just look at a distinct number of buckets, it doesn't have to be just one. 12:43So again, partitioning is a great tool of basically breaking up when you have really, really large data volumes. 12:49And if queries and indexes aren't doing the trick, partitions could be the next step. 12:54However, it is a major change. 12:56You might have to re-org your tables. 12:58So make sure you work with your team. 13:00But you're going to have enough evidence from Explain and from the optimizations you've already applied 13:05to have a very fruitful conversation, probably with your database administrator, on why partitions need to be built. 13:11and that it could potentially lead to much better performance. 13:15So between queries, indexes, and partitions, you should see great improvement in your query performance. 13:22Now, the last step that really could be taken if you still are seeing query performance issues 13:28would probably be to look into redesigning your data structure itself. 13:33And this is no small undertaking and definitely should be a team decision. 13:40But with all the evidence that you've gathered through this process of optimization, 13:46you should be able to have a very fruitful discussion about how you've taken everything, 13:50you know, every precaution, every optimization available to you, and we're still seeing query issues. 13:56It might be time to actually change how you're storing your data. 13:59So as you're deciding how you can better structure your data, this is a great thing to consider. 14:05mostly think about how data is accessed, 14:09and you want to make sure that data that is frequently accessed together is stored together. 14:14And then you can kind of make that decision. 14:15Do you want to de-norm tables together? 14:17Do you wanna break them up based on relationships? 14:20This is very easy to do if you're probably at the beginning stages of your application development, 14:24but if you've inherited a legacy application or working with something with 14:30many years of historic data, this will be a larger effort. 14:33The other thing to consider would be parallel computing frameworks like Spark or Hadoop. 14:38This is going to add parallelism basically to your queries so that you can query large amounts of data better. 14:45Not a small undertaking. 14:47This will be changing probably your middleware layer or how your data is queried itself. 14:53So both of these are great options, but they do require a large effort from our teams. 14:58So overall... 15:00we have found that through all these different methods, we can certainly improve the performance of our queries. 15:07So we always want to make sure that we're starting with the simple fixes, 15:10what's probably in our control and easiest for us to change 15:13before we go to more complex redesign and our architectural overhauls. 15:18And when we're also thinking about 15:20how are we going to benchmark ourselves, how are monitoring this, we want to leverage the explain method. 15:25so that we're constantly measuring our performance. 15:28And this shouldn't just be done when something goes wrong. 15:31We really should constantly be tuning our queries before things get to that point. 15:35So by mastering all these techniques, 15:37we can make sure that we are all experts 15:40in making sure that our queries are as performant as possible, that we keeping our runtime costs as low as possible. 15:47And that we providing the necessary foundation we're gonna need for AI development.