Learning Library

← Back to Library

SQL Sandwich Architecture for Cloud Analytics

Key Points

  • The “SQL sandwich” architecture layers a data warehouse between two object‑storage tiers: raw data landing at the top and archived, cold data at the bottom.
  • Raw logs, IoT streams, and other inexpensive, elastic storage reside in the upper object store, where they are explored, cleansed, and batch‑processed before entering the warehouse.
  • The central data warehouse holds high‑quality, curated data for interactive analytics that demand low‑latency SLAs, despite its higher operational cost.
  • Older, less‑frequently accessed data is moved to the lower object‑storage tier to reduce costs while still supporting compliance and long‑term reporting needs.
  • An ETL (or ELT) pipeline acts as the “sauce,” extracting data from the landing store, transforming it, and loading it into the warehouse, then offloading stale data back to archive.

Full Transcript

# SQL Sandwich Architecture for Cloud Analytics **Source:** [https://www.youtube.com/watch?v=Ausqifk0ZiM](https://www.youtube.com/watch?v=Ausqifk0ZiM) **Duration:** 00:06:32 ## Summary - The “SQL sandwich” architecture layers a data warehouse between two object‑storage tiers: raw data landing at the top and archived, cold data at the bottom. - Raw logs, IoT streams, and other inexpensive, elastic storage reside in the upper object store, where they are explored, cleansed, and batch‑processed before entering the warehouse. - The central data warehouse holds high‑quality, curated data for interactive analytics that demand low‑latency SLAs, despite its higher operational cost. - Older, less‑frequently accessed data is moved to the lower object‑storage tier to reduce costs while still supporting compliance and long‑term reporting needs. - An ETL (or ELT) pipeline acts as the “sauce,” extracting data from the landing store, transforming it, and loading it into the warehouse, then offloading stale data back to archive. ## Sections - [00:00:00](https://www.youtube.com/watch?v=Ausqifk0ZiM&t=0s) **SQL Sandwich Architecture Explained** - Torsten Steinbach outlines a cloud analytics pattern that sandwiches a data warehouse between raw‑data object storage on top and archive object storage below, enabling cost‑effective ingestion, high‑quality processing, and long‑term retention. - [00:03:18](https://www.youtube.com/watch?v=Ausqifk0ZiM&t=198s) **Data Archiving and Lifecycle Management** - The speaker explains moving cold data from costly warehouses to cheap object storage via ETL, using a sandwich analogy to illustrate how archived data still supports compliance and other analytics while being managed as part of an overall lifecycle. - [00:06:31](https://www.youtube.com/watch?v=Ausqifk0ZiM&t=391s) **Brief Expression of Thanks** - The speaker offers a concise statement of gratitude. ## Full Transcript
0:00Hello, this is Torsten Steinbach, Architect at IBM for Data and Analytics in the Cloud 0:05and I'm going to talk to you today about the SQL sandwich. 0:10This is actually about databases and object storage. 0:14So, as you can see in the image, we're putting it together as a nice and tasty sandwich. 0:21So, at the center of the sandwich we have our data warehouse. 0:28So, this is a very, well established mechanism of storing and analyzing data, big data but 0:38we're surrounding it now with more recent technology and it is especially popular in 0:44the cloud and object storage. 0:53We have it as the bonds basically for a sandwich at the top and at the bottom. 1:02Now, what's the purpose of having all of these components? 1:10What we have actually at the top is our initial place where we are landing and storing all 1:16of our raw data. 1:18So, this might be things like log messages from applications, IoT messages from devices 1:27that are just coming in and in a pretty raw format, we're able to just store them in the 1:31object storage for very little money in a highly elastic manner. 1:39And in the data warehouse however we want to have high quality data. 1:49Because the data warehouse is a much more sophisticated but also much more expensive 1:55component to operate a host and to purchase. 2:00And at the bottom again, why do we have another object storage at the bottom? 2:04Well, this is where we have our archived data. 2:09Now, all of these components basically makes sense in an end-to-end big data analytics 2:17use case because we have new data arriving and being stored in the object storage, and 2:22here in object storage basically we are going to explore the data. 2:30So, we try to find out what's actually in the data, and we prep the data, we prepare 2:40the data, cleanse it, make it higher quality and more curated and we also conduct to some 2:46extent batch analytics directly at the data that is stored in the object storage. 2:57In a data warehouse however we are basically able to do interactive analytics that requires 3:11certain SLAs for latencies of poor performance response times and so on. 3:18And why are you having this archived data again, well this is basically because you 3:23do not want to keep all of the data for years around in a data warehouse because as I told 3:27you this is the more expensive thing to run with all of these components. 3:30So, it makes sense for the data that is not hot anymore and is not required for your daily 3:36business to archive it off again into an object storage. 3:41So, for instance you can still run things like compliance reports that you're required 3:46to do to be compliant with certain regulations. 3:56So, these are the different types of analytics and some things that you want to do with the 4:03data at the different stages of its life cycle basically. 4:08Of course, a real sandwich should not be this dry so you will have some sauce in there. 4:15So, what is the source basically that glues these things together. 4:19It is basically an ETL mechanism. 4:25It allows you basically to read data from here and transform it so that it can be read 4:31here and same from here to here. 4:35So, basically data is traveling this way. 4:42Now finally you see a sandwich is often something that you do not eat into pieces right, you 4:50do not just eat the patty or just the bun. 4:53You eat it as a whole thing. 4:54So, we consume it as a whole thing and it's also the same analogy that holds here that 5:00while it makes sense to prepare a system that can serve these different types of workloads 5:05in the most efficient way and cost effective way possible, it's of course hard if you always 5:11have to think about do I have to go here, here, or here for this query. 5:15It should be automatically figured out by the system and for that reason we are putting 5:20this all into a nice box, putting it all together. 5:24And we are using federation as a mechanism on top of all of that basically virtualizes 5:32the location of the data depending on its age or state off the pipeline that it is in. 5:40Now finally it is called a SQL sandwich, why is it called a SQL sandwich? 5:44Well because SQL is the essential thing for all of these things that you can see here. 5:49SQL is used for basically doing these analytics, these are SQL queries that we are running 5:57here. 5:58SQL is also used to do detail. 6:05And finally, SQL is also used in order to federate those things together, It's a SQL 6:10federation. 6:11Okay, that's our SQL sandwich and I hope this helps you to put in perspective the roles 6:17of object storage and data warehouses and how these two things can be put together into 6:23this nice and tasty format so that you can get the most out of the technology in combination. 6:31Thank you very much.