Enterprise Data Warehouse Overview

8m • databases • tutorial • beginner

Key Points

Luv Aggarwal (IBM Data Platform Solution Engineer) explains that an enterprise data warehouse (EDW) is a purpose‑specific, organized collection of clean business data, distinct from a data lake’s raw dump and a data mart’s domain‑specific subset.
The EDW serves as the organization’s single source of truth, ingesting diverse raw data from transactional systems, relational databases, CRMs, ERPs, supply‑chain feeds, etc., and converting it into high‑quality, analytics‑ready data via ETL processes.
Once loaded, the warehouse enables business analysts, data scientists, and data engineers to perform reporting, BI, predictive analytics, and machine‑learning using built‑in tools or external platforms.
Aggarwal outlines three primary deployment models for data warehouses (on‑premises, cloud‑based, and hybrid), each with different trade‑offs in control, scalability, and cost.
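
As a rough illustration of the ETL flow described in the key points above, here is a minimal Python sketch. It is not taken from the video: the sample records, table names, and cleaning rules are hypothetical, and an in-memory SQLite database merely stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw records standing in for CRM and sales source systems.
RAW_CUSTOMERS = [
    {"id": "1", "name": "  Ada Lovelace ", "region": "emea"},
    {"id": "2", "name": "Grace Hopper", "region": "NA"},
    {"id": "", "name": "missing id", "region": "na"},  # invalid: no id
]
RAW_SALES = [
    {"customer_id": "1", "amount": "120.50"},
    {"customer_id": "2", "amount": "80"},
    {"customer_id": "2", "amount": "not-a-number"},  # invalid amount
]


def transform_customers(rows):
    """Trim names, upper-case regions, and drop rows without an id."""
    clean = []
    for row in rows:
        if not row["id"].strip():
            continue
        clean.append(
            (int(row["id"]), row["name"].strip(), row["region"].strip().upper())
        )
    return clean


def transform_sales(rows):
    """Convert fields to numbers and drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["customer_id"]), float(row["amount"])))
        except ValueError:
            continue
    return clean


def load(conn, customers, sales):
    """Create the (assumed) warehouse tables and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
    )
    conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
    conn.commit()


def revenue_by_region(conn):
    """An analytics-style query that a BI tool might run after loading."""
    query = """
        SELECT c.region, SUM(s.amount)
        FROM sales AS s
        JOIN customers AS c ON c.id = s.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(query).fetchall()


def run_pipeline():
    """Extract (the in-memory samples), transform, load, then query."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform_customers(RAW_CUSTOMERS), transform_sales(RAW_SALES))
    return revenue_by_region(conn)


if __name__ == "__main__":
    print(run_pipeline())  # [('EMEA', 120.5), ('NA', 80.0)]
```

The invalid customer and the unparseable sale are dropped during the transform step, so only clean rows reach the tables that analysts would query.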

**Source:** [https://www.youtube.com/watch?v=k4tK2ttdSDg](https://www.youtube.com/watch?v=k4tK2ttdSDg)
**Duration:** 00:08:20

Sections

- [00:00:00](https://www.youtube.com/watch?v=k4tK2ttdSDg&t=0s) **Enterprise Data Warehouse Basics** - Luv Aggarwal explains the distinction between data lakes, warehouses, and marts, emphasizing that a data warehouse is a purpose‑specific, organized collection serving as an organization’s single source of truth.
- [00:03:32](https://www.youtube.com/watch?v=k4tK2ttdSDg&t=212s) **On-Premises Data Warehouse Options** - The speaker outlines three on‑premises deployment styles (commodity hardware using MPP or SMP, and purpose‑built appliances), highlighting their architectures, benefits like control and performance, and the upfront cost trade‑off.
- [00:06:43](https://www.youtube.com/watch?v=k4tK2ttdSDg&t=403s) **Hybrid On-Prem and Cloud Data Warehousing** - The segment explains how combining on‑premises and cloud data warehouses lets enterprises leverage cloud‑born data, support disaster‑recovery, and maintain mission‑critical workloads.

Full Transcript

0:00 Hey, what's up, everyone? My name is Luv Aggarwal and I'm a Data Platform Solution Engineer for IBM. Data warehouses. Their prevalence across enterprises has grown significantly over the past 20+ years. But with multiple modern advancements, the numerous options out there are now much more complex.

0:19 So, let's talk about what an enterprise data warehouse, or "EDW", is. So, first and foremost, there's often confusion between "data lakes" and "data warehouses" and even "data marts".

0:46 So, I like to think of a data warehouse as being more purpose-specific than a data lake. So, while a data lake is a great place to dump all sorts of raw, structured and unstructured data in a quick way to clean and organize later, a data warehouse, on the other hand, is a large collection of organized and clean business data, ready to help an organization make decisions.

1:09 And a data mart is like a subset of a data warehouse that's more specific to a particular business domain. So, for example, you could have a finance data mart.

1:19 But for today, let's focus on the data warehouse. So, we'll get rid of data lakes and data marts, and we'll make this a little bit bigger.

1:27 So, the data warehouse serves as the single source of truth for an organization across multiple knowledge domains. And data in the warehouse comes from multiple different source systems and is transformed from raw data to high quality data, optimized for analytics via various different ETL, or "Extract, Transform and Load", tools.

1:58 So, as I mentioned, data that's in our source systems can be in different types. It could be transactional systems, it can be relational databases, and they can cover a wide variety of business domains.

2:12 So, the data could cover things like customer data from our CRMs. We could have sales data. We could have data from our ERP systems. We could even have supply chain data. And the list goes on and on. Right.

2:34 So, once data has been cleaned, transformed and loaded into our data warehouse, it's now ready for us to expose to our users, who can then start to take it and do analytics and machine learning on these data sets.

2:52 So, who are our users? Our users can be folks like business analysts. We can have data scientists. We could even have data engineers. And these folks can now start leveraging these data sets, either using the built-in analytics tools in the data warehouse or using a variety of different business intelligence or predictive analytics and machine learning platforms.

3:34 OK, so now that we know what an enterprise data warehouse is, let's talk about the different ways in which it can be implemented. So, three common ways in which a data warehouse can be deployed.

3:46 The first way is on-premises. Now, a couple different ways in which an on-prem data warehouse can be configured, we could have our data warehouse running on commodity hardware. Now, this could be set up and structured using either MPP, or "Massively Parallel Processing", architecture where we just add more compute nodes as our workload grows, or using SMP, or "Symmetric Multi-Processing", architecture where, typically, we have a tightly coupled, multi-CPU system that shares resources from one common operating system.

4:30 Now, the other way is through a purpose-built appliance format. Now, this is typically an integrated stack of CPU, memory, storage, software, all purpose-built and optimized for a data warehouse workload from a single vendor.

4:51 So, what are some of the benefits of having an on-prem data warehouse? Well, first you get to maintain complete control over the entire tech stack, right? Second, you can leverage your local network speeds and perhaps avoid some bandwidth challenges typically associated with the cloud. You can also leverage high availability, and we can maintain strict governance and regulatory compliance. But on the other hand, an on-prem data warehouse does come with an upfront investment and the need for ongoing support and maintenance.

5:33 Now, the other way in which a data warehouse can be deployed is through a cloud-based data warehouse, where our data warehouse is delivered as a managed SaaS offering via the multiple public cloud providers.

5:50 So, moving data warehouses to the cloud is the next frontier for a lot of enterprises and for valid reasons. Some of the benefits include being able to free up resources to focus on other high value analytics tasks, right, instead of just managing systems.

6:10 Another benefit can also be the ability to scale easily. Right, because we don't have to go out and procure new hardware and we get to leverage automatic upgrades. Right. Now, on the other hand, oftentimes a cloud-based data warehouse can take a performance hit due to how it's fine tuned for that specific workload, and there can be some unanticipated high costs due to how cloud data warehouse is scaled.

6:44 OK, the third option is actually a hybrid approach. So, this takes the best of on-prem and cloud and brings them together. And a lot of enterprises choose to run both their on-prem and cloud data warehouses in conjunction. And this can be done for a couple of different reasons.

7:05 So, one benefit can be that this allows us to explore new use-cases. Right. So as an enterprise, we may have certain data sources that were born in the cloud. So, it can be beneficial to start leveraging a cloud data warehouse for analytics against those use-cases while still maintaining their mission critical workloads on-prem.

7:30 Another benefit can be for a disaster recovery and backup scenario. This is where we would use both our environments in conjunction for DR and backup reasons.

7:44 So, if we take a step back, we can see that we've barely started to scratch the surface of enterprise data warehouses and how they fit into an overall enterprise architecture. But I hope this video has given us a good idea of how data warehouses fit in and what they're used for. Thank you. If you have any questions, please drop us a line below. If you want to see more videos like this in the future, please like and subscribe. And don't forget, if you want to learn more about any of the IBM data solutions we've discussed today, please feel free to check out the link below.
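
The MPP idea mentioned in the transcript (adding more compute nodes as the workload grows) can be sketched roughly as follows. This is an illustration only and not from the video: the function names and sample rows are hypothetical, worker threads stand in for separate compute nodes, and hash partitioning is assumed as the distribution scheme.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def node_for(key, node_count):
    """Pick a node with a stable hash so equal keys always land together."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Hash-partition (region, amount) rows across the simulated nodes."""
    partitions = [[] for _ in range(node_count)]
    for region, amount in rows:
        partitions[node_for(region, node_count)].append((region, amount))
    return partitions


def local_sum(partition):
    """Work done independently on one node: aggregate only its own rows."""
    totals = defaultdict(float)
    for region, amount in partition:
        totals[region] += amount
    return dict(totals)


def mpp_sum(rows, node_count):
    """Run the per-node aggregation in parallel, then merge the partials."""
    partitions = distribute(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, partitions))
    merged = defaultdict(float)
    for partial in partials:
        for region, total in partial.items():
            merged[region] += total
    return dict(merged)


if __name__ == "__main__":
    sample = [("EMEA", 10.0), ("NA", 5.0), ("EMEA", 2.5), ("APAC", 1.0)]
    # The totals are the same whether the work is spread over 2 or 4 nodes.
    print(mpp_sum(sample, node_count=2))
    print(mpp_sum(sample, node_count=4))
```

Changing `node_count` changes how the rows are spread out but not the result, which is the scale-out property the transcript attributes to MPP. An SMP system would instead run the whole aggregation on one multi-CPU machine sharing a single operating system.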