Learning Library

Data Fabric: Unifying Enterprise Data

Key Points

  • The data fabric is an architectural approach that breaks down silos and lets users access, ingest, integrate, and share data across on‑premises and multiple cloud environments in a governed way, minimizing the need for heavy data movement.
  • Traditional tools (cloud/enterprise data warehouses, data lakes, and the newer lakehouses) act as central repositories, but they often require copying data, which can cause governance challenges, quality issues, and proliferating data silos.
  • Lakehouses combine the scalability and flexibility of data lakes with the organized, high‑quality aspects of data warehouses, enabling both critical operational workloads and advanced analytics or machine‑learning use cases.
  • While a data mesh focuses more on organizational and domain‑centric changes, many of its technical components overlap with a data fabric, making the fabric a practical focal point for a unified, enterprise‑wide data strategy.

Full Transcript

# Data Fabric: Unifying Enterprise Data

**Source:** [https://www.youtube.com/watch?v=0Zzn4eVbqfk](https://www.youtube.com/watch?v=0Zzn4eVbqfk)
**Duration:** 00:13:22

## Sections

- [00:00:00](https://www.youtube.com/watch?v=0Zzn4eVbqfk&t=0s) **Understanding Data Fabric Terminology** - The speaker outlines data fabric concepts and differentiates related tools like cloud warehouses, data lakes, and lakehouses.
- [00:03:11](https://www.youtube.com/watch?v=0Zzn4eVbqfk&t=191s) **Data Fabric: Virtualized Data Access** - The speaker explains that a data fabric employs a virtualization layer to grant unified, copy‑free access to diverse enterprise data sources—such as warehouses, lakes, relational databases, and many SaaS applications—while also offering robust ETL tools for cases where data must be moved or replicated for performance or pipeline needs.
- [00:06:18](https://www.youtube.com/watch?v=0Zzn4eVbqfk&t=378s) **Data Fabric: Lineage, Compliance, & Access** - The speaker outlines how a data fabric delivers rich data lineage, enforces global regulatory compliance, and exposes governed datasets through catalogs for analysts, scientists, and developers across diverse analytics platforms.
- [00:09:37](https://www.youtube.com/watch?v=0Zzn4eVbqfk&t=577s) **Data Fabric Enables Personalized Hospitality** - The speaker explains how integrating diverse data sources—historical warehouse records, social media sentiment, multi‑location reviews, and co‑branded credit‑card information—combined with master data management and governance policies creates a single, trusted customer view that powers tailored hotel experiences.
- [00:12:53](https://www.youtube.com/watch?v=0Zzn4eVbqfk&t=773s) **Data Fabric Powers Personalized Experiences** - The speaker explains how data fabric underpins diverse applications to deliver personalized, high-quality experiences and encourages viewers to explore IBM's data fabric solution.

## Full Transcript
0:00 Today I want to share with you an approach that you may have been hearing about recently called the data fabric.
0:05 So let's start with getting some terminology out of the way,
0:08 because I know there are tons and tons of terms out there and they can all kind of start to sound the same.
0:13 There are things like the data fabric, which is what we will talk about today.
0:20 Then there are also data meshes.
0:26 And I'm sure you've also heard of cloud enterprise data warehouses.
0:33 There are also data lakes.
0:38 And there are even data lakehouses.
0:45 So I could go on and on, but I think you get the point.
0:48 I think it will be helpful if we start categorizing these into methodologies and tools.
1:01 So let's move these into the right categories.
1:04 OK, so on the tool side, we have things like a cloud data warehouse or an enterprise data warehouse.
1:11 These have traditionally been large central repositories for clean and organized business operational data.
1:18 In the past, they were hosted locally in on-premises systems,
1:21 but more recently they have started moving into cloud-native managed offerings.
1:27 Then we have data lakes,
1:29 and these have emerged over the past decade or so as the number of unstructured data sources has exploded,
1:36 and they serve as a great place to dump all sorts of raw data into quickly
1:40 for cleansing and analysis later.
1:43 And then the last one I want to touch on is the data lakehouse.
1:47 So I'm sure you can tell it's a combination of the two.
1:50 And this one's really emerged over the past few years,
1:53 and it combines the flexibility in types of data
1:56 and the ability to scale of a data lake
1:59 with some of the more organized and high-quality data components of a data warehouse.
2:04 So it allows you to keep running your critical operational data workloads,
2:08 while at the same time starting to explore some of those new analytical and machine learning type use cases.
2:13 OK, so these are all great tools for analytics and operational reporting,
2:18 but they still mostly require you to copy and move data into their central repositories.
2:24 Now, a couple of things here.
2:26 This can create challenges with governance,
2:29 we can have data quality issues, and we can proliferate multiple data silos.
2:35 So this is where we need to start thinking about a broader data strategy.
2:40 OK, so finally, now let's turn our attention to the data fabric.
2:44 So what is the data fabric?
2:46 Simply speaking, the data fabric is an architectural approach and set of technologies
2:51 that allow you to break down data silos and get data into the hands of data users.
2:56 It enables accessing, ingesting, integrating, and sharing data across an enterprise
3:02 in a governed manner, regardless of location,
3:05 whether it's in your on-premises systems or in multiple public cloud environments.
3:11 There's also the concept of a data mesh, which focuses more on the organizational changes,
3:17 but a lot of the components of a data mesh are also in a data fabric, and it's what we'll focus on for today.
3:23 So let's move our data fabric to the top.
3:26 OK, so we'll focus on the data fabric.
3:28 So there are three responsibilities I want to touch on for the data fabric.
3:32 The first is accessing data.
3:40 Now, as an enterprise, you have data all over the place, right?
3:44 You may have data in the data warehouses like I mentioned earlier.
3:49 It may be in data lakes.
3:52 And you probably also have a large variety of different relational database systems.
3:59 But enterprises today also have, on average, about 150 different SaaS applications,
4:06 and these all have different unique databases, and a lot of them contain critical customer information.
4:14 You need to be able to collect information from all of these sources
4:18 without moving or copying a ton of data.
4:21 So a data fabric allows you to leverage a virtualization layer
4:27 to aggregate access to these data sources
4:31 and start using them without moving or copying them into yet another repository.
4:37 So we virtualize data where we can, but sometimes there's good reason to copy data.
4:42 Perhaps the application we're building has certain latency requirements
4:46 and requires more formal data pipelines.
4:50 So for that, a data fabric should have robust data integration tools, or ETL tools.
5:01 So then we can move the data from where it is and clean and load it into the central repository.
5:07 OK.
5:08 The second piece I want to touch on is managing the life cycle of our data.
5:22 Now, this is from two perspectives.
5:24 We've got governance and privacy, but then we also have compliance.
5:39 OK, so for governance, we need to make sure that the right folks in our organization have access to the right data and nothing more.
5:50 And a data fabric uses active metadata to automate a lot of the enforcement of the policies that we define.
6:01 So what's in these policies?
6:03 We have the ability to mask certain aspects of data sets,
6:08 some details we may want to redact,
6:12 and we want to define these based on a role-based access control method.
6:19 Right.
6:20 OK, additionally, a data fabric should provide us with rich lineage information.
6:28 This tells us where that data came from and what transformations were done on it, and we can start assessing that data for quality.
6:38 OK, so on the compliance side, you know, it's no secret that there are all sorts of data regulations around the world that are popping up.
6:46 There are things like GDPR, there is CCPA.
6:52 And depending on your industry, there are things like HIPAA if you are in health care,
7:00 there's FCRA if you're in financial services.
7:04 It's getting ever more critical to make sure that we are compliant with these,
7:08 and a data fabric helps us define these compliance policies.
7:13 OK.
7:15 So the last piece I want to touch on for data fabric is exposing data.
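As a rough illustration of the virtualization layer described above, here is a minimal sketch in Python. All class and function names are invented for this example; the point is only the pattern: sources register a fetch function, and queries are federated at read time instead of rows being copied into yet another repository.

```python
# Hypothetical sketch of a data-fabric virtualization layer: queries are
# federated across registered sources on demand, with no data copied.

class VirtualizedFabric:
    def __init__(self):
        self._sources = {}  # source name -> fetch callable

    def register_source(self, name, fetch_fn):
        """Register a source; fetch_fn returns its rows when called."""
        self._sources[name] = fetch_fn

    def query(self, predicate):
        """Pull matching rows from every source at query time."""
        results = []
        for name, fetch in self._sources.items():
            for row in fetch():
                if predicate(row):
                    results.append({**row, "_source": name})
        return results

# Two stand-ins for a data warehouse and a SaaS application database.
warehouse = [{"customer": "Ada", "stays": 12}]
crm = [{"customer": "Ada", "email": "ada@example.com"}]

fabric = VirtualizedFabric()
fabric.register_source("warehouse", lambda: warehouse)
fabric.register_source("crm", lambda: crm)

rows = fabric.query(lambda r: r["customer"] == "Ada")
```

A real fabric would push predicates down to each source's query engine; this sketch fetches everything and filters locally, which is only acceptable for illustration. The ETL path the speaker mentions would replace the lambdas with scheduled pipelines that materialize copies when latency demands it.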
7:26 So we want to expose data to our users after it's been connected to,
7:31 after all our governance policies have been defined and applied to the data sets,
7:35 through an enterprise search catalog,
7:44 or multiple catalogs, depending on how we define our business functions.
7:50 So the data should flow through to these catalogs and be made available to our users.
7:55 In this case, it could be business analysts,
7:59 we could have data scientists, or we could have application developers.
8:07 And folks like these want to start using different types of tools to build their analysis,
8:12 so they might use different business intelligence or predictive analytics and machine learning platforms.
8:25 So a data fabric should support multiple vendors for these platforms, but it should also support open-source technologies.
8:35 So under this, we can have things like Python, or Spark, and many, many more.
8:46 So our data fabric should support multiple vendors, it should support open-source technologies, and it should support our app developers in building custom applications by exposing data from the catalog through different API endpoints.
9:01 OK.
9:02 The last piece I want to touch on for exposing data is trustworthy AI.
9:13 And at a high level, this involves using robust
9:18 MLOps tools to operationalize our machine learning projects,
9:23 as well as tools to help monitor bias, fairness, and explainability in our results.
9:38 OK, so now I want to touch on an example of where a data fabric can be crucial.
9:44 So it's clear to us that customers today are demanding ever more high-quality and personalized experiences.
9:50 And this is no different in the hospitality industry.
9:53 So we all want to walk into a hotel and, you know, we don't want to provide a bunch of redundant information.
10:00 We want them to know we're coming.
10:02 We want them to know what our room preferences are,
10:04 and we want to receive offers that are relevant to us.
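The catalog "shopping" flow the speaker describes for exposing governed data can be sketched as follows. This is a toy model with invented names, not any particular catalog product's API: datasets are published with metadata tags, and analysts or developers search by tag and then load what they find.

```python
# Hypothetical sketch of a governed data catalog: publish datasets with
# metadata, let consumers search by tag, and load data only on request.

class Catalog:
    def __init__(self):
        self._entries = {}

    def publish(self, name, tags, loader):
        """Register a governed dataset with searchable tags."""
        self._entries[name] = {"tags": set(tags), "loader": loader}

    def search(self, tag):
        """Return the names of datasets carrying the given tag."""
        return sorted(n for n, e in self._entries.items() if tag in e["tags"])

    def load(self, name):
        """Fetch the dataset's rows (already governed/masked upstream)."""
        return self._entries[name]["loader"]()

catalog = Catalog()
catalog.publish("guest_profiles", ["customer", "golden-record"],
                lambda: [{"customer": "Ada", "room_pref": "high floor"}])
catalog.publish("stay_history", ["customer", "operational"],
                lambda: [{"customer": "Ada", "nights": 42}])

hits = catalog.search("customer")
```

In practice the `loader` would sit behind the API endpoints the speaker mentions, so the same catalog entry can serve BI tools, notebooks, and custom applications alike.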
10:08 So let's talk about what actually goes into making an experience like this possible.
10:16 So let's start with our data sources.
10:24 So we might have historical customer data in those enterprise data warehouses we mentioned,
10:30 but we also might want to pull data from unstructured sources,
10:36 like sentiment analysis from social media.
10:41 We may want to ingest customer review data across the multiple different locations that we have.
10:49 We may even want to pull credit card data from a co-branded credit card that we have
10:56 to understand what the customer's purchasing habits are and what their buying history has been.
11:03 OK.
11:04 Once we've got that data, we need to make sure we have robust master data management tools.
11:12 This helps us figure out that the customer record we're looking at contains all the accurate information
11:18 and is the golden record for that customer.
11:23 So once we have that, then we can apply our governance policies to it.
11:30 Like we mentioned earlier, we may want to mask certain data.
11:35 So, for example, we may want to mask credit card numbers.
11:42 For most of our analysts, there's probably no need for them to see the actual credit card numbers,
11:47 so we need to make sure that sensitive personal information like that has been redacted.
11:54 OK, so once we've governed our data sets,
11:57 we can then publish them into our catalogs.
12:05 So once we publish them into our catalogs,
12:07 our developers or business analysts can come in and shop for the data that they're looking for through these catalogs
12:14 and start building custom applications.
12:22 For example, they could build a recommendation engine
12:28 that recommends specific offers to customers, depending on their purchasing habits and buying behaviors.
12:36 We could build an application for guest services,
12:44 an application that would allow the guest services team at the hotel to greet customers in a more personalized way
12:51 and provide a more relevant experience.
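Two of the steps just described, merging sources into a golden record and then masking credit card numbers before analysts see them, can be sketched as below. The trust rankings, field names, and functions are all invented for illustration; real master data management uses probabilistic matching and survivorship rules far beyond a per-field "most trusted source wins."

```python
# Hypothetical sketch: build a customer "golden record" from several
# sources, then apply a governance policy that masks card numbers.

# Invented trust ranking: higher wins when sources disagree on a field.
SOURCE_TRUST = {"warehouse": 3, "credit_card": 2, "social": 1}

def golden_record(records):
    """records: list of (source, dict); most trusted source wins per field."""
    merged, trust_per_field = {}, {}
    for source, rec in records:
        trust = SOURCE_TRUST.get(source, 0)
        for field, value in rec.items():
            if trust > trust_per_field.get(field, -1):
                merged[field] = value
                trust_per_field[field] = trust
    return merged

def mask_for_analyst(record):
    """Governance policy: analysts see only the last four card digits."""
    out = dict(record)
    if "card_number" in out:
        out["card_number"] = "****" + out["card_number"][-4:]
    return out

records = [
    ("social", {"name": "A. Lovelace", "sentiment": "positive"}),
    ("warehouse", {"name": "Ada Lovelace", "room_pref": "high floor"}),
    ("credit_card", {"name": "ADA LOVELACE", "card_number": "4111111111111111"}),
]

profile = mask_for_analyst(golden_record(records))
```

The masked `profile` is what would be published into the catalogs for the recommendation engine and guest-services applications to consume.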
12:53 So these are just two examples but, you know, you can have many different applications in this case.
12:58 OK.
13:00 So I hope you see how a data fabric is important to each of these areas
13:05 when we're trying to build personalized, high-quality experiences like this.
13:09 And I hope you can see how a data fabric can help you achieve some of these end results.
13:15 If you'd like to learn more about IBM's data fabric solution,
13:19 please feel free to check out some of the links below.
13:21 Thank you.