Data Fabric: Unifying Enterprise Data
Key Points
- The data fabric is an architectural approach that breaks down silos and lets users access, ingest, integrate, and share data across on‑premises and multiple cloud environments in a governed way, minimizing the need for heavy data movement.
- Traditional tools — cloud/enterprise data warehouses, data lakes, and the newer lakehouses — act as central repositories, but they often require copying data, which can cause governance challenges, quality issues, and proliferating data silos.
- Lakehouses combine the scalability and flexibility of data lakes with the organized, high‑quality aspects of data warehouses, enabling both critical operational workloads and advanced analytics or machine‑learning use cases.
- While a data mesh focuses more on organizational and domain‑centric changes, many of its technical components overlap with a data fabric, making the fabric a practical focal point for a unified, enterprise‑wide data strategy.
Sections
- Understanding Data Fabric Terminology - The speaker outlines data fabric concepts and differentiates related tools like cloud warehouses, data lakes, and lakehouses.
- Data Fabric: Virtualized Data Access - The speaker explains that a data fabric employs a virtualization layer to grant unified, copy‑free access to diverse enterprise data sources—such as warehouses, lakes, relational databases, and many SaaS applications—while also offering robust ETL tools for cases where data must be moved or replicated for performance or pipeline needs.
- Data Fabric: Lineage, Compliance, & Access - The speaker outlines how a data fabric delivers rich data lineage, enforces global regulatory compliance, and exposes governed datasets through catalogs for analysts, scientists, and developers across diverse analytics platforms.
- Data Fabric Enables Personalized Hospitality - The speaker explains how integrating diverse data sources—historical warehouse records, social media sentiment, multi‑location reviews, and co‑branded credit‑card information—combined with master data management and governance policies creates a single, trusted customer view that powers tailored hotel experiences.
- Data Fabric Powers Personalized Experiences - The speaker explains how data fabric underpins diverse applications to deliver personalized, high-quality experiences and encourages viewers to explore IBM's data fabric solution.
Full Transcript
Source: https://www.youtube.com/watch?v=0Zzn4eVbqfk (duration 00:13:22)
Timestamps: 00:00:00 Understanding Data Fabric Terminology; 00:03:11 Data Fabric: Virtualized Data Access; 00:06:18 Data Fabric: Lineage, Compliance, & Access; 00:09:37 Data Fabric Enables Personalized Hospitality; 00:12:53 Data Fabric Powers Personalized Experiences
Today I want to share with you an approach that you may have been hearing about recently called the data fabric.
So let's start with getting some terminology out of the way,
because I know there is tons and tons of terms out there and they can all kind of start to sound the same.
There are things like the data fabric, which is what we will talk about today.
Then there's also data meshes.
And I'm sure you've also heard of cloud and enterprise data warehouses.
There are also data lakes.
And there are even data lakehouses.
So I could go on and on, but I think you get the point.
I think it will be helpful if we start categorizing these into methodologies and tools.
So let's move these into the right categories.
OK, so on the tool side, we have things like a cloud data warehouse or an enterprise data warehouse.
These have traditionally been large central repositories for clean and organized business operational data.
In the past, they were hosted locally on on-premises systems,
but more recently they have started moving into cloud-native managed offerings.
Then we have data lakes,
and these have emerged over the past decade or so as the number of unstructured data sources has exploded,
and they serve as a great place to dump all sorts of raw data into quickly
for cleansing and analysis later.
And then the last one I want to touch on is the data lakehouse.
So I'm sure you can tell it's a combination of the two.
And this one's really emerged over the past few years,
and it combines the flexibility in types of data
and the ability to scale of a data lake
with some of the more organized and high quality data components of a data warehouse.
So it allows you to keep running your critical operational data workloads,
while at the same time starting to explore some of those new analytical and machine learning type use cases.
OK, so these are all great tools for analytics and operational reporting,
but they still mostly require you to copy and move data into their central repositories.
Now a couple of things here.
This can create challenges with governance,
we can have data quality issues and we can proliferate multiple data silos.
So this is where we need to start thinking about a broader data strategy.
OK, so finally, now let's turn our attention to the data fabric.
So what is the data fabric?
Simply speaking, the data fabric is an architectural approach and set of technologies
that allow you to break down data silos and get data into the hands of data users.
It enables accessing, ingesting, integrating and sharing data across an enterprise
in a governed manner, regardless of location,
whether it's in your on-premises systems or in multiple public cloud environments.
There's also the concept of a data mesh, which focuses more on the organizational changes,
but a lot of the components of a data mesh are also found in a data fabric, and the data fabric is what we'll focus on today.
So let's move our data fabric to the top.
OK, so we'll focus on the data fabric.
So 3 responsibilities I want to touch on for data fabric.
The first is accessing data.
Now, as an enterprise, you have data all over the place, right?
You may have data in the data warehouses like I mentioned earlier.
It may be in data lakes.
And you probably also have a large variety of different relational database systems.
But enterprises today also have, on average, about 150 different SaaS applications,
and these all have their own unique databases, and a lot of them contain critical customer information.
You need to be able to collect information from all of these sources
without moving or copying a ton of data.
So a data fabric allows you to leverage a virtualization layer
to aggregate access to these data sources
and start using them without moving or copying the data into yet another repository.
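The routing idea behind virtualization can be sketched in a few lines of Python. This is a toy illustration, not any product's API: the source names and query functions below are made up, and a real virtualization layer would push queries down to the live systems rather than fetch all rows.

```python
# Minimal sketch of a data virtualization layer: queries are routed to the
# system that owns the data; nothing is copied into a new repository.

class VirtualizationLayer:
    def __init__(self):
        self.sources = {}  # logical name -> callable that queries the live source

    def register(self, name, query_fn):
        """Register a data source under a logical name."""
        self.sources[name] = query_fn

    def query(self, name, **filters):
        """Fetch matching rows from the owning source on demand."""
        rows = self.sources[name]()
        return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]

# Two illustrative "live" sources: a warehouse table and a SaaS application.
warehouse = lambda: [{"id": 1, "name": "Ada", "tier": "gold"}]
crm_app = lambda: [{"id": 1, "email": "ada@example.com"}]

fabric = VirtualizationLayer()
fabric.register("warehouse.customers", warehouse)
fabric.register("crm.contacts", crm_app)

print(fabric.query("warehouse.customers", tier="gold"))
```

The point is that consumers address data by logical name while it stays in place in the owning system.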
So we virtualized data where we can, but sometimes there's good reason to copy data.
Perhaps the application we're building has certain latency requirements
and requires more formal data pipelines.
So for that, data fabric should have robust data integration tools or ETL tools.
So then we can move the data from where it is and clean and load it into the central repository.
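That move-clean-load flow can be sketched as a minimal ETL pipeline. The raw rows and cleansing rules here are invented for illustration, and an in-memory list stands in for the central repository.

```python
# Minimal extract-transform-load sketch for the cases where data must move:
# pull raw rows, cleanse them, and load them into a central repository.

raw_rows = [
    {"name": " Ada ", "spend": "120.50"},
    {"name": "GRACE", "spend": "80"},
]

def transform(row):
    # Cleansing: trim and normalize names, cast spend to a number.
    return {"name": row["name"].strip().title(), "spend": float(row["spend"])}

warehouse_table = []           # stand-in for the central repository
for row in raw_rows:           # extract -> transform -> load
    warehouse_table.append(transform(row))

print(warehouse_table)
```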
OK.
The second piece I want to touch on is managing the life cycle of our data.
Now, this is from two perspectives.
We've got governance and privacy, but then we also have compliance.
OK, so for governance, we need to make sure that the right folks in our organization have access to the right data and nothing more.
And a data fabric uses active metadata to automate a lot of the enforcement of the policies that we define.
So what's in these policies?
We have the ability to mask certain aspects of data sets,
some details we may want to redact
and we want to define these based on a role-based access control method.
Right.
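A toy sketch of that role-based masking idea in Python: a policy table maps each role to the fields it may see in the clear, and everything else is redacted before the data is served. The roles, fields, and policy structure are illustrative assumptions, not any product's policy format.

```python
# Sketch of role-based masking: the policy lists the fields each role may
# see; all other fields are redacted before rows reach the consumer.

POLICY = {
    "analyst": {"name", "region"},                 # no access to sensitive fields
    "fraud_officer": {"name", "region", "ssn"},    # may see sensitive fields
}

def apply_policy(row, role):
    allowed = POLICY.get(role, set())
    return {k: (v if k in allowed else "***REDACTED***") for k, v in row.items()}

record = {"name": "Ada", "region": "EU", "ssn": "123-45-6789"}
print(apply_policy(record, "analyst"))
```

In a real data fabric this enforcement is driven by active metadata rather than a hand-written table, but the effect is the same: the same dataset looks different depending on who is asking.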
OK, additionally, a data fabric should provide us with rich lineage information.
This tells us where that data came from, what transformations were done on it, and we can start assessing that data for quality.
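One way to picture lineage is a dataset that carries a record of every transformation applied to it. This is a simplified sketch under my own assumptions, not a particular lineage tool's API:

```python
# Sketch of lineage tracking: each transformation returns a new dataset
# whose lineage records where the data came from and what was done to it.

class Dataset:
    def __init__(self, rows, lineage):
        self.rows = rows
        self.lineage = lineage

    @classmethod
    def load(cls, rows, source):
        return cls(rows, [f"loaded from {source}"])

    def transform(self, description, fn):
        # Apply fn to every row and extend the lineage trail.
        return Dataset([fn(r) for r in self.rows], self.lineage + [description])

ds = Dataset.load([{"spend": "100"}], source="warehouse.orders")
clean = ds.transform("cast spend to float", lambda r: {"spend": float(r["spend"])})
print(clean.lineage)
```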
OK, so on the compliance side, you know, it's no secret that there's all sorts of data regulations around the world that are popping up.
There are things like GDPR, there is CCPA.
And depending on your industry, there are things like HIPAA if you are in health care,
or FCRA if you are in financial services.
It's getting ever more critical to make sure that we are compliant with these,
and a data fabric helps us define these compliance policies.
OK.
So the last piece I want to touch on for data fabric is exposing data.
So we want to expose data to our users after it's been connected to,
and after all our governance policies have been defined and applied to the data sets,
through an enterprise search catalog,
or multiple catalogs, depending on how we define our business functions.
So the data should flow through to these catalogs and be made available to our users.
In this case, it could be business analysts,
we could have data scientists, or we could have application developers.
And folks like these want to start using different types of tools to build their analysis
so they might use different business intelligence or predictive analytics and machine learning platforms.
So a data fabric should support multiple vendors for these platforms, but it should also support open source technologies.
So under this, we can have things like Python, or Spark, and many, many more.
So our data fabric should support multiple vendors, it should support open source technologies, and it should support our app developers to build custom applications through exposing data from the catalog through different API endpoints.
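The publish-search-serve flow described here can be sketched very roughly in Python. The catalog structure, dataset names, and endpoint function below are all hypothetical stand-ins; a real fabric would back this with a metadata store and real API endpoints.

```python
# Sketch of a governed catalog: datasets are published with metadata,
# users search the catalog, and an endpoint-style function serves rows.

catalog = {}  # dataset name -> {"rows": ..., "owner": ..., "tags": ...}

def publish(name, rows, owner, tags):
    """Publish a governed dataset into the catalog."""
    catalog[name] = {"rows": rows, "owner": owner, "tags": set(tags)}

def search(tag):
    """What an analyst 'shopping' the catalog might call."""
    return sorted(name for name, entry in catalog.items() if tag in entry["tags"])

def get_dataset(name):
    """Stand-in for an API endpoint an app developer would hit."""
    entry = catalog.get(name)
    return entry["rows"] if entry else None

publish("customers.golden", [{"id": 1, "name": "Ada"}], owner="cdo",
        tags=["customer", "governed"])
print(search("customer"))
```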
OK.
The last piece I want to touch on for exposing data is trustworthy AI.
And at a high level, this involves using robust MLOps tools to operationalize our machine learning projects,
as well as tools to help monitor bias, fairness, and explainability in our results.
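As one tiny illustration of what bias monitoring can look like, here is a demographic-parity check that compares approval rates across two hypothetical groups. Real trustworthy-AI tooling goes far beyond this single metric, and the data here is invented:

```python
# Simple fairness check (demographic parity): compare the rate of positive
# model decisions across groups and report the gap.

def approval_rate(decisions, group):
    rows = [d for d in decisions if d["group"] == group]
    return sum(d["approved"] for d in rows) / len(rows)

decisions = [
    {"group": "A", "approved": 1}, {"group": "A", "approved": 1},
    {"group": "B", "approved": 1}, {"group": "B", "approved": 0},
]
gap = approval_rate(decisions, "A") - approval_rate(decisions, "B")
print(f"parity gap: {gap:.2f}")
```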
OK, so now I want to touch on an example of where a data fabric can be crucial.
So it's clear to us that customers today are demanding ever more high-quality and personalized experiences.
And this is no different in the hospitality industry.
So we all want to walk into a hotel and, you know, we don't want to provide a bunch of redundant information.
We want them to know we're coming.
We want them to know what our room preferences are,
and we want to receive offers that are relevant to us.
So let's talk about what actually goes into making an experience like this possible.
So let's start with our data sources.
So we might have historical customer data in those enterprise data warehouses we mentioned,
but we also might want to pull data from unstructured sources
like sentiment analysis from social media.
We may want to ingest customer review data across multiple different locations that we have.
We may even want to pull credit card data from a co-branded credit card that we have
to understand what the customer's purchasing habits and buying history have been.
OK.
Once we've got that data, we need to make sure we have robust master data management tools.
This helps us make sure that the customer record we're looking at contains all the accurate information
and is the golden record for that customer.
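A toy sketch of golden-record merging: duplicate customer records from different systems are collapsed into one. The survivorship rule used here, "latest non-empty value wins", is just one common assumption, and the records are made up.

```python
# Sketch of master data management: merge duplicate customer records into
# a single golden record, letting later non-empty values win per field.

def golden_record(records):
    merged = {}
    for rec in sorted(records, key=lambda r: r["updated"]):  # oldest first
        for field, value in rec.items():
            if field != "updated" and value:  # later non-empty values win
                merged[field] = value
    return merged

records = [
    {"name": "A. Lovelace", "email": "", "updated": "2023-01-01"},
    {"name": "Ada Lovelace", "email": "ada@example.com", "updated": "2024-06-01"},
]
print(golden_record(records))
```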
So once we have that, then we can apply our governance policies to it,
like we mentioned earlier, we may want to mask certain data.
So for an example, we may want to mask credit card numbers.
So for most of our analysts, there's probably no need for them to see the actual credit card numbers.
So we need to make sure that sensitive personal information like that has been redacted.
OK, so once we've governed our data sets,
we can then publish them into our catalogs.
So once we publish them into our catalogs,
our developers or business analysts can come in and shop for the data that they're looking for through these catalogs
and start building custom applications.
For example, they could build a recommendation engine
that recommends specific offers to customers depending on their purchasing habits and buying behaviors.
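A toy sketch of such a recommender, scoring each offer by how much it overlaps with a customer's observed habits. All names and tags below are invented; a production engine would use far richer signals and models.

```python
# Toy offer recommender: score each offer by the number of tags it shares
# with the customer's purchasing habits, then return the top matches.

def recommend(habits, offers, top_n=1):
    scored = [(len(set(habits) & set(tags)), name) for name, tags in offers.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_n] if score > 0]

habits = ["spa", "fine_dining"]
offers = {
    "spa_package": ["spa", "wellness"],
    "golf_weekend": ["golf"],
    "chefs_table": ["fine_dining", "wine"],
}
print(recommend(habits, offers, top_n=2))
```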
We could also build an application for guest services,
one that would allow the guest services team at the hotel to greet customers in a more personalized way
and provide a more relevant experience.
So these are just two examples but, you know, you can have many different applications in this case.
OK.
So I hope you see how a data fabric is important to each of these areas
when we're trying to build personalized, high quality experiences like this.
And I hope you can see how data fabric can help you achieve some of these end results.
If you'd like to learn more about IBM's data fabric solution,
please feel free to check out some of the links below.
Thank you.