Accelerating Data Quality with IBM DataOps
Key Points
- Companies seeking faster, data‑driven decisions must rely on high‑quality, well‑governed data to act accurately and responsibly.
- DataOps is the coordinated orchestration of people, processes, and technology that delivers trusted, high‑quality data quickly, through continuous discovery, transformation, governance, integration, curation, and cataloging.
- IBM Cloud Pak for Data provides an extensible hybrid data platform that serves as a complete Data Ops toolchain, supporting native IBM and third‑party data sources.
- In the demo, a data engineer creates a DB2 connection, runs an automated discovery scan, and the platform extracts metadata, evaluates quality metrics (e.g., data classes, formats, frequency), and publishes the assets to an enterprise catalog for self‑service consumption.
- Governance is enforced through data classes and data‑protection rules, enabling organizations to define business‑specific attribute categories and apply protection policies to ensure compliant, high‑quality data assets.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=EfqaM2LwTAo](https://www.youtube.com/watch?v=EfqaM2LwTAo) **Duration:** 00:07:06

Sections:
- [00:00:00](https://www.youtube.com/watch?v=EfqaM2LwTAo&t=0s) **Delivering Trusted Data with DataOps** - The speaker explains IBM's DataOps methodology, using IBM Cloud Pak for Data to orchestrate people, processes, and technology for fast, governed, high‑quality data that supports data‑driven decision‑making.
Businesses everywhere are looking for ways to improve their operational efficiency by enabling data-driven decision-making. However, achieving this accurately and responsibly requires high-quality, relevant, and governed data. Hi, my name is Luv Aggarwal, and I'm a solution engineer for IBM Data and AI. Today I'm here to talk to you about how your organization can deliver quality data using our DataOps methodology.
So let's start with: what is DataOps? DataOps is the orchestration of people, process, and technology to deliver trusted, high-quality data to data citizens, fast. The practice is focused on enabling collaboration across an organization to drive agility, speed, and new data initiatives at scale. From an operational perspective, DataOps integrates a continuous process of data discovery, transformation, governance, integration, curation, and cataloging for self-service.
Our unified and extensible hybrid data platform, IBM Cloud Pak for Data, supports all the services I mentioned and more as part of a DataOps toolchain. Now let's take a look at a quick demo to see how both a data engineer and a data consumer in an organization would use IBM Cloud Pak for Data's DataOps capabilities.
Okay, so first I'm going to log into Cloud Pak for Data as a data engineer and start by adding some data sources. I'll go over to the Platform connections tab and click on New connection. Now we can see the wide variety of IBM sources, as well as third-party sources, that we can connect to. I want to connect to a Db2 instance that I have, so I'll input all my credentials and click on Create connection.
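Under the hood, a Db2 connection like the one created in the demo comes down to a handful of connection properties. Here is a minimal sketch in Python using hypothetical placeholder values (the demo does not show real credentials); the resulting string follows the keyword=value format the Db2 driver expects:

```python
# Build a Db2 connection string from named properties.
# All values below are hypothetical placeholders, not the demo's credentials.
def db2_connection_string(host, port, database, user, password):
    """Assemble the semicolon-separated property string for a Db2 connection."""
    props = {
        "DATABASE": database,
        "HOSTNAME": host,
        "PORT": port,
        "PROTOCOL": "TCPIP",
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{k}={v}" for k, v in props.items())

conn_str = db2_connection_string("db2.example.com", 50000, "BLUDB", "engineer", "secret")
print(conn_str)
```

With the `ibm_db` Python driver installed, a string like this could be passed to `ibm_db.connect(conn_str, "", "")`; in the demo, Cloud Pak for Data collects the same properties through its connection form instead.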
Okay, now that we have our connection defined, let's see how we can discover data quality information from it. I'll jump over to the Data discovery tab. We have the option between Quick scan, which analyzes a sample of each table or file to quickly provide analysis results, and Automated discovery, which provides detailed analysis results of all assets from the data source and is typically suitable for a subset of data. For now, I'll run an automated discovery. I'll select the data connection that I set up and the parameters that I want, including publishing the data assets to our enterprise catalog so that consumers can find this data, and I'll click on Discover.
Okay, while this is running, the platform is pulling metadata from our source and assessing it for quality using metrics including data classes, formats, frequency distribution, and more. Now I want to show you what that analysis looks like. I'll open up the results, and we can see a dashboard that summarizes our data quality and allows us to dive deeper into the specific assets and relationships.
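The kind of assessment described here, matching column values against known formats and computing frequency distributions, can be illustrated with a toy column profiler. This is a simplified sketch of the general technique with two made-up data classes, not Cloud Pak for Data's actual analysis engine:

```python
import re
from collections import Counter

# Hypothetical data classes, each defined by a format (regular expression),
# analogous to the classes the platform matches values against.
DATA_CLASSES = {
    "email_address": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "customer_number": re.compile(r"^CUST-\d{6}$"),
}

def profile_column(values):
    """Return the best-matching data class and a value frequency distribution."""
    freq = Counter(values)
    class_hits = {
        name: sum(1 for v in values if pattern.match(v))
        for name, pattern in DATA_CLASSES.items()
    }
    # Pick the class that matches the largest share of values.
    best = max(class_hits, key=class_hits.get)
    return {
        "data_class": best,
        "confidence": class_hits[best] / len(values),
        "frequency": freq,
    }

result = profile_column(["a@x.com", "b@y.org", "a@x.com", "not-an-email"])
print(result["data_class"], result["confidence"])
```

Three of the four values match the email format, so the column is classified as `email_address` with 75% confidence, mirroring how discovery assigns a class plus a quality signal rather than an all-or-nothing label.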
All right, now let's move on to how we can govern our data assets. There are two points I want to touch on here: data classes and data protection rules. Data classes allow us to specifically define different types of data attributes based on our business language, so we could have something like customer number or email address, which we can see is defined here by a specific format. Now that we have our classes defined, I'll switch over to the data protection rules. These allow us to define custom rules for how we want to manage sensitive data. We can see I already have some rules in here, such as email masking, which completely redacts any data field that matches the data class Email. I could also substitute it with other values, or I could just obfuscate it.
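The three masking options mentioned, full redaction, substitution, and obfuscation, can be sketched as simple value transformations keyed by data class. A toy illustration with a hypothetical rule table; real Cloud Pak for Data rules are configured in the UI and enforced by the platform, not written as code:

```python
import hashlib

def redact(value):
    """Completely hide the value, keeping only its length."""
    return "X" * len(value)

def substitute(value):
    """Replace the value with a fictitious but plausible surrogate."""
    return "user@example.com"

def obfuscate(value):
    """Replace the value with a repeatable hash, preserving joinability."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

# Hypothetical rule: redact any field classified as an email address.
RULES = {"email_address": redact}

def apply_rules(record, classes):
    """Apply the masking rule for each column's data class, if one exists."""
    return {
        col: RULES.get(classes.get(col), lambda v: v)(val)
        for col, val in record.items()
    }

masked = apply_rules(
    {"name": "Ada", "email": "ada@example.com"},
    classes={"email": "email_address"},
)
print(masked)  # the email column is redacted; name passes through
```

Because rules key off the data class rather than a column name, the same policy applies to every asset where that class is detected, which is exactly the behavior the data consumer runs into later in the demo.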
All right, now I'm going to switch hats and log into the platform as a data consumer. As a data consumer, I'm looking for specific data that I need to run some sort of analysis. I go to my enterprise data catalog, and now I can shop for exactly what I need by searching a variety of business terms, data classes, or other attributes. I'm able to search across the entire data landscape; it doesn't matter whether those data sources are hosted on different clouds or on premises. Everything is in the catalog.
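Catalog "shopping" of this kind amounts to searching asset metadata, business terms, data classes, and tags, independent of where the underlying source actually lives. A minimal sketch with a hypothetical in-memory catalog; the asset names, terms, and locations are invented for illustration:

```python
# Hypothetical catalog entries; 'location' shows the source can live anywhere.
CATALOG = [
    {
        "name": "customers",
        "terms": {"customer", "crm"},
        "classes": {"email_address"},
        "location": "on-prem Db2",
    },
    {
        "name": "web_events",
        "terms": {"clickstream"},
        "classes": set(),
        "location": "cloud object storage",
    },
]

def search_catalog(query):
    """Return names of assets whose business terms or data classes match."""
    q = query.lower()
    return [
        asset["name"]
        for asset in CATALOG
        if q in asset["terms"] or q in asset["classes"]
    ]

print(search_catalog("email_address"))
```

Searching on the data class `email_address` finds the on-prem `customers` asset just as easily as a term search finds the cloud-hosted one, which is the point the demo makes about a single catalog spanning clouds and on-prem sources.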
Okay, so now I've found the data set that I'm looking for, and I can preview it to make sure this is what I need. If you've noticed, I can't actually see all of the data; there are particular columns in here that are masked for me. This is because of the data protection rules that we set up earlier, which apply across all data assets in the catalog. This allows our organization to maintain data governance and compliance and protect sensitive data, while at the same time exposing the data that our analysts and data scientists need. I can also take a quick look at the data set's profile, which allows me to further analyze it for quality: make sure the values match a particular class, confirm there are no missing or mismatched values, and review statistics about the asset.
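The profile checks mentioned, values matching the expected class with no missing or mismatched entries, are essentially completeness and validity ratios over a column. A toy sketch, under the assumption that the expected data class can be represented as a predicate; the postal-code column is a made-up example:

```python
def profile_quality(values, is_valid):
    """Compute completeness (non-missing share) and validity (class-match share)."""
    total = len(values)
    present = [v for v in values if v not in (None, "")]
    matching = [v for v in present if is_valid(v)]
    return {
        "completeness": len(present) / total,
        "validity": len(matching) / total,
        "mismatches": len(present) - len(matching),
    }

# Hypothetical column expected to hold five-digit postal codes.
stats = profile_quality(
    ["10001", "94105", "", "abcde"],
    is_valid=lambda v: v.isdigit() and len(v) == 5,
)
print(stats)
```

One empty value and one non-numeric value yield 75% completeness and 50% validity, the kind of per-column statistics a consumer would scan in the profile before trusting the asset.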
Okay, this looks good to me, so I'm going to add it to my data science project and use it to start building insights. Once I'm in my project, if I decide I need to make some changes to the data set, I don't actually have to go back to my data engineer to request those changes. I can use another self-service tool to make those changes for my project. I'll open up the data set here and click into Refine. This allows me to make minor transformations to my data set in a point-and-click manner and save them back to my project.
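Point-and-click refinement steps like these correspond to an ordered list of transformations applied to a copy of the data, leaving the cataloged source untouched. A minimal sketch of that idea, not the actual Refine tool; the column names and steps are hypothetical:

```python
# Each refinement step is an ordered transformation over a list of row dicts.
def rename_column(rows, old, new):
    """Return new rows with column 'old' renamed to 'new'."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]

def filter_rows(rows, predicate):
    """Return only the rows the predicate keeps."""
    return [row for row in rows if predicate(row)]

def apply_steps(rows, steps):
    """Apply refinement steps in order and return the refined data set."""
    for step in steps:
        rows = step(rows)
    return rows

data = [{"cust_id": 1, "amt": 250}, {"cust_id": 2, "amt": 40}]
refined = apply_steps(
    data,
    [
        lambda r: rename_column(r, "amt", "amount"),
        lambda r: filter_rows(r, lambda row: row["amount"] > 100),
    ],
)
print(refined)
```

Each step produces a new list rather than mutating the input, which is what lets a consumer reshape the data for one project without affecting the shared asset.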
Okay, so to recap what we did: as a data engineer, I was able to leverage the platform to connect to various data sources, assess them for quality, define specific data protection rules, and then publish the assets to an enterprise knowledge catalog to make them available to all our data consumers. As a data consumer, I was able to use the same platform to search for the data that I need and quickly get access to it for my analytics project, and I was able to make changes to my data set without the lengthy process of having to go back to a data engineer to request those changes. On one single platform, we were able to fulfill several different activities of the DataOps lifecycle and take a process that traditionally can take days into one that can be completed in hours. While we only looked at a few of the DataOps capabilities today, check out the rest by requesting a no-cost trial of IBM Cloud Pak for Data at the link below.
Thank you for watching. If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please like and subscribe. And don't forget: if you want to learn more about IBM Cloud Pak for Data, please check out the links below.