
Accelerating Data Quality with IBM DataOps

Key Points

  • Companies seeking faster, data-driven decisions must rely on high-quality, well-governed data to make those decisions accurately and responsibly.
  • DataOps is the coordinated orchestration of people, processes, and technology that delivers trusted, high-quality data quickly, through a continuous process of discovery, transformation, governance, integration, curation, and cataloging.
  • IBM Cloud Pak for Data provides an extensible hybrid data platform that serves as a complete DataOps toolchain, supporting native IBM and third-party data sources.
  • In the demo, a data engineer creates a Db2 connection and runs an automated discovery scan; the platform extracts metadata, evaluates quality metrics (e.g., data classes, formats, frequency distribution), and publishes the assets to an enterprise catalog for self-service consumption.
  • Governance is enforced through data classes and data-protection rules, enabling organizations to define business-specific attribute categories and apply protection policies to ensure compliant, high-quality data assets.
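The format-based data classes described above can be sketched in a few lines of Python. The class names and regular expressions below are illustrative assumptions, not Cloud Pak for Data's built-in definitions:

```python
import re

# Hypothetical data classes defined by format, echoing the demo's
# "customer number" and "email address" classes. The regexes are
# illustrative assumptions, not the platform's actual definitions.
DATA_CLASSES = {
    "email_address": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "customer_number": re.compile(r"^CUST-\d{6}$"),
}

def classify(value):
    """Return the first data class whose format matches, else None."""
    for name, pattern in DATA_CLASSES.items():
        if pattern.match(value):
            return name
    return None

print(classify("jane.doe@example.com"))  # email_address
print(classify("CUST-004217"))           # customer_number
```

A governance tool can then attach protection policies to whatever class a column is assigned, rather than to individual columns.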

Full Transcript

Source: https://www.youtube.com/watch?v=EfqaM2LwTAo
Duration: 00:07:06

Sections

  • 00:00:00 — Delivering Trusted Data with DataOps: The speaker explains IBM's DataOps methodology, using IBM Cloud Pak for Data to orchestrate people, processes, and technology for fast, governed, high-quality data that supports data-driven decision-making.
Businesses everywhere are looking for ways to improve their operational efficiency by enabling data-driven decision-making. However, to achieve this accurately and responsibly requires the use of high-quality, relevant, and governed data.

Hi, my name is Luv Aggarwal, and I'm a solution engineer for IBM Data and AI. Today I'm here to talk to you about how your organization can deliver quality data using our DataOps methodology.

So let's start with: what is DataOps? DataOps is the orchestration of people, process, and technology to deliver trusted, high-quality data to data citizens, fast. The practice is focused on enabling collaboration across an organization to drive agility, speed, and new data initiatives at scale. From an operational perspective, DataOps integrates a continuous process of data discovery, transformation, governance, integration, curation, and cataloging for self-service.

Our unified and extensible hybrid data platform, IBM Cloud Pak for Data, supports all the services that I mentioned, and more, as part of a DataOps toolchain.

Now let's take a look at a quick demo to see how both a data engineer and a data consumer in an organization would use IBM Cloud Pak for Data's DataOps capabilities.

Okay, so first I'm going to log into Cloud Pak for Data as a data engineer and start by adding some data sources. I'll go over to the Platform connections tab and click on New connection. Now we can see the wide variety of IBM sources, as well as third-party sources, that we can connect to. I want to connect to a Db2 instance that I have, so I'll input all my credentials and click on Create connection.

Okay, now that we have our connection defined, let's see how we can discover data quality information from it. So I'll jump over to the data discovery tab.
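As a rough illustration of the per-column metrics such a discovery scan derives (data classes, formats, frequency distribution), here is a conceptual sketch; it is not how Cloud Pak for Data actually implements its analysis:

```python
import re
from collections import Counter

def profile_column(values):
    """Sketch of per-column metrics an automated discovery scan might
    derive: a value frequency distribution plus a crude format
    signature (digits -> 9, letters -> A). Conceptual only."""
    frequency = Counter(values)
    formats = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) for v in values
    )
    return {"frequency": frequency, "formats": formats}

stats = profile_column(["1001", "1002", "1002", "AB12"])
print(stats["formats"])            # Counter({'9999': 3, 'AA99': 1})
print(stats["frequency"]["1002"])  # 2
```

A dominant format signature with a few outliers, as in this sample, is exactly the kind of pattern the results dashboard surfaces for a closer look.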
Now we have the option between Quick scan, which analyzes a sample of each table or file to quickly provide analysis results, or Automated discovery, which provides detailed analysis results of all assets from the data source and is typically suitable for a subset of data. For now, I'll run an automated discovery. I'll select the data connection that I set up and the parameters that I want, including publishing the data assets to our enterprise catalog so that consumers can find this data, and I'll click on Discover.

While this is running, the platform is pulling metadata from our source and assessing it for quality using metrics including data classes, formats, frequency distribution, and more. Now I want to show you what that analysis looks like. I'll open up the results, and we can see a dashboard that summarizes our data quality and allows us to dive deeper into the specific assets and relationships.

All right, now let's move on to how we can govern our data assets. There are two points I want to touch on here: data classes and data protection rules. Data classes allow us to specifically define different types of data attributes based on our business language. So we could have something like customer number or email address, which we can see is defined here by a specific format. Now that we have our classes defined, I'll switch over to the data protection rules. These allow us to define custom rules for how we want to manage sensitive data. We can see I already have some rules in here, such as email masking, which completely redacts any data field that matches the data class "email." I could also substitute it with other values, or I could just obfuscate it.

All right, now I'm going to switch hats and log into the platform as a data consumer. As a data consumer, I'm looking for specific data that I need to run some sort of analysis. So I go to my enterprise data catalog, and now I can shop for exactly what I need by searching a variety of business terms, data classes, or other attributes. I'm able to search across the entire data landscape; it doesn't matter whether those data sources are hosted on different clouds or on-prem. Everything is in the catalog.

Okay, so now I've found the data set that I'm looking for, and I can preview it to make sure this is what I need. If you've noticed, I can't actually see all of the data: there are particular columns in here that are masked for me. This is because of the data protection rules that we set up prior, which apply across all data assets in the catalog. This allows our organization to maintain data governance and compliance and protect sensitive data, while at the same time exposing the data that our analysts and data scientists need.

I can also take a quick look at the data set's profile, which allows me to further analyze it for quality: make sure the values match a particular class, check that there are no missing or mismatched values, and review statistics about the asset.

Okay, this looks good to me, so I'm going to add it to my data science project and use it to start building insights. Once I'm in my project, if I decide I need to make some changes to the data set, I don't actually have to go back to my data engineer to request those changes. I can use another self-service tool to make those changes for my project. I'll open up the data set here and click into Refine. This allows me to make minor transformations to my data set in a point-and-click manner and save it back to my project.

Okay, so to recap what we did: as a data engineer, I was able to leverage the platform to connect to various data sources, assess them for quality, define specific data protection rules, and then publish the assets to an enterprise knowledge catalog to make them available to all our data consumers. And as a data consumer, I was able to use the same platform to search for the data that I need and quickly get access to it for my analytics project. I was able to make changes to how my data set behaved without the lengthy process of having to go back to a data engineer to request those changes. On one single platform, we were able to fulfill several different activities of the DataOps lifecycle and take a process that traditionally can take days into one that can be completed in hours.

While we only looked at a few of the DataOps capabilities today, check out the rest by requesting a no-cost trial of IBM Cloud Pak for Data at the link below.

Thank you for watching. If you have any questions, please drop us a line below. If you want to see more videos like this in the future, please like and subscribe. And don't forget, if you want to learn more about IBM Cloud Pak for Data, please check out the links below.
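The masked preview the data consumer saw in the demo can be sketched conceptually as a redaction pass applied before data reaches the user. The column-to-rule wiring below is a hypothetical stand-in for the platform's data-protection rules, not its actual mechanism:

```python
def preview(rows, protected_cols):
    """Return rows with protected columns fully redacted, mimicking the
    masked catalog preview in the demo. Which columns count as
    protected is a hypothetical stand-in for the platform's
    data-protection rules."""
    return [
        {col: ("X" * len(val) if col in protected_cols else val)
         for col, val in row.items()}
        for row in rows
    ]

rows = [{"name": "Jane", "email": "jane@example.com"}]
print(preview(rows, {"email"}))
# [{'name': 'Jane', 'email': 'XXXXXXXXXXXXXXXX'}]
```

Because the rule is enforced centrally rather than per data set, the same redaction applies to every catalogued asset the consumer opens.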