Open Data Lakehouse: Modern AI Architecture

5m • databases • deep-dive • intermediate

Key Points

Enterprises face exploding data volumes, diverse workloads, and costly, siloed architectures that make traditional data warehouses and data lakes inadequate for modern AI and ML use cases.
To scale AI, organizations need to modernize inefficient data architectures, unify access across hybrid‑cloud sources, and accelerate insights with built‑in governance and automation.
IBM's watsonx.data implements an open data lakehouse that adds a shared metadata layer, organized in the Apache Iceberg open table format, on top of cost-effective storage holding open file formats such as Parquet and Avro. Engines share a single copy of the data, which eliminates duplication and simplifies cross-engine analytics.
The platform combines IBM's enhanced Presto engine, Spark with customized libraries, and optional registration of existing warehouse engines (e.g., IBM Db2, IBM Netezza Performance Server) to deliver high-performance analytics, with a claimed cost of up to 50% less than a traditional data warehouse.
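The shared metadata layer in the key points above can be pictured with a toy model. This is a minimal sketch, not the Iceberg or watsonx.data API: `DataFile`, `TableMetadata`, `SharedCatalog`, and `Engine` are hypothetical names, and the storage paths are made up. It only illustrates the idea that several engines resolve a table through one catalog and therefore read the same single copy of the files.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """One immutable file in object storage (path and row count are made up)."""
    path: str
    file_format: str  # e.g. "parquet" or "avro"
    row_count: int


@dataclass
class TableMetadata:
    """Toy stand-in for open-table-format metadata: a list of snapshots,
    each snapshot being the full set of data files that make up the table."""
    name: str
    snapshots: list[tuple[DataFile, ...]] = field(default_factory=list)

    def commit(self, files: list[DataFile]) -> int:
        """Record a new snapshot that adds `files`; return the snapshot id."""
        current = self.snapshots[-1] if self.snapshots else ()
        self.snapshots.append(current + tuple(files))
        return len(self.snapshots) - 1

    def current_files(self) -> tuple[DataFile, ...]:
        return self.snapshots[-1] if self.snapshots else ()


class SharedCatalog:
    """Hypothetical shared metadata layer: every engine resolves tables here."""

    def __init__(self) -> None:
        self._tables: dict[str, TableMetadata] = {}

    def create_table(self, name: str) -> TableMetadata:
        self._tables[name] = TableMetadata(name)
        return self._tables[name]

    def load_table(self, name: str) -> TableMetadata:
        return self._tables[name]


class Engine:
    """Hypothetical query engine that plans scans from catalog metadata only."""

    def __init__(self, label: str, catalog: SharedCatalog) -> None:
        self.label = label
        self.catalog = catalog

    def plan_scan(self, table: str) -> list[str]:
        return [f.path for f in self.catalog.load_table(table).current_files()]

    def count_rows(self, table: str) -> int:
        return sum(f.row_count for f in self.catalog.load_table(table).current_files())


catalog = SharedCatalog()
orders = catalog.create_table("sales.orders")
orders.commit([DataFile("s3://bucket/orders/part-0.parquet", "parquet", 1_000)])
orders.commit([DataFile("s3://bucket/orders/part-1.avro", "avro", 250)])

# Two engines, one catalog: both see the same files, so nothing is copied.
sql_engine = Engine("sql", catalog)
batch_engine = Engine("batch", catalog)
```

Because both engines plan from the same metadata, `sql_engine.plan_scan("sales.orders")` and `batch_engine.plan_scan("sales.orders")` return the same file list, which is the "single copy of data" property in miniature.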

Sections

00:00:00 Untitled Section

Full Transcript

**Source:** [https://www.youtube.com/watch?v=hB6olelYhr0](https://www.youtube.com/watch?v=hB6olelYhr0)

**Duration:** 00:05:17

0:00 An open data lakehouse: you've heard the term, but what does it mean for modern data management? Let's start with how we got here. There are key challenges that have led to the open data lakehouse concept, including how the amount, types, and cost of managing data have exploded, how data needs to be managed differently across environments, and how consumption of data has changed with new use cases driven by AI.

0:23 When we observe how enterprises are dealing with the explosion, management, and consumption of data, we find most leverage multiple siloed and monolithic data warehouses and data lakes, on-prem and in the cloud. Warehouses typically offer high performance for processing terabytes of structured data for business intelligence use cases, but can quickly become unsuitable and expensive for data engineering, machine learning, and evolving AI workloads. Price-performance is less than optimal when running these types of workloads in a data warehouse. Data lakes offer lower-cost solutions for big unstructured data, but can often become data swamps, which makes it challenging to extract insights in a highly performant and simplified way.

1:11 These data management challenges inform three critical needs for enterprises to scale AI. First is to modernize ineffective data architectures and reduce data warehousing costs. Second is to seamlessly access and unify data across disparate hybrid cloud sources. And third is to accelerate time to insight with built-in governance and automation.

1:38 The solution is a data lakehouse that helps you optimize for both cost and performance by leveraging cost-effective repositories combined with multiple fit-for-purpose engines to power various analytics and AI use cases. watsonx.data gets to decisions you can trust in minutes for up to 50 percent less than the cost of a data warehouse.

2:00 So how does it work? Let's take a look at how watsonx.data is architected. watsonx.data starts with your data, wherever it resides. It provides the foundation to connect to your existing data lakes and databases and combine them with new data to develop and unlock new use cases and new insights. Open data formats such as Parquet and Avro are all supported.

2:25 We then introduce a consistent and shared metadata layer, which organizes data in the Iceberg open table format. This allows different engines to access and power all analytic workloads while sharing a single copy of data. What does this mean? We eliminate data silos and duplication of data while reducing cost and governance risks.

2:47 The last layer brings multiple query engines. This layer is at the heart of IBM's open data lakehouse. watsonx.data brings together the best of IBM and the best of open source. It is built on IBM's enhanced Presto engine and leverages Spark with customized libraries for fast, reliable, and efficient processing at scale. You can also register existing data warehouse engines, such as IBM Db2, IBM Netezza Performance Server, or another data warehouse that supports Iceberg, to watsonx.data.

3:22 What could make it better? watsonx.data can be deployed as containerized software or as a fully managed service in multiple public clouds. With flexible deployment options, you can access all of your data and maximize workload coverage across all hybrid cloud environments.

3:40 Leveraging our multi-engine architecture, organizations now have the opportunity to reduce the cost of their data warehouse by up to 50 percent through workload optimization. Non-optimal, costly warehouse workloads can now be shifted to lower-cost storage, compute, and fit-for-purpose query engines that dynamically scale up and down. watsonx.data also helps organizations get value out of ineffective data lakes by leveraging a new generation of SQL engines designed to bring warehouse-like performance and functionality over big data. The simplicity of watsonx.data allows users to connect to any storage and analytics environment in minutes and start generating trusted insights.

4:26 Finally, watsonx.data complements your existing data and AI investments by integrating into your organization's data fabric architecture. The use of industry standards means IBM can interoperate seamlessly across multiple data ecosystems. Our solution integrates with both IBM and third-party data science, business intelligence, and data integration tools. For example, watsonx.data integrates with IBM's governance capabilities to enable responsible, transparent, and explainable data and AI workflows across the enterprise.

5:00 IBM watsonx.data is the foundation to help your organization scale analytics and accelerate the adoption of AI. Thank you. If you like this video and want to see more like it, please like and subscribe. If you have any questions, please drop them in the comments below.
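The workload-optimization idea in the transcript (shifting costly, non-optimal warehouse workloads to lower-cost engines) can be sketched as simple arithmetic. This is an illustration under stated assumptions, not IBM's method: the `Workload` type, the placement rule, and every cost figure below are hypothetical, and the resulting percentage says nothing about what a real deployment would save.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    kind: str              # "bi", "data_engineering", or "ml"
    warehouse_cost: float  # hypothetical monthly cost if run in the warehouse
    lakehouse_cost: float  # hypothetical monthly cost on a lakehouse engine


# Assumption for this sketch: BI workloads stay in the warehouse, where the
# transcript says warehouses perform well; everything else may move.
WAREHOUSE_ONLY_KINDS = {"bi"}


def place(workload: Workload) -> str:
    """Pick the cheaper engine, unless the workload is pinned to the warehouse."""
    if workload.kind in WAREHOUSE_ONLY_KINDS:
        return "warehouse"
    if workload.lakehouse_cost < workload.warehouse_cost:
        return "lakehouse"
    return "warehouse"


def cost_on(workload: Workload, engine: str) -> float:
    """Return the workload's monthly cost on the chosen engine."""
    return workload.warehouse_cost if engine == "warehouse" else workload.lakehouse_cost


# Made-up numbers, chosen only to keep the arithmetic easy to follow.
workloads = [
    Workload("dashboards", "bi", warehouse_cost=40.0, lakehouse_cost=55.0),
    Workload("nightly_etl", "data_engineering", warehouse_cost=35.0, lakehouse_cost=15.0),
    Workload("feature_prep", "ml", warehouse_cost=25.0, lakehouse_cost=10.0),
]

placement = {w.name: place(w) for w in workloads}
before = sum(w.warehouse_cost for w in workloads)                # all in the warehouse
after = sum(cost_on(w, placement[w.name]) for w in workloads)    # after shifting
savings = (before - after) / before
```

With these invented figures, the total drops from 100 to 65, a 35 percent reduction; the point is the mechanism (only the workloads that are cheaper elsewhere move), not the number.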