Open Data Lakehouse: Modern AI Architecture

Key Points

  • Enterprises face exploding data volumes, diverse workloads, and costly, siloed architectures that make traditional data warehouses and data lakes inadequate for modern AI and ML use cases.
  • To scale AI, organizations need to modernize inefficient data architectures, unify access across hybrid‑cloud sources, and accelerate insights with built‑in governance and automation.
  • IBM’s watsonx.data implements an open data lakehouse: a shared metadata layer, organized in the Apache Iceberg open table format, sits on top of cost‑effective storage holding open file formats (Parquet, Avro), so that multiple engines share a single copy of data, which eliminates duplication and simplifies cross‑engine analytics.
  • The platform combines IBM‑enhanced Presto, customized Spark libraries, and optional registration of existing warehouses (e.g., Db2, Netezza) to deliver high‑performance, low‑cost analytics, promising up to 50 % lower spend than classic data warehouses.
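
As a rough illustration of the shared metadata layer described in the key points above, the following Python sketch models a catalog that several engines consult to locate the same data files. Every name here (`SharedCatalog`, `TableMetadata`, `Engine`, the `s3://` path) is hypothetical and chosen for illustration; this is not IBM's or Apache Iceberg's actual API, only a toy model of the "single copy, many engines" idea.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical stand-in for an open-table-format entry."""
    name: str
    data_files: list = field(default_factory=list)  # e.g. Parquet/Avro paths


class SharedCatalog:
    """Toy metadata layer: one entry per table, shared by every engine."""

    def __init__(self):
        self._tables = {}

    def register(self, table):
        self._tables[table.name] = table

    def lookup(self, name):
        return self._tables[name]


class Engine:
    """Toy query engine that resolves tables through the shared catalog."""

    def __init__(self, label, catalog):
        self.label = label
        self.catalog = catalog

    def files_to_scan(self, table_name):
        # The engine asks the catalog where the data lives; it keeps no copy.
        return self.catalog.lookup(table_name).data_files


catalog = SharedCatalog()
catalog.register(TableMetadata("sales", ["s3://bucket/sales/part-0.parquet"]))

# Two differently labeled engines, both pointed at the same catalog.
sql_engine = Engine("presto-like", catalog)
batch_engine = Engine("spark-like", catalog)

# Both engines resolve to the very same file list object: one copy of data.
same_copy = sql_engine.files_to_scan("sales") is batch_engine.files_to_scan("sales")
print(same_copy)
```

The design point is that the engines hold only a reference to the catalog, so adding a third engine requires no additional copy of the table.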

Full Transcript

**Source:** [https://www.youtube.com/watch?v=hB6olelYhr0](https://www.youtube.com/watch?v=hB6olelYhr0)

**Duration:** 00:05:17

[0:00] An open data lakehouse: you've heard the term, but what does it mean for modern data management? Let's start with how we got here. There are key challenges that have led to the open data lakehouse concept, including how the amount, types, and cost of managing data have exploded; how data needs to be managed differently across environments; and how consumption of data has changed with new use cases driven by AI.

[0:23] When we observe how enterprises are dealing with the explosion, management, and consumption of data, we find most leverage multiple siloed and monolithic data warehouses and data lakes, on-prem and in the cloud. Warehouses typically offer high performance for processing terabytes of structured data for business intelligence use cases, but can quickly become unsuitable and expensive for data engineering, machine learning, and evolving AI workloads. Price-to-performance is less than optimal when running these types of workloads in a data warehouse.

[1:00] Data lakes offer lower-cost solutions for big unstructured data, but can often become data swamps and are challenged with extracting insights in a highly performant and simplified way.

[1:11] These data management challenges inform three critical needs for enterprises to scale AI. First is to modernize ineffective data architectures and reduce data warehousing costs. Second is to seamlessly access and unify data across disparate hybrid cloud sources. And third is to accelerate time to insight with built-in governance and automation.

[1:38] The solution is a data lakehouse that helps you optimize for both cost and performance by leveraging cost-effective repositories combined with multiple fit-for-purpose engines to power various analytics and AI use cases. watsonx.data gets to decisions you can trust in minutes, for up to 50 percent less than the cost of a data warehouse.

[2:00] So how does it work? Let's take a look at how watsonx.data is architected. watsonx.data starts with your data, wherever it resides. It provides the foundation to connect to your existing data lakes and databases and combine them with new data to develop and unlock new use cases and new insights. Open data formats such as Parquet and Avro are all supported. We then introduce a consistent and shared metadata layer, which organizes data in the Iceberg open table format. This allows different engines to access and power all analytic workloads while sharing a single copy of data. What does this mean? We eliminate data silos and duplication of data while reducing cost and governance risks.

[2:47] The last layer brings multiple query engines. This layer is at the heart of IBM's open data lakehouse. watsonx.data brings together the best of IBM and the best of open source. It is built on IBM's enhanced Presto engine and leverages Spark with customized libraries for fast, reliable, and efficient processing at scale. You can also register existing data warehouse engines, such as IBM Db2, IBM Netezza Performance Server, or another data warehouse that supports Iceberg, to watsonx.data.

[3:22] What could make it better? watsonx.data can be deployed as containerized software or as a fully managed service in multiple public clouds. With flexible deployment options, you can access all of your data and maximize workload coverage across all hybrid cloud environments.

[3:40] Leveraging our multi-engine architecture, organizations now have the opportunity to reduce the cost of their data warehouse by up to 50 percent through workload optimization. Non-optimal, costly warehouse workloads can now be shifted to lower-cost storage, compute, and fit-for-purpose query engines that dynamically scale up and down.

[4:03] watsonx.data also helps organizations get value out of ineffective data lakes by leveraging a new generation of SQL engines designed to bring warehouse-like performance and functionality over big data. The simplicity of watsonx.data allows users to connect to any storage and analytics environment in minutes and start generating trusted insights.

[4:26] Finally, watsonx.data complements your existing data and AI investments by integrating into your organization's data fabric architecture. The use of industry standards means IBM can interoperate seamlessly across multiple data ecosystems. Our solution integrates with both IBM and third-party data science, business intelligence, and data integration tools. For example, watsonx.data integrates with IBM's governance capabilities to enable responsible, transparent, and explainable data and AI workflows across the enterprise.

[5:00] IBM watsonx.data is the foundation to help your organization scale analytics and accelerate the adoption of AI. Thank you. If you like this video and want to see more like it, please like and subscribe. If you have any questions, please drop them in the comments below.
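
The workload-optimization argument in the transcript (shifting costly warehouse workloads to lower-cost engines) comes down to simple arithmetic, which the Python sketch below makes explicit. The function names and all input figures are made up for illustration and do not come from the video or from IBM; the actual savings depend entirely on how much work can move and how much cheaper the target engine really is.

```python
def blended_cost(warehouse_cost, shifted_fraction, relative_engine_cost):
    """Total cost after moving part of the warehouse spend to a cheaper engine.

    warehouse_cost: current warehouse spend, in any currency unit.
    shifted_fraction: share of that spend tied to workloads that move (0 to 1).
    relative_engine_cost: price of the new engine relative to the warehouse
        for the same work (0.25 means a quarter of the price).
    """
    if warehouse_cost <= 0:
        raise ValueError("warehouse_cost must be positive")
    if not 0.0 <= shifted_fraction <= 1.0:
        raise ValueError("shifted_fraction must be between 0 and 1")
    if relative_engine_cost < 0:
        raise ValueError("relative_engine_cost must not be negative")

    stays = warehouse_cost * (1.0 - shifted_fraction)  # work left in the warehouse
    moves = warehouse_cost * shifted_fraction * relative_engine_cost  # shifted work
    return stays + moves


def savings_fraction(warehouse_cost, shifted_fraction, relative_engine_cost):
    """Fraction of the original spend that is saved by the shift."""
    new_cost = blended_cost(warehouse_cost, shifted_fraction, relative_engine_cost)
    return 1.0 - new_cost / warehouse_cost


# Hypothetical inputs: half the spend moves to an engine at a quarter of the price.
print(blended_cost(100.0, 0.5, 0.25))      # 62.5
print(savings_fraction(100.0, 0.5, 0.25))  # 0.375
```

With these illustrative inputs the saving is 37.5 percent; reaching the "up to 50 percent" figure quoted in the video would require moving more of the workload, a cheaper target engine, or both.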