# Enterprise Data Streaming Architecture Overview

**Source:** [https://www.youtube.com/watch?v=aBIxpJ1_EyY](https://www.youtube.com/watch?v=aBIxpJ1_EyY)
**Duration:** 00:09:23

## Summary

- Data is likened to “the new oil,” and harnessing the massive, fast‑moving streams that enterprises generate (e.g., a 737 aircraft produces ~20 TB in an hour) is critical for informed, competitive decision‑making.
- A streaming architecture consists of three core layers: **origin** (the source of continuous data, often paired with a messaging protocol like MQTT), **processor** (where the data is filtered, analyzed, and contextualized), and **destination** (where the refined data is stored or presented for downstream consumers).
- The primary advantage of this architecture is minimizing data staleness—delivering value as quickly as possible, often described as “real‑time” streaming, to enable rapid insight and action.
- The presenter will later provide a deeper technical dive into the underlying mechanisms that power these streaming pipelines.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=aBIxpJ1_EyY&t=0s) **Streaming Data: The New Oil** - The speaker introduces data streaming concepts, emphasizing the massive, fast‑moving data generated by enterprises (e.g., a 737 plane) and outlines a three‑part streaming architecture.
- [00:04:37](https://www.youtube.com/watch?v=aBIxpJ1_EyY&t=277s) **Enriching and Analyzing Streaming Sensor Data** - The speaker outlines a three‑step process—filtering high‑velocity sensor streams, adding contextual metadata such as asset and location, and then applying machine‑learning techniques to detect patterns.
- [00:07:49](https://www.youtube.com/watch?v=aBIxpJ1_EyY&t=469s) **Horizontal Scaling of Data Processing** - The speaker describes how a processing engine can horizontally expand across multiple compute nodes—adding processing sections, destinations, or receivers—to handle data spikes, keep up with wire speed, and maximize real‑time value.
## Full Transcript
All around us, every day, is data, and in some cases this data is moving really fast and carrying a
lot of information. In 2006, a phrase was coined: data is the new oil.
And this mathematician really hit the nail on the head, because
when you look at an enterprise and all the data that it generates and creates,
using that data to make better-informed business decisions is absolutely paramount to being
a leader and an innovator in your given area of the
business. So just to give you an example, a 737 plane creates about 20 terabytes of data in just
one hour of use. Now, a lot of that information can be fairly benign, but imagine, if you will, that
you were faced and tasked with the problem of: we need to leverage all of this information, all
of this data, and some of it is just voluminous. How do we do it? Well, let's talk about that.
Today we're going to be talking about streaming and data streaming concepts. I'm going to talk
first about an architecture for streaming data in an enterprise, and then
in a future video we'll do a deeper dive on what really happens under the covers. But
first, let's get started. Okay. So in a streaming architecture there are essentially three areas.
First of all, you have an origin, and that's where the data actually comes from. It could be a sensor,
could be a machine itself, could be anything that produces or emits data, and
remember, this data is coming all the time, constantly. Sometimes the origin is paired
with a messaging system, for example MQTT, a technology that allows that telemetry to
get delivered to some other system. So we have an origin. The next thing we have is the processor. The
processor is the place in the overall architecture where we take action and
handle the data, in some cases trimming it down, but in many cases trying to understand
what the story is with the data that we're given. Lastly is the destination. The destination
is where we're going to land the data in this streaming architecture, so that consumers
downstream can leverage it at their own pace.
The key value point with a streaming architecture is avoiding staleness. On a graph, if I were
to plot value on one axis and time on the other, the curve for our data
starts high and then falls off. What we do with a streaming architecture is
capitalize on that early region: the ability to maximize our value in the lowest amount of time.
Many call that real time, but it is essentially streaming. So let's dig in. From an origin
perspective, we're going to take that information and deliver it into a system;
we'll call this step ingest. We take the data
from the origins and bring it into our streaming analytics
platform. The next thing is the processor. In the processor, the most typical things that
happen are these: first, we potentially filter.
Next, we enrich.
Then we analyze.
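The three processor steps named here can be sketched as plain functions. This is a minimal illustration, not anything shown in the video: the record fields, the `ASSETS` registry, and the thresholds are all hypothetical assumptions.

```python
# Sketch of the processor's filter -> enrich -> analyze steps.
# Field names, registry contents, and thresholds are illustrative.

def filter_record(record, min_temp=-50.0):
    """Drop readings we're not interested in (here, implausible values)."""
    return record if record["temperature"] >= min_temp else None

def enrich_record(record, asset_registry):
    """Add context: which asset and location this sensor is attached to."""
    return {**record, **asset_registry.get(record["sensor_id"], {})}

def analyze_record(record, alert_temp=90.0):
    """Flag readings that exceed an operating threshold."""
    return {**record, "alert": record["temperature"] > alert_temp}

# Hypothetical registry mapping sensor IDs to business context.
ASSETS = {"s1": {"asset": "pump-7", "location": "plant-A"}}

raw = {"sensor_id": "s1", "timestamp": 1700000000, "temperature": 95.2}
record = filter_record(raw)
if record is not None:
    record = analyze_record(enrich_record(record, ASSETS))
# record now carries asset, location, and an alert flag alongside the reading
```

The point of the shape is that each step takes a record and returns a record, so the three stages compose into one pass over the stream.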
In these three steps, we have the ability to take all this voluminous data, at
the speed it's coming over the wire, and filter it to get rid of the things we're
not interested in. Then we add context, such as: where is this data coming from? What machine?
What location? Where does it currently sit in the business? What are the operations?
All of that doesn't come from the sensor. Actually, when a record arrives, most
of the time it's just going to have a timestamp and some rudimentary readings, like temperature and
pressure, something of that nature. What we need to add is where it's coming from, some
context about what this sensor is actually attached to. This could be a vehicle; it could be
your vehicle. When we're looking at this data, we need to
put it into context. Then the next step is to analyze it. This is
where we're going to apply machine learning, potentially traditional AI,
maybe generative AI, but we're going to look at the readings,
whether they be temperature or pressure readings over time, and try to find patterns:
whether they're going up, going down, whatever we're interested in.
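As a stand-in for the analysis step described here, even a simple window comparison can surface a rising pattern. This is a crude sketch, not a real model or any product's API; the window size and slope threshold are arbitrary assumptions.

```python
def rising_trend(readings, window=3, slope_per_step=0.5):
    """Compare the average of the last `window` readings against the
    window before it; report True when the per-step rise exceeds the
    threshold. A crude stand-in for a trained model."""
    if len(readings) < 2 * window:
        return False  # not enough history yet
    earlier = sum(readings[-2 * window:-window]) / window
    recent = sum(readings[-window:]) / window
    return (recent - earlier) / window > slope_per_step

temps = [70, 71, 70, 74, 78, 83]   # temperature climbing over time
flat = [70, 70, 71, 70, 70, 71]    # steady operation
print(rising_trend(temps), rising_trend(flat))  # → True False
```

A real deployment would swap this function for a trained model, but the calling pattern—sliding a window over the stream and emitting a flag—stays the same.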
Obviously, if we're looking at costs and money-related things, we want to
make decisions when it's optimal for us to make a purchase, so that
the cost is down. But if we're operating a machine and the temperature is going up,
that might be leading to a failure, so we want to get to that sooner rather than later. Once
we analyze this information, we get context about what the data is telling us, what is happening in
the data. Then lastly, we're going to egress that information
for somebody else who might be interested in another area of the business. Overall,
a streaming architecture contains a way for you to capitalize on maximum
value. We're doing this in real time, at wire speed, as the data is coming across: we're
ingesting it from the origins; we're processing it by filtering, enriching, and analyzing; and then
we're egressing the points of interest. You know, I had a mentor tell me a long time ago
about how companies had become data hoarders. This is a system and an
architecture that can help you avoid being a data hoarder. You can
avoid keeping hundreds of thousands of records that have the same reading, and
persist only those records that have the anomaly, or have the variant, that are points of
interest that could really impact a maintenance decision or an operations decision.
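One way to sketch "persist only the points of interest" is a simple deviation cutoff over a batch of readings; the standard-deviation rule and the cutoff value here are illustrative choices, not the method the speaker prescribes.

```python
import statistics

def persist_points_of_interest(records, store, cutoff=1.5):
    """Keep only records whose reading deviates from the batch mean by
    more than `cutoff` standard deviations; the repetitive bulk is
    dropped rather than hoarded. The cutoff is an arbitrary choice."""
    temps = [r["temperature"] for r in records]
    mean = statistics.mean(temps)
    stdev = statistics.pstdev(temps)
    for r in records:
        if stdev and abs(r["temperature"] - mean) > cutoff * stdev:
            store.append(r)  # the anomaly worth a maintenance decision

batch = [{"temperature": t} for t in (70, 70, 70, 70, 95)]
kept = []
persist_points_of_interest(batch, kept)
print(kept)  # → [{'temperature': 95}]
```

Out of five records, only the outlier reaches the destination; the four identical readings are never stored.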
So we're going to store the things we're interested in for later use. Now, last point:
how does this scale? What does this look like? In many cases you can have a processing engine,
and when we look at an instance of this, we can have multiple
engines scaled across different compute nodes, so that we're scaling
horizontally to take in that amount of data and keep up with the wire speed.
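One common way to spread a stream across horizontally scaled instances is hash routing on a key. This sketch is an assumption about how such a scale-out might work, not the speaker's engine; the field names and worker count are made up.

```python
import zlib

def route(record, n_workers):
    """Pick which of the n scaled engine instances handles this record.
    Hashing the sensor ID keeps each sensor's stream on one worker, so
    per-sensor ordering survives the scale-out."""
    return zlib.crc32(record["sensor_id"].encode()) % n_workers

workers = [[] for _ in range(3)]   # stand-ins for separate compute nodes
stream = [{"sensor_id": "s1", "temperature": 70},
          {"sensor_id": "s2", "temperature": 81},
          {"sensor_id": "s1", "temperature": 72}]
for rec in stream:
    workers[route(rec, 3)].append(rec)
# both "s1" records land on the same worker; absorbing a spike means
# raising n_workers and letting the modulus spread the load
```

The same idea applies to n processing sections, destinations, or receivers: a deterministic key decides which instance owns which slice of the stream.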
In many cases we're not talking about that level of data; we're just able to scale
to the points where we have spikes in data, and in those cases the engine
itself will scale out and have n number of processing sections, or n number of destinations, or
n number of receivers to take in data from our origins. Either way, we scale to meet the speed of
the data so that we can always keep our eye on the North Star, which is maximizing the value
in the real time that the data is emitted. Thanks for watching. I hope this was helpful.