# Enterprise Data Streaming Architecture Overview

**Source:** [https://www.youtube.com/watch?v=aBIxpJ1_EyY](https://www.youtube.com/watch?v=aBIxpJ1_EyY)
**Duration:** 00:09:23

## Summary

- Data is likened to “the new oil,” and harnessing the massive, fast‑moving streams that enterprises generate (e.g., a 737 aircraft produces ~20 TB in an hour) is critical for informed, competitive decision‑making.
- A streaming architecture consists of three core layers: **origin** (the source of continuous data, often paired with a messaging protocol like MQTT), **processor** (where the data is filtered, analyzed, and contextualized), and **destination** (where the refined data is stored or presented for downstream consumers).
- The primary advantage of this architecture is minimizing data staleness—delivering value as quickly as possible, often described as “real‑time” streaming, to enable rapid insight and action.
- The presenter will later provide a deeper technical dive into the underlying mechanisms that power these streaming pipelines.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=aBIxpJ1_EyY&t=0s) **Streaming Data: The New Oil** - The speaker introduces data streaming concepts, emphasizing the massive, fast‑moving data generated by enterprises (e.g., a 737 plane) and outlines a three‑part streaming architecture.
- [00:04:37](https://www.youtube.com/watch?v=aBIxpJ1_EyY&t=277s) **Enriching and Analyzing Streaming Sensor Data** - The speaker outlines a three‑step process—filtering high‑velocity sensor streams, adding contextual metadata such as asset and location, and then applying machine‑learning techniques to detect patterns.
- [00:07:49](https://www.youtube.com/watch?v=aBIxpJ1_EyY&t=469s) **Horizontal Scaling of Data Processing** - The speaker describes how a processing engine can horizontally expand across multiple compute nodes—adding processing sections, destinations, or receivers—to handle data spikes, keep up with wire speed, and maximize real‑time value.
## Full Transcript
All around us, every day, is data, and in some cases this data is moving really fast and carrying a
lot of information. In 2006, a phrase was coined: data is the new oil.
And this mathematician really hit the nail on the head, because
when you look at an enterprise and all the data that it generates and creates,
using that data to make better-informed business decisions is absolutely paramount to being
a leader and an innovator in your given area of the
business. So just to give you an example, a 737 plane creates about 20 terabytes of data in just
one hour of use. Now, a lot of that information can be fairly benign, but imagine, if you will, that
you were faced and tasked with the problem of: we need to leverage all of this information, all
of this data, and some of it is just voluminous. How do we do it? Well, let's talk about that.
Today we're going to be talking about streaming and data streaming concepts. I'm going to talk
first about an architecture for streaming data in an enterprise, and then
in a future video we'll do a deeper dive on what really happens under the covers. But
first, let's get started. Okay. So in a streaming architecture there are essentially three areas.
First of all, you have an origin, and that's where the data actually comes from. It could be a sensor,
could be a machine itself, could be anything that produces or emits data, and
remember, this data is coming all the time, constantly. Sometimes the origin is paired
with a messaging system, for example MQTT, a technology that allows that telemetry to
get delivered to some other system. So we have an origin. The next thing we have is the processor. The
processor is the place in the overall architecture where we take action and
handle the data, in some cases trimming it down, but in many cases trying to understand
what the story is with the data that we're given. Lastly is the destination. The destination
is where we're going to land the data in this streaming architecture, so that consumers
downstream can leverage it at their own pace.
The key value point with a streaming architecture is avoiding staleness. On a graph, if I were
to plot value on one axis and time on the other, the curve for our data
starts high and then falls off. What we do with a streaming architecture is
capitalize on that early region: the ability to maximize our value in the lowest amount of time.
Many call that real time, but it is essentially streaming. So let's dig in. From an origin
perspective, we're going to take that information and deliver it into a system;
we'll call this step ingest. We take the data
from the origins and bring it into our streaming analytics
platform. The next thing is the processor. In the processor, the most typical things that
happen are these: first, we potentially filter.
Next, we enrich.
Then we analyze.
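The three processor steps named here can be sketched as plain functions. This is a minimal illustration, not anything shown in the video: the record fields, the `ASSETS` registry, and the thresholds are all hypothetical assumptions.

```python
# Sketch of the processor's filter -> enrich -> analyze steps.
# Field names, registry contents, and thresholds are illustrative.

def filter_record(record, min_temp=-50.0):
    """Drop readings we're not interested in (here, implausible values)."""
    return record if record["temperature"] >= min_temp else None

def enrich_record(record, asset_registry):
    """Add context: which asset and location this sensor is attached to."""
    return {**record, **asset_registry.get(record["sensor_id"], {})}

def analyze_record(record, alert_temp=90.0):
    """Flag readings that exceed an operating threshold."""
    return {**record, "alert": record["temperature"] > alert_temp}

# Hypothetical registry mapping sensor IDs to business context.
ASSETS = {"s1": {"asset": "pump-7", "location": "plant-A"}}

raw = {"sensor_id": "s1", "timestamp": 1700000000, "temperature": 95.2}
record = filter_record(raw)
if record is not None:
    record = analyze_record(enrich_record(record, ASSETS))
# record now carries asset, location, and an alert flag alongside the reading
```

The point of the shape is that each step takes a record and returns a record, so the three stages compose into one pass over the stream.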
In these three steps, we have the ability to take all this voluminous data, at
the speed it's coming over the wire, and filter it to get rid of the things we're
not interested in. Then we add context, such as: where is this data coming from? What machine?
What location? Where does it currently sit in the business? What are the operations?
All of that doesn't come from the sensor. Actually, when a record arrives, most
of the time it's just going to have a timestamp and some rudimentary readings, like temperature and
pressure, something of that nature. What we need to add is where it's coming from, some
context about what this sensor is actually attached to. This could be a vehicle; it could be
your vehicle. When we're looking at this data, we need to
put it into context. Then the next step is to analyze it. This is
where we're going to apply machine learning, potentially traditional AI,
maybe generative AI, but we're going to look at the readings,
whether they be temperature or pressure readings over time, and try to find patterns:
whether they're going up, going down, whatever we're interested in.
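As a stand-in for the analysis step described here, even a simple window comparison can surface a rising pattern. This is a crude sketch, not a real model or any product's API; the window size and slope threshold are arbitrary assumptions.

```python
def rising_trend(readings, window=3, slope_per_step=0.5):
    """Compare the average of the last `window` readings against the
    window before it; report True when the per-step rise exceeds the
    threshold. A crude stand-in for a trained model."""
    if len(readings) < 2 * window:
        return False  # not enough history yet
    earlier = sum(readings[-2 * window:-window]) / window
    recent = sum(readings[-window:]) / window
    return (recent - earlier) / window > slope_per_step

temps = [70, 71, 70, 74, 78, 83]   # temperature climbing over time
flat = [70, 70, 71, 70, 70, 71]    # steady operation
print(rising_trend(temps), rising_trend(flat))  # → True False
```

A real deployment would swap this function for a trained model, but the calling pattern—sliding a window over the stream and emitting a flag—stays the same.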
Obviously, if we're looking at costs and money-related things, we want to
make decisions when it's optimal for us to make a purchase, so that
the cost is down. But if we're operating a machine and the temperature is going up,
that might be leading to a failure, so we want to get to that sooner rather than later. Once
we analyze this information, we get context about what the data is telling us, what is happening in
the data. Then lastly, we're going to egress that information
for somebody else who might be interested in another area of the business. Overall,
a streaming architecture contains a way for you to capitalize on maximum
value. We're doing this in real time, at wire speed, as the data is coming across: we're
ingesting it from the origins; we're processing it by filtering, enriching, and analyzing; and then
we're egressing the points of interest. You know, I had a mentor tell me a long time ago
about how companies had become data hoarders. This is a system and an
architecture that can help you avoid being a data hoarder. You can
avoid keeping hundreds of thousands of records that have the same reading, and
persist only those records that have the anomaly, or have the variant, that are points of
interest that could really impact a maintenance decision or an operations decision.
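One way to sketch "persist only the points of interest" is a simple deviation cutoff over a batch of readings; the standard-deviation rule and the cutoff value here are illustrative choices, not the method the speaker prescribes.

```python
import statistics

def persist_points_of_interest(records, store, cutoff=1.5):
    """Keep only records whose reading deviates from the batch mean by
    more than `cutoff` standard deviations; the repetitive bulk is
    dropped rather than hoarded. The cutoff is an arbitrary choice."""
    temps = [r["temperature"] for r in records]
    mean = statistics.mean(temps)
    stdev = statistics.pstdev(temps)
    for r in records:
        if stdev and abs(r["temperature"] - mean) > cutoff * stdev:
            store.append(r)  # the anomaly worth a maintenance decision

batch = [{"temperature": t} for t in (70, 70, 70, 70, 95)]
kept = []
persist_points_of_interest(batch, kept)
print(kept)  # → [{'temperature': 95}]
```

Out of five records, only the outlier reaches the destination; the four identical readings are never stored.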
So we're going to store the things we're interested in for later use. Now, last point:
how does this scale? What does this look like? In many cases you can have a processing engine,
and when we look at an instance of this, we can have multiple
engines scaled across different compute nodes, so that we're scaling
horizontally to take in that amount of data and keep up with the wire speed.
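One common way to spread a stream across horizontally scaled instances is hash routing on a key. This sketch is an assumption about how such a scale-out might work, not the speaker's engine; the field names and worker count are made up.

```python
import zlib

def route(record, n_workers):
    """Pick which of the n scaled engine instances handles this record.
    Hashing the sensor ID keeps each sensor's stream on one worker, so
    per-sensor ordering survives the scale-out."""
    return zlib.crc32(record["sensor_id"].encode()) % n_workers

workers = [[] for _ in range(3)]   # stand-ins for separate compute nodes
stream = [{"sensor_id": "s1", "temperature": 70},
          {"sensor_id": "s2", "temperature": 81},
          {"sensor_id": "s1", "temperature": 72}]
for rec in stream:
    workers[route(rec, 3)].append(rec)
# both "s1" records land on the same worker; absorbing a spike means
# raising n_workers and letting the modulus spread the load
```

The same idea applies to n processing sections, destinations, or receivers: a deterministic key decides which instance owns which slice of the stream.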
In many cases we're not talking about that level of data; we're just able to scale
to the points where we have spikes in data, and in those cases the engine
itself will scale out and have n number of processing sections, or n number of destinations, or
n number of receivers to take in data from our origins. Either way, we scale to meet the speed of
the data so that we can always keep our eye on the North Star, which is maximizing the value
in the real time that the data is emitted. Thanks for watching. I hope this was helpful.