Serverless Technology for Big Data Analytics
Key Points
- Traditional big‑data analytics relied on highly‑integrated data warehouses, which excel at efficient query processing but are less flexible.
- Hadoop disrupted this model in the 2000s by introducing openness to diverse data formats, analytics libraries, languages, and heterogeneous hardware, gaining rapid industry adoption.
- The rise of cloud computing combined with consumer‑driven “sharing economy” behaviors has created a new form factor for big‑data analytics: serverless, which treats compute resources as a shared team utility.
- While many equate serverless with Function‑as‑a‑Service, it is a broader cloud‑native model that hides servers entirely; for big data it also spans object storage and services such as SQL‑as‑a‑Service and messaging‑as‑a‑service, all billed on a pay‑as‑you‑go basis.
Sections
- Evolution from Data Warehouses to Serverless (00:00:00) - Torsten Steinbach outlines the shift from traditional data‑warehouse architectures through Hadoop’s open, flexible model to the emerging serverless approach for big‑data analytics.
- Serverless Data Storage Explained (00:03:13) - The speaker clarifies that serverless goes beyond function‑as‑a‑service by including cloud‑native object storage that abstracts disk provisioning, provides durable, scalable access to data, and operates on a pay‑as‑you‑go consumption model.
- Serverless Big Data Tradeoffs (00:06:31) - The speaker explains that serverless services complement rather than replace the established form factors, which keep their performance and response‑time sweet spots while serverless may offer cost benefits, so the right mix depends on business model and requirements.
Full Transcript
Source: https://www.youtube.com/watch?v=HRfR4dJoKDc
Duration: 00:07:02
Hello, this is Torsten Steinbach, Architect at IBM
for Data and Analytics in the Cloud,
and today I'm going to talk to you about serverless technology
and how it is applied to big data analytics.
When we look at big data over the past few decades,
we can see that there has been a traditional
form factor of big data systems
that has been in use for many decades already, and this is
the form factor of a data warehouse.
So, this is a highly integrated system,
highly optimized for handling big data queries and big data analytics
in a very efficient manner.
Nevertheless, around the year 2000,
Hadoop came up, was adopted very rapidly,
gained a lot of popularity, and is now widely adopted in the industry.
Big data analytics already existed, so why did Hadoop come up?
It's because, compared with this integrated system,
it brought more openness to the table.
More openness in terms of the types of data it could handle,
data formats, bring-your-own data formats,
the types of analytics, and the analytics libraries and languages that can be supported.
And also, the flexibility in terms of the hardware,
the deployment options that you can have.
You can bring your custom hardware, or even have heterogeneous hardware.
So, that's why Hadoop basically gained a lot of traction
and is now widely adopted.
Today, however, we are seeing a trend that
results in yet another form factor
of doing big data analytics, and this trend
is driven, first of all,
by the rise of cloud.
And another thing that goes hand in hand with the rise of cloud
is that the consumption behavior of many people, of end users,
is becoming more oriented toward a sharing economy.
People are increasingly using ride shares
instead of renting a car, not to mention buying one, just to get around.
Or they go with Airbnb to spend a night somewhere.
And this consumer behavior is now also being applied to teams.
That is exactly what the term "serverless" means:
serverless is, in fact, the sharing economy for a team.
It is enabled by cloud,
and it is, in fact, the most consistent usage model of cloud.
Many of you have heard the term serverless,
and most of you probably associate something called "Function as a Service" with it.
Many of you may think the two are synonymous,
which is not exactly true, but it is what many people think of.
And Function as a Service works like this:
I have my code that I need to run, my business logic,
but I don't provision dedicated systems, dedicated hardware,
or even dedicated software.
I just send it to a service and say, "Please run it for me."
Maybe run it for me that many times.
How to scale out is all handled ad hoc by the service.
It basically hides the fact that there are servers.
That's why it's called serverless.
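To make the Function-as-a-Service idea concrete, here is a minimal in-process sketch in Python. `ToyFunctionService`, `invoke`, and `business_logic` are hypothetical names, not any provider's API: the caller hands over a function plus a list of events, and the service alone decides how many workers to use, which is the "hiding the servers" property described above. A real FaaS platform would run each invocation in remotely managed, isolated containers rather than local threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


class ToyFunctionService:
    """Hypothetical stand-in for a FaaS platform (not a real provider API).

    Callers submit a function and its input events; the service alone decides
    how many workers run them, so the caller never provisions servers.
    """

    def __init__(self, max_workers: int = 8) -> None:
        self._max_workers = max_workers

    def invoke(self, fn: Callable[[Any], Any], events: List[Any]) -> List[Any]:
        # Scale out ad hoc: one invocation per event on a pool that exists
        # only for this call and is torn down afterwards.
        workers = max(1, min(self._max_workers, len(events)))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() keeps results in the same order as the input events.
            return list(pool.map(fn, events))


def business_logic(order: dict) -> float:
    # Hypothetical user code: order total including a 10% surcharge.
    return round(order["qty"] * order["price"] * 1.10, 2)


if __name__ == "__main__":
    service = ToyFunctionService()
    orders = [{"qty": 2, "price": 5.0}, {"qty": 1, "price": 20.0}]
    print(service.invoke(business_logic, orders))
```

The caller's code never mentions threads, hosts, or capacity; only the service decides how wide to fan out.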
Now, as I said, this is what many people think of
when they hear the term serverless,
but serverless is more than just Function as a Service.
That's especially true when we look again at our domain here,
which is data, big data, and analytics.
The problem with big data analytics is that we are talking about state.
State has to be kept:
my data has to be kept safely, durably, and reliably,
and I need to be able to access it anytime I want.
And that's what these systems, the data warehouse and Hadoop, provide.
But now in the cloud we have new options.
We can actually extract the storage of data itself
into a cloud service of its own.
And that's what is happening in the cloud: there is
cloud-native storage in the form of object storage.
And object storage is essentially serverless storage, because
you do not provision disk volumes, you do not configure disk volumes;
you just bring your data,
and the system figures out how to store it,
how to distribute it, how to make it highly available, and so on.
It's highly abstracted: you just have a REST API
where you upload and download your data,
and you can come with anything from kilobytes up to terabytes of data
in the same organizational unit.
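The REST-style upload and download described here can be sketched with nothing but the Python standard library. This is a toy under explicit assumptions, not a real object-storage API: `OBJECTS`, `ObjectHandler`, `start_store`, `put_object`, and `get_object` are hypothetical, objects live in a process-local dict, and the server listens only on localhost. It shows the essential contract: the client addresses whole objects by key with HTTP `PUT` and `GET` and never sees disks or volumes.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict

# Hypothetical in-memory "bucket": object key (the URL path) -> object bytes.
OBJECTS: Dict[str, bytes] = {}

# Opener with proxies disabled so localhost requests go straight to the toy server.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class ObjectHandler(BaseHTTPRequestHandler):
    """Toy REST object store: PUT uploads an object, GET downloads it."""

    def do_PUT(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        OBJECTS[self.path] = self.rfile.read(length)  # whole object replaced
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self) -> None:
        body = OBJECTS.get(self.path)
        if body is None:
            self.send_error(404, "No such object")
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the demo


def start_store() -> ThreadingHTTPServer:
    # Port 0 asks the OS for any free port; serve in a background thread.
    server = ThreadingHTTPServer(("127.0.0.1", 0), ObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put_object(base_url: str, key: str, data: bytes) -> int:
    req = urllib.request.Request(f"{base_url}/{key}", data=data, method="PUT")
    with _OPENER.open(req) as resp:
        return resp.status


def get_object(base_url: str, key: str) -> bytes:
    with _OPENER.open(f"{base_url}/{key}") as resp:
        return resp.read()


if __name__ == "__main__":
    server = start_store()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    put_object(base, "my-bucket/raw/events.csv", b"id,amount\n1,9.5\n")
    print(get_object(base, "my-bucket/raw/events.csv"))
    server.shutdown()
    server.server_close()
```

Whether the object is a few bytes or very large, the client-side calls stay the same; placement, replication, and availability are the service's concern.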
And another reason it is serverless
is that it uses a "Pay As You Go" consumption model.
You don't just use it as you go, you also pay as you go.
That means you're only paying for the
gigabytes that you're storing at this point, right now.
And if you store less, you pay less,
in a completely, seamlessly elastic way.
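The pay-as-you-go model boils down to a simple calculation: the bill tracks the gigabyte-hours actually stored rather than any provisioned capacity. The sketch below assumes a hypothetical flat price of $0.02 per GB-month and a 730-hour month; `storage_charge` is an illustrative name, and real providers price by region, storage tier, and request volume.

```python
from typing import List, Tuple

# Hypothetical list price; real object-storage prices vary by provider and tier.
PRICE_PER_GB_MONTH = 0.02
HOURS_PER_MONTH = 730  # common billing convention: 365 * 24 / 12


def storage_charge(usage: List[Tuple[float, float]]) -> float:
    """Pay-as-you-go storage bill for one month.

    usage: (hours, gigabytes) intervals, i.e. "we held N GB for H hours".
    Only GB-hours actually used are charged; nothing is provisioned up front.
    """
    gb_hours = sum(hours * gb for hours, gb in usage)
    return round(gb_hours / HOURS_PER_MONTH * PRICE_PER_GB_MONTH, 4)


if __name__ == "__main__":
    # Full month at 1000 GB versus half a month at 1000 GB, then 100 GB.
    print(storage_charge([(730, 1000)]))
    print(storage_charge([(365, 1000), (365, 100)]))
```

Deleting 900 GB halfway through the month roughly halves the bill, with no volume to resize or release.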
Now, when we talk about big data analytics,
it's not just about storing data, but also about how we can
analyze and process this data.
And that's exactly what we are now seeing as well: driven by cloud,
additional services are being made available
around object storage, such as "SQL as a Service",
which allows you to run SQL
directly on the data in object storage
and be billed just for that one SQL statement,
depending on how big the query was in terms of how much data it had to scan.
You do not pay for a database that is provisioned and standing around.
Just a single SQL statement, and that's it.
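Here is a minimal sketch of the "SQL as a Service" billing idea, assuming CSV objects in a bucket and a hypothetical price of $5 per terabyte scanned. `BUCKET`, `run_sql`, `PRICE_PER_TB_SCANNED`, and the fixed `region, amount` schema are illustrative assumptions, not any real service's API. SQLite stands in for the service's query engine: an in-memory engine is created for each query and discarded afterward, so nothing stands around between queries, and the charge depends only on the bytes read from object storage.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical objects sitting in object storage: key -> CSV bytes.
BUCKET: Dict[str, bytes] = {
    "sales/2023.csv": b"region,amount\nEU,120\nUS,300\nEU,80\n",
    "sales/2024.csv": b"region,amount\nEU,200\nUS,100\n",
}

# Hypothetical price per terabyte scanned; real services publish their own rates.
PRICE_PER_TB_SCANNED = 5.0


def run_sql(query: str, keys: List[str]) -> Tuple[List[tuple], int, float]:
    """Run one SQL statement over the given objects as a table named `data`.

    Returns (rows, bytes_scanned, charge). A throwaway in-memory engine is
    created for this query only; no database is provisioned in advance.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE data (region TEXT, amount REAL)")
        scanned = 0
        for key in keys:
            raw = BUCKET[key]
            scanned += len(raw)  # every byte read from storage is billable
            rows = csv.DictReader(io.StringIO(raw.decode("utf-8")))
            conn.executemany(
                "INSERT INTO data VALUES (?, ?)",
                ((r["region"], float(r["amount"])) for r in rows),
            )
        result = conn.execute(query).fetchall()
    finally:
        conn.close()
    charge = scanned / 1e12 * PRICE_PER_TB_SCANNED
    return result, scanned, charge


if __name__ == "__main__":
    q = "SELECT region, SUM(amount) FROM data GROUP BY region ORDER BY region"
    print(run_sql(q, ["sales/2023.csv", "sales/2024.csv"]))
```

Scanning fewer objects (for example, only one year's file) lowers the charge for the same query, which is why data layout matters under per-scan billing.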
And there are other services that play into this,
for instance, messaging as a service,
such as Kafka as a service,
where you pay by the number of messages being processed,
which are then eventually stored in object storage.
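Messaging as a service can be sketched the same way. `ToyTopic` is a hypothetical stand-in, not the Kafka client API: it counts every message for billing at an assumed $0.50 per million messages, and it lands messages in a dict standing in for object storage as newline-delimited JSON batches, mirroring the "processed, then eventually stored" flow described above.

```python
import json
from typing import Dict, List

# Hypothetical price per million messages; real offerings publish their own
# pricing dimensions.
PRICE_PER_MILLION_MESSAGES = 0.50


class ToyTopic:
    """Hypothetical pay-per-message topic that lands batches in object storage."""

    def __init__(self, name: str, store: Dict[str, bytes], batch_size: int = 3) -> None:
        self.name = name
        self.store = store  # stands in for an object-storage bucket
        self.batch_size = batch_size
        self.buffer: List[dict] = []
        self.messages_processed = 0
        self._batch_no = 0

    def produce(self, message: dict) -> None:
        self.messages_processed += 1
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Write buffered messages as one newline-delimited JSON object.
        if not self.buffer:
            return
        key = f"{self.name}/batch-{self._batch_no:05d}.jsonl"
        self.store[key] = "\n".join(json.dumps(m) for m in self.buffer).encode("utf-8")
        self._batch_no += 1
        self.buffer.clear()

    def charge(self) -> float:
        # The bill depends only on how many messages went through.
        return self.messages_processed / 1_000_000 * PRICE_PER_MILLION_MESSAGES


if __name__ == "__main__":
    bucket: Dict[str, bytes] = {}
    topic = ToyTopic("clicks", bucket)
    for i in range(7):
        topic.produce({"n": i})
    topic.flush()
    print(sorted(bucket), topic.charge())
```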
So there's a series of these services coming up,
and in combination they provide this new form factor
of a big data and analytics system,
which augments and actually complements
the existing form factors. Even though
those are more established and older,
there is still a point in using them,
because they have their own sweet spots in terms of
performance characteristics and response-time guarantees,
while, on the other side, serverless may offer cost-effectiveness benefits.
So, depending on your business model and requirements,
you may use one form factor, another, or a combination of them.
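The cost side of this trade-off can be expressed as a break-even calculation between an always-on provisioned system and per-query serverless billing. All prices and the per-query scan size below are hypothetical placeholders, and the sketch deliberately ignores the performance and response-time guarantees that, as noted above, remain the sweet spot of the established form factors.

```python
import math

# Hypothetical prices, chosen only to illustrate the shape of the trade-off.
WAREHOUSE_MONTHLY = 2000.0    # provisioned system, paid whether used or not
PRICE_PER_TB_SCANNED = 5.0    # serverless SQL, paid per query
TB_SCANNED_PER_QUERY = 0.05   # assumed average data scanned by each query


def serverless_monthly(queries: int) -> float:
    # Serverless cost grows linearly with usage and is zero when idle.
    return queries * TB_SCANNED_PER_QUERY * PRICE_PER_TB_SCANNED


def break_even_queries() -> int:
    # Smallest monthly query count at which serverless costs at least as
    # much as the provisioned system.
    per_query = TB_SCANNED_PER_QUERY * PRICE_PER_TB_SCANNED
    return math.ceil(WAREHOUSE_MONTHLY / per_query)


if __name__ == "__main__":
    n = break_even_queries()
    print(n, serverless_monthly(n // 2), serverless_monthly(n * 2))
```

With these placeholder numbers, serverless is cheaper below 8,000 queries a month and the provisioned system is cheaper above it. The crossover moves with every input, which is one reason the choice depends on workload and requirements.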
So, I hope this helps to put in perspective
how serverless plays into big data analytics,
and how it basically generates a whole new form factor of big data and analytics systems.
Thank you very much.