Serverless Technology for Big Data Analytics

7m • devops • deep-dive • intermediate

Key Points

Traditional big‑data analytics relied on highly‑integrated data warehouses, which excel at efficient query processing but are less flexible.
Hadoop disrupted this model around 2000 by introducing openness to diverse data formats, analytics libraries, languages, and heterogeneous hardware, gaining rapid industry adoption.
The rise of cloud computing combined with consumer‑driven “sharing economy” behaviors has created a new form factor for big‑data analytics: serverless, which treats compute resources as a shared team utility.
While many equate serverless with Function‑as‑a‑Service, it actually represents a broader cloud‑native execution model that abstracts away servers, enabling on‑demand scaling of any workload.

**Source:** [https://www.youtube.com/watch?v=HRfR4dJoKDc](https://www.youtube.com/watch?v=HRfR4dJoKDc)

**Duration:** 00:07:02

Sections

- [00:00:00](https://www.youtube.com/watch?v=HRfR4dJoKDc&t=0s) **Evolution from Data Warehouses to Serverless** - Torsten Steinbach outlines the shift from traditional data‑warehouse architectures through Hadoop’s open, flexible model to the emerging serverless approach for big‑data analytics.
- [00:03:13](https://www.youtube.com/watch?v=HRfR4dJoKDc&t=193s) **Serverless Data Storage Explained** - The speaker clarifies that serverless goes beyond function‑as‑a‑service by including cloud‑native object storage that abstracts disk provisioning, provides durable, scalable access to data, and operates on a pay‑as‑you‑go consumption model.
- [00:06:31](https://www.youtube.com/watch?v=HRfR4dJoKDc&t=391s) **Serverless Big Data Tradeoffs** - The speaker explains how serverless platforms offer performance, cost, and use‑case‑specific trade‑offs, creating a new form factor for big‑data analytics solutions.

Full Transcript

0:00 Hello, this is Torsten Steinbach, Architect at IBM for Data and Analytics in the Cloud, and today I'm going to talk to you about serverless technology and how it is applied to big data analytics.
0:12 When we look at big data in the past few decades, we can see that there has been a traditional form factor of big data systems that has been used for many decades already, and this is the form factor of a data warehouse. So, this is a highly integrated system, highly optimized for handling big data queries, big data analytics, in a very efficient manner.
0:43 Nevertheless, around the year 2000 we had Hadoop coming up and being adopted very rapidly, gaining a lot of popularity, and it is now widely adopted in the industry. So why is it that Hadoop came up, even though there was already big data analytics? This is because it brought, in addition to this integrated system, more openness to the table. More openness in terms of the type of data that it could handle, data formats, Bring Your Own Data (BYOD) formats, the types of analytics, analytics libraries, and languages that can be supported. And also the flexibility in terms of the hardware, the deployment options that you can have. You can bring your custom hardware, or even have heterogeneous hardware. So, that's why Hadoop basically gained a lot of traction and is now widely adopted.
1:35 Today, however, we are seeing a trend that basically results in yet another form factor of doing big data analytics, and this trend is driven by actually one thing that is happening, which is the era of the rise of cloud. And another thing that actually goes hand in hand a little bit with the rise of cloud is the consumption behavior of many people, of end users, to be more oriented on a sharing economy.
2:07 So, people are using more and more just ride shares instead of renting a car, not to speak of buying a car just to get around. Or they are just going with Airbnb to sleep a night somewhere. So, this consumer behavior is also applied now to a team. And this term "serverless" is actually exactly this: serverless is, in fact, the sharing economy for a team. And it is enabled by cloud. And it is, in fact, the most consequent usage model of cloud: serverless.
2:46 And many of you have heard the term serverless, and probably most of you will associate a thing called "Function as a Service" with serverless. Many of you may think it's synonymous, which is not exactly true, but that is what basically many people think of. And Function as a Service is: I have my code that I need to run, my business logic, but I don't provision dedicated systems, dedicated hardware, or even dedicated software. I'm just sending it to a service and saying, "please run it for me". Run it for me maybe that many times. So, how to scale out, it's all done ad hoc. It's basically hiding the fact that there are servers. That's why it's called serverless.
3:30 Now, as I said, this is what many people think of when they hear the term serverless, but serverless is more than just Function as a Service. Especially when we now look back again at our domain here, which is data, big data and analytics. The problem with big data analytics is that we are talking about state. State has to be kept: my data has to be kept safely, durably, and reliably. I need to be able to access it anytime I want it. And that's what these systems provide. But now in the cloud we have new options. We can actually extract the storage of data itself as a cloud service on its own. And that's also what's happening on the cloud, and there is basically cloud-native storage, object storage.
4:19 And object storage is basically serverless storage because you do not provision disk volumes, you do not configure disk volumes, you just bring your data and the system figures out how to store it, how to distribute it, how to make it highly available, and so on. It's highly abstracted: you just have a REST API where you upload and download your data, and you can come with kilobytes of data, going up to terabytes of data, in the same organizational unit.
4:49 And the thing about why it is serverless is also that it's a "Pay As You Go" consumption model. You don't just use it as you go, you also pay as you go. Which means you're just paying for the gigabytes that you're storing at this point, right now. And if you store less, you will be paying less, in a very elastic, completely seamlessly elastic way.
5:16 Now, we are talking about big data analytics. It's not just about storage of data, but also how we can analyze this data and process this data. And that's exactly what we are now seeing as well: driven by cloud, we are seeing additional services that are made available around object storage, such as "SQL as a Service", which allows you to run SQL, basically, on the data in object storage and just be billed for this one SQL, depending on how big the SQL was in terms of how much data it had to scan. You do not pay for a database that is provisioned and standing around. Just a single SQL and that's it.
5:56 And there are other things that basically play into this, like, for instance, messaging as a service, so Kafka as a service, where you are just paying by the number of messages being processed and then eventually stored to the object storage.
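Editor's note: the pay-as-you-go model described above (paying per gigabyte stored, per amount of data a SQL query scans, and per message processed) can be made concrete with a little arithmetic. The following is a minimal sketch, not any provider's actual pricing. The three rates, the `MonthlyUsage` type, and the `monthly_bill` function are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical rates for illustration only; real prices vary by provider.
STORAGE_PER_GB_MONTH = 0.02    # USD per GB held in object storage for a month
SQL_PER_TB_SCANNED = 5.00      # USD per TB of data scanned by SQL queries
MESSAGING_PER_MILLION = 0.10   # USD per million messages processed


@dataclass
class MonthlyUsage:
    stored_gb: float    # average GB stored during the month
    scanned_tb: float   # total TB scanned by all SQL queries
    messages: int       # messages passed through the messaging service


def monthly_bill(usage: MonthlyUsage) -> float:
    """Sum the usage-based charges; there is no fixed, provisioned term."""
    storage = usage.stored_gb * STORAGE_PER_GB_MONTH
    sql = usage.scanned_tb * SQL_PER_TB_SCANNED
    messaging = usage.messages / 1_000_000 * MESSAGING_PER_MILLION
    return round(storage + sql + messaging, 2)


busy = MonthlyUsage(stored_gb=500, scanned_tb=2.0, messages=10_000_000)
idle = MonthlyUsage(stored_gb=500, scanned_tb=0.0, messages=0)

print(monthly_bill(busy))  # 21.0
print(monthly_bill(idle))  # 10.0
```

In the idle month only the storage term remains, which is the point the speaker makes about not paying for a database that is "provisioned and standing around".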
6:11 So there's a series of these services basically coming up, and in combination they are providing this new form factor of a big data and analytics system that is augmenting and actually complementing the existing form factors, because even though they are more established and older, there is still a point for using them. They have their sweet spots in terms of their own performance characteristics and response time guarantees, but, on the other side, there are maybe cost effectiveness benefits here. So, depending on your business model and requirements, you may use this, or this, or the combination of those things.
6:51 So, I hope this helps to put in perspective how serverless plays into big data analytics, and how it basically generates a whole new form factor of big data and analytics systems. Thank you very much.
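Editor's note: the cost side of the trade-off mentioned in the closing remarks can be sketched as a break-even calculation between a provisioned system with a fixed monthly cost and a serverless SQL service billed per data scanned. This is an illustrative sketch under assumed numbers. The function names and every price are hypothetical, and it deliberately ignores the performance and response-time guarantees that the speaker names as the other side of the decision.

```python
def break_even_queries(tb_scanned_per_query: float,
                       price_per_tb_scanned: float,
                       provisioned_monthly_cost: float) -> float:
    """Number of queries per month at which both options cost the same."""
    cost_per_query = tb_scanned_per_query * price_per_tb_scanned
    return provisioned_monthly_cost / cost_per_query


def cheaper_option(queries_per_month: int,
                   tb_scanned_per_query: float,
                   price_per_tb_scanned: float,
                   provisioned_monthly_cost: float) -> str:
    """Compare the usage-based bill against the fixed provisioned cost."""
    serverless_cost = (queries_per_month * tb_scanned_per_query
                       * price_per_tb_scanned)
    if serverless_cost < provisioned_monthly_cost:
        return "serverless"
    return "provisioned"


# Hypothetical workload: each query scans 0.1 TB at 5 USD per TB,
# compared with a provisioned system costing 1000 USD per month.
print(break_even_queries(0.1, 5.0, 1000.0))
print(cheaper_option(500, 0.1, 5.0, 1000.0))
print(cheaper_option(5000, 0.1, 5.0, 1000.0))
```

With these assumed numbers the break-even point is about 2,000 queries per month: below it the per-query model costs less, above it the provisioned system does.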