Observability: Logs, Metrics, Monitoring Explained

8m • devops • interview • beginner

Key Points

As applications become more complex, observability is more than a buzzword: it is essential for understanding system behavior, monitoring activity, and troubleshooting issues.
Observability is built on three pillars—logging, metrics, and monitoring—with logging further broken down into OS‑level, platform (e.g., Kubernetes), and application‑level logs that must be well‑structured to yield useful insights.
Different stakeholder personas (developers, operations, and security teams) need tailored views of the massive data flowing from diverse environments such as public clouds, on‑premises infrastructure, and edge devices.
LogDNA, a core component of IBM Cloud’s observability stack, helps filter, aggregate, and present this data so each persona can efficiently extract the information they need.

**Source:** [https://www.youtube.com/watch?v=bvVgP4tw_Hc](https://www.youtube.com/watch?v=bvVgP4tw_Hc)

**Duration:** 00:08:18

## Sections

- [00:00:00](https://www.youtube.com/watch?v=bvVgP4tw_Hc&t=0s) **Observability Fundamentals: Logs, Metrics, Monitoring** - In this segment, IBM Cloud's Sai Vennam and LogDNA's Laura Santamaría introduce observability, define its three tiers (logging, metrics, and monitoring), and explain why it's essential for managing complex applications.
- [00:03:28](https://www.youtube.com/watch?v=bvVgP4tw_Hc&t=208s) **Aggregating and Filtering Observability Data** - The passage explains how an aggregator gathers multi-level telemetry, filters it to deliver only the debugging information developers need, and then externalizes the data so both developers and operations teams can gain actionable insights.

## Full Transcript

0:00 As your applications grow in complexity, how do you harness and drive new insights from all the chaos? And is observability just a buzzword, or is it something that you actually need to think about? Spoiler alert, it is. My name is Sai Vennam and I'm with the IBM Cloud team, and today I'm joined by a special guest.

0:17 Hi there, I'm Laura Santamaría and I am a Developer Advocate with LogDNA.

0:22 If you don't know, LogDNA is a core part of our observability story on IBM Cloud, but today we're gonna be talking about observability, so let's start with a definition.

0:31 So observability is a property of your systems that helps you understand what's going on with them, monitor what they're doing, and be able to get the information you need to troubleshoot.

0:40 So the way we see it, there are three major tiers of observability, and let's go through those now.

0:47 We're gonna start out with my favorite, which is logging. In addition to logging we have metrics, so that's just all of your analytics around all of the data that you're gathering, and finally we've got monitoring. Now monitoring is essentially putting up a magnifying glass to your systems and getting new insights from what's actually running there. Today we're gonna be starting with an example: in the bottom left corner we have sketched out a few of the different infrastructure pieces that we'll start with today. Can we explain what those are?

1:16 Sure, we have a public cloud, it can be any of them. And then you have on-prem, and then let's say we actually have some user data, maybe this is a tablet or a cell phone.

1:26 So all of those infrastructure pieces are creating and generating data, and what I'm kind of gonna focus on here is the personas that are going to consume them. So we've got that Dev persona, we've got Ops, and finally we have Security.

1:47 So, all of this data flowing in is kind of a lot. I want to have some way of filtering it down for my specific user personas to be able to understand it. So let's start with developers: what do developers care about?

2:06 I actually want to back up here for a moment though, because let's talk about all the different levels that logging can come from. So we have three different levels that we can think about: you have your operating system, you have Kubernetes or any other type of platform, so I'm picking Kubernetes. That's my favorite. And then finally your application. So your operating system and Kubernetes all send really good logs and you can use a lot of that data pretty much as is, or add in some of your own, but applications is really where you need to spend some time. So your devs need to create a proper event stream, and this really goes by the garbage in, garbage out system, where you really need to put in good work and get some good data on the side of the application so that you get good logs out.

2:56 Right, exactly, so the great developers out there on Kubernetes and the operating systems, they've instrumented their platforms, but the application, that's up to you as a developer to make sure the instrumentation is in place.

3:07 Absolutely, and when you think about it, let's say that we have an operating system here, and I'm gonna say that's an operating system, and then we have Kubernetes running on it. And then you actually have your app running on top of Kubernetes. And all of these are each sending data. So we have three different levels of data all coming out and trying to come towards the dev that wants some information.

3:41 Right, so it looks like they're all coming into this central area here.

3:44 That's right. We can talk about this as our aggregator. So our aggregator takes in all of this data and puts it all into one place so we can work with it.

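
As a rough illustration of the two ideas above (applications emitting well-structured events, and an aggregator collecting operating system, Kubernetes, and application logs in one place), here is a minimal Python sketch. The `LogEvent` fields, the `Aggregator` class, and all sample values are assumptions made for illustration; they are not LogDNA's or IBM Cloud's API.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class LogEvent:
    """One structured log record; the field names are illustrative, not a standard."""
    source: str   # which level emitted it: "os", "kubernetes", or "app"
    level: str    # severity, e.g. "debug", "info", "warning", "error"
    message: str
    timestamp: float = field(default_factory=time.time)
    context: dict = field(default_factory=dict)  # extra key/value data for later filtering

    def to_json(self) -> str:
        # Serializing to JSON keeps the event machine-readable downstream.
        return json.dumps({
            "source": self.source,
            "level": self.level,
            "message": self.message,
            "timestamp": self.timestamp,
            "context": self.context,
        })


class Aggregator:
    """Hypothetical in-memory aggregator: gathers events from every level in one place."""

    def __init__(self):
        self.events = []

    def ingest(self, event):
        self.events.append(event)


aggregator = Aggregator()
# Platform-level logs (OS, Kubernetes) usually arrive already well-formed.
aggregator.ingest(LogEvent("os", "info", "disk usage at 71%"))
aggregator.ingest(LogEvent("kubernetes", "warning", "pod restarted",
                           context={"pod": "checkout-7f9c"}))
# Application logs are only as good as the instrumentation the developers add.
aggregator.ingest(LogEvent("app", "debug", "cart total recomputed",
                           context={"order_id": "A-1001", "items": 3}))

for event in aggregator.events:
    print(event.to_json())
```

The `context` dictionary is where the "garbage in, garbage out" point shows up: an application event that carries identifiers such as an order ID can be searched and filtered later, while a bare text message cannot.
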
3:58 That's right, but kind of coming back to the problem here, a developer might not care about all of the information flowing in. How do we drive just the pieces that they care about? Like we mentioned, maybe they instrumented their specific application, how do we drive that to them?

4:12 Absolutely, so an aggregator often has filters. So in this case let's say the dev is just asking for data about debugging and just some information there, and your filter can actually set up a dashboard or some other way of accessing all of that data so that the dev can take a look at just the pieces that they need.

4:35 That's a core part of an observability solution: this aggregator not only collects the data, but it needs to externalize it, expose it, so my developers can access it and drive new insights. So let's say we solved that part of the puzzle, what do operators care about? The operations teams, what are they looking for out of these systems?

4:55 So an operations team might need to know more about degradation of its system, or if a pod is falling over, or maybe your database filled up and you need to know more information about how you can fix it. The ops team is going to be getting data from all of these different systems and filtering it out to yet another dashboard or another interface of some sort and getting just the data they need.

5:18 Right, so potentially they may not care as much about specific application-level logs, but they'll be looking to Kubernetes to say, hey, what was the CPU usage, do we need to set up some horizontal pod autoscalers to make sure that we don't hit those limits. Finally, you can probably see where I'm going here with the last piece of the puzzle, with security: they probably have a dashboard that's created for them as well.

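
The per-persona filtering described above can be sketched as a set of predicates applied to the aggregated events. This is a minimal Python sketch under stated assumptions: the event dictionaries, the `PERSONA_FILTERS` table, and the `view_for` helper are hypothetical names and made-up data, not part of any real aggregator's interface.

```python
# Sample events as they might sit in an aggregator; all values are made up.
EVENTS = [
    {"source": "os", "level": "info", "message": "disk usage at 71%"},
    {"source": "kubernetes", "level": "warning", "message": "pod checkout-7f9c restarted"},
    {"source": "kubernetes", "level": "info", "message": "cpu usage at 85% of limit"},
    {"source": "app", "level": "debug", "message": "cart total recomputed"},
    {"source": "app", "level": "error", "message": "payment gateway timeout"},
]

# One filter per persona: a predicate deciding whether an event belongs in that view.
PERSONA_FILTERS = {
    # Developers mostly want debugging data from the application they instrumented.
    "dev": lambda e: e["source"] == "app" and e["level"] in {"debug", "error"},
    # Operations cares more about the platform: OS and Kubernetes health.
    "ops": lambda e: e["source"] in {"os", "kubernetes"},
}


def view_for(persona, events):
    """Return only the events that the persona's filter lets through."""
    keep = PERSONA_FILTERS[persona]
    return [e for e in events if keep(e)]


for persona in PERSONA_FILTERS:
    print(persona, [e["message"] for e in view_for(persona, EVENTS)])
```

Each filtered list stands in for the data behind one dashboard. Adding another persona would mean adding another predicate rather than another copy of the data.
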
5:45 So a security team, let's say they're using a third-party tool, as most security teams generally do. They identify a threat ID, or maybe a customer ID, and they want to dive in deeper to a potential threat that's been identified. So they put that information in the aggregator and they can kind of make sense of all the chaos to identify exactly what that specific security analyst might be looking for.

6:06 But I want to pose an interesting question here. It's not always about going to the system and identifying what's there; many times security advisors need to know what's happening the second it happens, and they can't just sit there and stare at logs all day, right?

6:21 Absolutely, this is where monitoring comes in, and this is really a two-way street. We have automated alerts that can go out and tell all of these different groups about specific things that they're interested in, specific events that they want to know about. So let's say that you have a system that's been accessed and it's not supposed to be. Frankly, that system is going to figure it out long before a human is, and that's what an alert is for. An ops team doesn't want to find out that there's a degradation of service when their user does; they need to know ahead of time.

6:53 So a good observability solution should have the ability to externalize the data and then additionally set up alerting on top of that. So our dev team, maybe they're most comfortable in Slack, so they set up a chatbot so that when particular exceptions are thrown, they know when they happen. Your ops team, maybe they're using something like a paging system, so that, you know, in the middle of the night if something goes down they get alerted and they can start looking into it right away.

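
The alerting idea above (rules that watch incoming events and push a notification to wherever each team already works) can be sketched in Python as follows. Everything here is an assumption for illustration: the `AlertRule` type, the `"chat"` and `"pager"` channel names, the `kind` field, and the `notify` function, which only records and prints instead of calling a real chat or paging service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    name: str
    matches: Callable[[dict], bool]  # predicate over an incoming event
    channel: str                     # hypothetical channel name: "chat" or "pager"


sent = []  # stands in for real chat or paging integrations


def notify(channel, event):
    """Record and print the alert; a real system would call the channel's service."""
    sent.append((channel, event["message"]))
    print(f"[{channel}] {event['message']}")


RULES = [
    # Developers hear about thrown exceptions in their chat tool.
    AlertRule("app exception",
              lambda e: e["source"] == "app" and e["level"] == "error",
              "chat"),
    # Operations gets paged on service degradation, day or night.
    AlertRule("service degradation",
              lambda e: e.get("kind") == "degradation",
              "pager"),
]


def process(event, rules=RULES):
    """Check one event against every rule and notify the matching channels."""
    for rule in rules:
        if rule.matches(event):
            notify(rule.channel, event)


process({"source": "app", "level": "error",
         "message": "unhandled exception in checkout"})
process({"source": "kubernetes", "level": "warning", "kind": "degradation",
         "message": "p99 latency above threshold"})
process({"source": "app", "level": "info", "message": "user logged in"})  # no rule matches
```

The point of the design is that the system evaluates every event as it arrives, so nobody has to watch logs to find out that something happened.
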
7:24 And then finally for our security teams, kind of as I mentioned, they're generally using, you know, maybe third-party tools or custom dashboards. They can set up custom alerting so they can know exactly when something goes down.

7:37 And to be honest, this is your new norm: you're going to have multiple clouds, you're going to have on-premises systems, you're going to have data coming directly in from your users. You need to be able to understand what's going on, and really this is what observability is all about.

7:51 Thanks for joining us for this quick overview of observability, and thank you so much for joining us today, Laura.

8:00 Absolutely, my pleasure.

8:04 If you have any questions, please drop us a line below. If you want to see more videos like this in the future, please like and subscribe. And don't forget you can always get started on the cloud at no cost by signing up for a free IBM Cloud account.