# Distinguishing Observability, Monitoring, and APM

**Source:** [https://www.youtube.com/watch?v=CAQ_a2-9UOI](https://www.youtube.com/watch?v=CAQ_a2-9UOI)

**Duration:** 00:09:40

## Sections

- [00:00:00](https://www.youtube.com/watch?v=CAQ_a2-9UOI&t=0s) **Differentiating Observability, Monitoring, APM** - The speaker explains how observability, monitoring, and APM differ by illustrating their roles using a legacy Java EE application example.
- [00:03:40](https://www.youtube.com/watch?v=CAQ_a2-9UOI&t=220s) **Collect, Monitor, Analyze Observability Workflow** - The speaker outlines a three-step observability approach: collecting Kubernetes metrics and logs, visualizing them through dashboards for monitoring, and analyzing the data to troubleshoot application bugs.
- [00:07:02](https://www.youtube.com/watch?v=CAQ_a2-9UOI&t=422s) **Automation, Context, and Action in Observability** - The speaker emphasizes the need for automated context provisioning during upgrades, tracing issues back to source, and a structured analyze-and-fix cycle using modern observability tools.

## Full Transcript

0:00 These days I hear the terms Observability, Monitoring, and APM, or Application Performance Management, thrown around seemingly interchangeably, but these terms actually mean quite different things. So let's dive in head first and see an example of how exactly these things differ.

0:13 So to start, I'm going to begin with a Java EE application. It's kind of old school; we'll go back maybe a decade. And let's say that we've got some components in this Java EE app that actually power it. Something important to remember here: although we might be using a SOA, or service-oriented architecture, this is not exactly microservices, so the components are not communicating over REST APIs.

0:35 So you have some inherent advantages here. For example, you can take advantage of the Java EE framework to output log files, which will probably all come out into the same directory, and the timestamps match up, so things are good. In addition, you could take advantage of something like an APM solution, which is kind of a one-size-fits-all, set-and-forget tool: you install it and it'll get rich analytics, data, and metrics about the running services within the application.

1:07 So essentially what we've done is we've made our system observable, so that our Ops teams were then able to look into it, identify problems, and figure out if anything needed to be done. For the business objectives back then, this was essentially good enough, but it tends to fall apart very quickly when you start to move to a more cloud-native approach, where you have multiple runtimes and multiple layers to the architecture.

1:36 So let's say we have an example app here. We'll start with Node as a front end. Let's say we also have a Java backend application. And then finally, let's say we also have a Python app which is doing some data processing.
1:50 So let's see how these things work with each other. The front-end app probably talks to the Java app and also the Python app for some data processing. The Java app probably communicates with a database, and then the Python app probably talks to the Java app for CRUD operations. So this is my quick sketch, a dummy layout for a microservices-based application. You can take it a step further and even say that this is all running within Kubernetes, so we've got these container-based applications running in a cluster.

2:25 So immediately, the first problem I can see here is that with multiple runtimes we now have to think about multiple different agents or ways to collect data. Instead of just one APM tool, we might have to start thinking about pulling in multiple, so how would we consolidate all that data? That's a challenge.

2:41 In addition, let's think about things like logging. Each of these runtimes is probably outputting logs in a different place, and we have to figure out how we consolidate all those. Maybe we use a log streaming service. Regardless, you can see the complexity starts to grow.

2:53 And finally, as you add more services and microservices components to this architecture, say a user comes in, tries to access one of these services, and runs into an error: you need to trace that request through the multiple services. Well, unless you have the right infrastructure in place, something like headers on requests, maybe a way to handle WebSockets, things are going to start to get messy, and you can see how the technical complexity grows quite large.

3:18 So here's where Observability comes in and differentiates itself from standard APM tools.
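
To make the "headers on requests" idea from the tracing discussion concrete, here is a minimal Python sketch of propagating one trace ID across the three services in the example. Everything in it is an assumption for illustration and not from the talk: the `X-Request-ID` header name is just a common convention, the three functions are hypothetical stand-ins for the Node, Python, and Java apps, and `sink` stands in for a shared log stream.

```python
import uuid

TRACE_HEADER = "X-Request-ID"  # assumed header name, not taken from the talk


def ensure_trace_id(headers):
    """Return a copy of headers that is guaranteed to carry a trace ID."""
    out = dict(headers)
    if TRACE_HEADER not in out:
        out[TRACE_HEADER] = uuid.uuid4().hex
    return out


def log(service, headers, message, sink):
    """Append a log record tagged with the service name and trace ID."""
    sink.append(
        {"service": service, "trace_id": headers[TRACE_HEADER], "msg": message}
    )


# Hypothetical stand-ins for the three services in the sketch.
def java_backend(headers, sink):
    headers = ensure_trace_id(headers)
    log("java", headers, "query database", sink)


def python_processor(headers, sink):
    headers = ensure_trace_id(headers)
    log("python", headers, "process data", sink)
    java_backend(headers, sink)  # forward the same headers downstream


def node_frontend(headers, sink):
    headers = ensure_trace_id(headers)
    log("frontend", headers, "handle user request", sink)
    java_backend(headers, sink)
    python_processor(headers, sink)


def trace(sink, trace_id):
    """Collect every record for one request, across all services."""
    return [record for record in sink if record["trace_id"] == trace_id]
```

Calling `node_frontend({}, sink)` with an empty list as `sink` leaves four records that all share one trace ID, so `trace(sink, ...)` can reassemble the path of a single user request even though each service logged independently.
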
3:25 Observability takes a more holistic, cloud-native approach to things like logging and monitoring. So I'll say there are three major steps for any sort of Observability solution. We'll start with the first one, which we'll call collect, because we need to collect data. Then we'll go to monitor, and we'll talk about this because this is part of monitoring. And finally we'll end with analyze: doing something with the actual data that you have.

3:58 So with the collect step, first let's say that we actually made our system observable. The great thing is that with Kubernetes you get some CPU and memory data automatically. So let's say we get some of that, we get some logs from the application all streaming to the same location, and let's say we even get some other stuff like high availability numbers or average latency, things that we want to be able to track and monitor.

4:21 So that brings me to my next step. Once we have this data available, we need to be able to actually do something with it, at least visualizing it, even if we're not actually solving problems yet. What do we do with this data? Well, maybe we create some dashboards to be able to monitor the health of our application, and say we create multiple dashboards to be able to track different services or different business objectives, high availability versus latency, that kind of thing.

4:47 Now the final thing that I want to talk about here is what we do next. Say we found some bug in the application by looking at our monitoring dashboards, and we need to dive in deeper and fix the problem with the Node app.
5:05 Well, the great thing about that is an Observability solution should allow you to do just that. It allows you to take it even a step further, because these days with Kubernetes you're getting a lot of that information from the Kubernetes layer. So this is something I want to quickly pause and talk about. APM tools in the past were really focused on things like resource constraints: CPU usage, memory usage, that kind of thing. These days that's been offloaded to the Kubernetes layer, so Observability took APM and evolved it to the next stage, pulled it a step up, and enables our users to focus on things like SLOs and SLIs, Service Level Objectives and Service Level Indicators. These enable you to focus on things that matter to your business, like making sure that latencies are low or that application uptime is high. So I think those are the crucial three steps for any sort of Observability solution.

6:02 Let's take a step back again. These things can be hard to set up on your own with open source projects and capabilities, pulling all the different things together, so you might be looking at an Enterprise Observability solution. And when you're comparing competitors and looking at building out your enterprise observability capability, I would look at three main things.

6:22 Now let's start with automation. Every step of the way, we need to make sure that automation is there to make things easier. So let's say that our dev team pushes out a new version of the Node app and goes from v1 to v2. Now let's say they inadvertently introduced a bug: instead of making a bulk API call, they now make individual API calls to the Python app. So in our monitoring dashboard, our Ops team's like, "Oh guys, something's wrong, the DB app is getting a lot of requests. What's going on?"
6:51 Well, you need to be able to automatically go back and trace through the requests and identify what happened.

7:02 That actually brings me to my second point as well, which is context. It's always important (I can spell) to have that context. So automation is important here because when upgrading to the new version of Node, you want to make sure that the right agent is automatically installed and the instrumentation is in place, so your dev team doesn't have to do that, and as new services get added you want your monitoring dashboards to be automatically updated as well. And that context is extremely crucial, as with this example we needed to be able to trace that request back to the source of the problem.

7:32 So once we've traced that request back to the source with the context that we have, the third step here, and I think probably one of the most important, is action. What do we actually do now? And that brings me to my last step here, the analyze phase, which, remember, we talked about as an evolution from traditional APM tools to the way that Observability tools implement that today.

7:56 So when you get to this step, you'll probably want to look at maybe the SLIs within the Node app. Maybe dive in deeper, right? So maybe you look in and you identify that you need to look at application trace logs. So you look in the trace logs, you identify some problems, you figure out what the fix is, and you tell it to your dev team. Maybe the last step here is fix, and then rinse and repeat for any other issues that might come up in the future.
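
As one way to picture "looking at the SLIs" in the analyze phase, the Python sketch below computes two indicators mentioned in the talk, availability and latency, and compares them with target objectives. It is only an illustration under stated assumptions: the function names, the 5xx-based definition of a failed request, the nearest-rank percentile method, and the default targets (99% availability, 300 ms p95 latency) are all hypothetical and not taken from the talk or from any specific tool.

```python
import math


def availability_sli(status_codes):
    """Fraction of requests that did not fail with a server error (5xx)."""
    if not status_codes:
        raise ValueError("no requests observed")
    good = sum(1 for code in status_codes if code < 500)
    return good / len(status_codes)


def latency_percentile(latencies_ms, pct):
    """Nearest-rank percentile of the observed latencies."""
    if not latencies_ms:
        raise ValueError("no latencies observed")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_slos(status_codes, latencies_ms,
               min_availability=0.99, max_p95_ms=300.0):
    """Compare measured SLIs with target SLOs (targets are hypothetical).

    Returns a dict mapping each indicator to (measured value, objective met).
    """
    availability = availability_sli(status_codes)
    p95 = latency_percentile(latencies_ms, 95)
    return {
        "availability": (availability, availability >= min_availability),
        "p95_latency_ms": (p95, p95 <= max_p95_ms),
    }
```

A dashboard or alert could call `check_slos` per service, so that an indicator missing its objective on the Node app becomes the starting point for digging into that service's trace logs.
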
8:22 So I think Enterprise Observability is extremely crucial here when we're looking at the bigger picture, because it's not just about having the individual pieces, which, like I said, might be quite hard to set up with purely open source approaches. You want to think about automation, to make sure things are set up seamlessly and reduce the overhead on your side. Make sure you have context to be able to see how services work with each other, maybe even generate things like dependency graphs to see the broader view, because you might not always have a light board like this to see the architecture so cleanly. And finally, being able to take action when you do find a problem: making sure that your Observability solution has a way to automatically pull together data from multiple sources and multiple services, and then figure out what's valid and necessary for you to be able to make that fix happen.

9:13 So IBM is invested in making sure our clients can effectively set up Enterprise Observability with the recent acquisition of Instana. To learn more about the acquisition, or to get a showcase of the capabilities, be sure to check out the links in the description below.

9:26 As always, thanks for watching our videos. If you liked the video or have any questions or comments, be sure to drop a like and a question or comment below. Be sure to subscribe and stay tuned for more videos like this in the future. Thank you.