Seven Pillars of Storage Observability

Key Points

  • A world‑class observability tool is essential for storage arrays, just as a dashboard is critical for safely operating a car.
  • The tool must address seven “pillars” of observability: availability, performance, capacity, security, inventory, cost, and sustainability.
  • Each pillar answers key operational questions such as whether the storage is up, meeting latency/IOPS goals, has enough space, is protected from ransomware, what assets exist, how much it costs, and its power‑and‑carbon impact.
  • By providing answers to these pillars, the observability solution enables proactive planning, risk mitigation, and alignment with financial and environmental targets.
  • Core use cases—starting with end‑to‑end visibility—demonstrate how the tool helps administrators manage and optimize storage infrastructure across the entire lifecycle.

Full Transcript

# Seven Pillars of Storage Observability

**Source:** [https://www.youtube.com/watch?v=QGrLG_zqxJU](https://www.youtube.com/watch?v=QGrLG_zqxJU)
**Duration:** 00:18:03

## Sections

- [00:00:00](https://www.youtube.com/watch?v=QGrLG_zqxJU&t=0s) **Untitled Section**
- [00:03:18](https://www.youtube.com/watch?v=QGrLG_zqxJU&t=198s) **Sustainable Storage Observability Use Cases** - The speaker outlines how a storage observability tool supports sustainability goals and operational efficiency through visibility, proactive issue detection, performance optimization, capacity planning, and cost management.
- [00:07:12](https://www.youtube.com/watch?v=QGrLG_zqxJU&t=432s) **AI-Driven Storage Observability** - The speaker explains how AI helps storage admins by automatically detecting anomalies, forecasting capacity, performing root‑cause analysis, and automating actions to reduce noise and manual effort.
- [00:10:30](https://www.youtube.com/watch?v=QGrLG_zqxJU&t=630s) **AI-Driven Storage Observability Stages** - The speaker explains how AI advances storage observability, from basic monitoring to self‑learning workload tiering and finally agentic AI Ops that automate actions for optimized performance and cost.
- [00:13:39](https://www.youtube.com/watch?v=QGrLG_zqxJU&t=819s) **AI‑Driven Self‑Healing Storage Ops** - The speaker explains how a chatbot‑style agent can deliver instant, goal‑driven responses and automatically remediate storage problems, creating self‑healing infrastructure that eases administrators’ workload.

## Full Transcript

0:00 Imagine you are buying a car, and your car does not have a dashboard. So, what will happen? Similarly, when you buy a storage array, you need a world-class observability tool, because buying storage without observability is like driving a car without a dashboard. You may keep moving, but you do not know when you are going to run out of fuel, when your car is going to overheat, or whether you are headed for a breakdown.

0:41 Now that I have established that we need a world-class observability tool when you buy a storage box, what are the critical operational questions that this tool can answer for you? I call these the seven pillars of observability. So what are those pillars?

1:02 Number one: availability. Is my storage infrastructure available? Does it have enough availability for my applications? This is the most basic question a storage admin needs to answer. Hence, that is the first pillar.

1:27 Second: performance. Is my storage infrastructure performing well? Does it give enough resources to my applications? How is it doing with respect to latency and IOPS? These are the critical questions that a world-class observability tool can answer once the storage array is configured.

1:46 Now I move to a very critical one, the third: capacity. Do I have enough capacity now? What capacity was I using in the past? When is my box going to run out of capacity? How can I order more capacity when it is going to run out?

2:11 Next, a pillar that has become very important in recent days: security. Is my storage infrastructure secure? Do I have enough safety against ransomware attacks? Is my security posture configured correctly? How do I know? This tool can answer those questions and keep me safe.

2:31 The next one: inventory. What does my inventory look like? How many block storage devices do I have?
How many file storage devices do I have? How many of them are hyperconverged? All these critical questions can be answered when I know what is in my inventory, so that I can plan my storage infrastructure better.

2:57 Cost. What is the current cost I am paying for my storage infrastructure? How much is my cost going to escalate? Is my cost going to exceed a limit? Only a world-class observability tool can help, and hence this is a very critical pillar.

3:20 And last but not least, sustainability. Is my storage infrastructure doing well with respect to power consumption? What does it account for in carbon emissions? Is my infrastructure aligning with my carbon goals? What does the data say? This is where my storage observability tool will help align my box.

3:49 Now, having answered all the critical operational questions via the seven pillars of observability, let's look at use cases where the observability tool is a must-have for storage boxes: the use cases that can help the admin manage the storage infrastructure better. What are those use cases?

4:09 End-to-end visibility. In a storage infrastructure, there are many components, from the host to switches to the storage box. You want to know all the paths from your applications to the storage, and exactly what is happening along the way. So it helps teams understand how storage systems, applications, and infrastructure are interconnected, to eliminate blind spots.

4:32 Next, proactive issue detection. By monitoring health, performance, and anomalies in real time, this enables early detection and faster resolution of issues before they have an impact.

4:53 Performance optimization. For a long time, storage admins have had a problem: How is my latency? How is my IOPS? How is my throughput?
Now, if you have a world-class observability tool, it can provide insights into workloads, bottlenecks, and utilization patterns, ensuring optimal performance and resource allocation.

5:12 Next, capacity planning and cost management. How am I doing with respect to capacity? With the usage trends and predictive analytics of a world-class storage observability tool, a storage admin can plan storage growth, avoid overprovisioning (which is critical), and reduce unnecessary costs.

5:39 Data centers today are commonly multi-vendor and hybrid environments. A world-class storage observability tool simplifies management across different storage vendors, whether on-prem or in the cloud, ensuring consistency and reducing complexity. It improves reliability through continuous monitoring, reducing downtime risks, and enhances resilience against failures caused by multiple upgrades or multiple security threats.

6:19 With all this, what you have in your hand is data-driven decision-making. A world-class storage observability tool translates raw storage metrics into actionable insights, so that it can empower IT leaders to align their storage strategy with their upcoming goals.

6:45 I will go ahead and talk about the role of AI in data storage observability. AI plays a huge part when it comes to ease of storage observability. Observability is about managing the data the storage array produces, and the entire infrastructure produces exabytes of data. It is impossible to observe that data manually. Hence, storage admins leverage AI to make storage observability and management better. How does it help? Let's go to the use cases.

7:25 The first one: anomaly detection. AI algorithms in observability tools can learn normal storage behavior and automatically detect unusual patterns.
For example, sudden latency spikes, abnormal I/O, or capacity anomalies can be flagged before they become critical issues.

7:52 Second: predictive analytics. AI can forecast storage growth and performance trends and support capacity planning, enabling proactive management and risk mitigation.

8:18 Third is root cause analysis, a pain point for all storage admins. Instead of manually sifting through logs and metrics, storage admins can leverage artificial intelligence, which can correlate signals across the infrastructure layers to quickly pinpoint the root cause behind performance or availability issues. Thus, it can make storage admins' lives a lot simpler.

9:05 Intelligent automation. We all want our storage boxes to be plug and play, and AI can help us achieve that. AI can recommend or even trigger actions based on the observability data, such as load balancing or cache optimization, without any human intervention, thus reducing downtime and manual operational effort.

9:51 Another big pain point for storage admins is a lot of noise. All the storage boxes produce a lot of alerts because of all the components inside them, and in large environments, storage admins are flooded with alerts. Here, AI comes to your rescue. AI can filter out the false positives, prioritize what is critical, and surface only what truly matters, thus achieving noise reduction.

10:32 Workload optimization. Are my workloads running in the most efficient manner in a storage box? It is very difficult to figure out manually, but observability tools with AI can analyze usage patterns and suggest optimal data placement across the entire storage box, ensuring that tiering is in place to balance cost and performance.
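
The anomaly detection use case described above (learn what normal looks like, then flag sudden latency spikes) can be illustrated with a very small statistical stand-in. This is only a sketch under stated assumptions: a rolling z-score replaces whatever model a real observability tool uses, and the function name `find_latency_anomalies`, the window size, the threshold, and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def find_latency_anomalies(samples_ms, window=10, threshold=3.0):
    """Return indices of samples that deviate sharply from the recent baseline.

    Hypothetical helper: the "normal behavior" is simply the mean and
    standard deviation of the previous `window` samples.
    """
    anomalies = []
    for i in range(window, len(samples_ms)):
        baseline = samples_ms[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all is unusual.
            if samples_ms[i] != mu:
                anomalies.append(i)
            continue
        # Flag the sample if it lies more than `threshold` deviations away.
        if abs(samples_ms[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up latency readings in milliseconds with one obvious spike.
latencies = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 1.8, 2.0, 2.1, 1.9, 2.0, 9.5, 2.1, 2.0]
print(find_latency_anomalies(latencies))  # [11]
```

Only the spike at index 11 is reported; the ordinary readings around it stay below the threshold, which is the same idea as surfacing only what truly matters.
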

11:07 With all this in place, AI is also learning from the data that the storage boxes are producing, enabling what I call self-learning insights. Over time, AI adapts to the unique capacity characteristics of your storage box, and it then uses those characteristics of your workloads and storage systems to improve the observability outcomes.

11:42 Now that we have gone through the role of AI in storage observability, let's see how agentic AI Ops can help storage observability even more. Before going there, I would like to talk about the four stages of storage observability. The first is monitoring, the basic monitoring of a storage box. The second is observability, when I start observing the data that the storage box produces and giving insights to the admin. The third is AI Ops, where I leverage AI to make the storage observability experience a lot better for the admin. And then, finally, agentic AI Ops, where I ask a few agents to do the work for me that I was unable to do with normal AI.

12:52 Now, what are the use cases where agentic AI Ops can take the observability experience a notch higher? The first one: autonomous monitoring and response. What does this mean? Based on the data that the observability tool is getting from the storage, it can start responding in real time to the storage admin. Say I am getting a message that my drive is not available. So what do I do? What is my action? In a chatbot kind of manner, it can provide an immediate response and thus make storage admins' lives a lot easier.

13:47 Second: goal-driven response, or goal-driven operations, I should say. An admin can define the goals that are needed from observability, and when a goal's threshold is reached, the relevant agent starts responding to the admin, hence achieving goal-driven operations.
For example, say the admin needs latency of less than 10 seconds. Every time it exceeds 10 seconds, the agent goes and autocorrects. This helps the admin, who no longer has to sit and monitor a thousand alerts and take those actions, and can instead trust the agent to watch the goal and take the necessary actions.

14:47 Next, and the most critical, though currently not appreciated by all users, is self-healing infrastructure. A storage box is a complicated piece of hardware, with lots of components and lots of software that interact with each other to give the best data storage experience. So it is going to come up with issues regularly. When the issues come up, admins run around as a team to figure out how to resolve them. But if I am able to outsource this to agents, and they can self-correct, we are basically achieving self-healing infrastructure. Let's say, for example, a storage drive failed. What can I do? A controller went for a toss. What can I do? A backplane had some issue. How can I heal it? So, with all the actions taken in an automated manner, whenever something goes for a toss, it can be self-healed, so that I can provide the best storage experience for my admin.

16:08 And last, the biggest pain point, which is still not completely solvable by AI and where agentic AI can really be a lifesaver, is lifecycle management. A storage admin who has 50 to 100 storage arrays has to regularly apply patches, new releases, and new security fixes. And every storage vendor has a different lifecycle and different release timelines. So, today, admins maintain everything manually in an Excel sheet and interact with their team about when each release is coming up; then they have downtime, which they have to communicate. It is a management nightmare.
Rather, you can outsource all of it to agents. The observability tool can release agents for specific storage boxes, and they can achieve a complete lifecycle management experience, where the agents automatically track the number of arrays and their release timelines and, depending on each of the release timelines, take the necessary upgrade actions and send an email once that is done. If the upgrade has not taken place, they can send a communication on what the next upgrade should be and how it can be achieved. This provides a completely automated way of asset lifecycle management, eliminating the hours and hours of manual effort that the storage admin puts in today. And here agentic AI Ops can really be a lifesaver.
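
To make the lifecycle-management idea concrete, here is a minimal sketch of the tracking step that replaces the spreadsheet: compare each array's installed release with the latest one and decide whether it is current, due for upgrade now, or scheduled. Everything in it is an assumption for illustration: the `StorageArray` record, the `plan_upgrades` function, the version tuples, and the maintenance-window dates are hypothetical and do not come from any vendor's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StorageArray:
    """Hypothetical inventory record for one array."""
    name: str
    vendor: str
    installed: tuple[int, int, int]  # installed release as (major, minor, patch)
    latest: tuple[int, int, int]     # latest release published by the vendor
    next_window: date                # next agreed maintenance window


def plan_upgrades(arrays, today):
    """Sort arrays into up-to-date, upgrade-now, and scheduled buckets."""
    plan = {"current": [], "upgrade_now": [], "scheduled": []}
    for array in arrays:
        if array.installed >= array.latest:
            # Tuples compare element by element, so (8, 5, 0) >= (8, 5, 0).
            plan["current"].append(array.name)
        elif array.next_window <= today:
            # Behind on releases and the maintenance window has opened.
            plan["upgrade_now"].append(array.name)
        else:
            # Behind on releases, but the window is still in the future.
            plan["scheduled"].append((array.name, array.next_window.isoformat()))
    return plan


# Made-up fleet of three arrays from different vendors.
fleet = [
    StorageArray("array-a", "vendor-x", (8, 5, 0), (8, 5, 0), date(2025, 3, 1)),
    StorageArray("array-b", "vendor-y", (9, 12, 1), (9, 13, 0), date(2025, 1, 10)),
    StorageArray("array-c", "vendor-z", (3, 1, 4), (3, 2, 0), date(2025, 2, 20)),
]
print(plan_upgrades(fleet, today=date(2025, 1, 15)))
```

In this sketch the plan is only computed; an agent as described in the talk would go on to perform the upgrade for the `upgrade_now` entries and send the follow-up communication for the `scheduled` ones.
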