
# Shadow AI Discovery and Mitigation

**Source:** [https://www.youtube.com/watch?v=YBE6hq-OTFI](https://www.youtube.com/watch?v=YBE6hq-OTFI)
**Duration:** 00:11:06

## Sections

- [00:00:00](https://www.youtube.com/watch?v=YBE6hq-OTFI&t=0s) **Discovering and Securing Shadow AI** - The speaker warns that hidden, unsanctioned AI tools—referred to as "Shadow AI"—pose data-leak and exposure risks, and advocates proactively inventorying these tools and guiding users toward secure alternatives instead of simply prohibiting them.
- [00:03:15](https://www.youtube.com/watch?v=YBE6hq-OTFI&t=195s) **Discovering Models and Data in Cloud Environments** - The speaker outlines how to connect to a cloud, differentiate platform-based AI services from standalone models, and scan for associated assets such as tuning data and retrieval-augmented generation (RAG) data.
- [00:06:23](https://www.youtube.com/watch?v=YBE6hq-OTFI&t=383s) **Risks of Unsecured RAG Systems** - The speaker outlines how attackers can exfiltrate sensitive data and poison retrieval-augmented generation models when access controls are lax, using examples of data leakage, improper user permissions, and model poisoning.
- [00:09:26](https://www.youtube.com/watch?v=YBE6hq-OTFI&t=566s) **From Shadow AI to Secure AI** - The speaker emphasizes using discovery tools to expose hidden AI deployments, identify misconfigurations, and apply security controls or safer alternatives, turning shadow AI into a business-beneficial, securely managed asset.

## Full Transcript
0:00 It's 2 o'clock in the morning and I have a question for you.
0:03 Do you know where your AI is?
0:05 Think you don't have any in your corporate environment?
0:08 Do you think so, or do you know so?
0:11 It will matter, because it turns out there's a thing called Shadow AI that's lurking in the corners.
0:18 A lot of people have seen the power of AI,
0:21 generative AI in particular,
0:23 and therefore they're trying to see what kind of uses they could make for that within the organization.
0:28 A lot of projects, many of them unsanctioned, are going on out there.
0:33 Well, if we're not careful, those things represent a threat to the corporate environment because they could be leaking data.
0:40 So what I need to do then is go out and discover all instances of AI
0:46 in my environment, including the ones I don't know about,
0:50 especially the ones I don't know about, because those are the ones that I then need to lock down and secure.
0:57 I need to make sure that they're not leaking data, that there aren't other types of exposures
1:02 that they have given to the organization unintentionally.
1:06 And in some cases, instead of just saying, "No, you shouldn't do that," I'm going to say how.
1:12 I'm going to provide an alternative or recommend an alternative, because it's best not to say no.
1:18 Don't say no, say how.
1:19 If you say no, people will go ahead and do it anyway and they just won't tell you about it.
1:24 So better to say how
1:26 and show them the right way to do it.
1:28 So let's take a look at how you could do this kind of discovery,
1:32 how you can do this sort of security posture management for AI,
1:37 and bring the Shadow AI out of the shadows.
1:40 Okay. Where should we start this discovery?
1:43 Well, I'll tell you, one reasonable place is in the cloud.
1:46 And why would that be?
1:48 Well, because it turns out these models can be really expensive:
1:52 really compute intensive, requiring a lot of storage to operate.
1:56 A lot of people don't have that just available.
1:58 So there are some small models, but the really big ones that will make a big impact,
2:03 these are likely to be hosted in a cloud environment in the first place.
2:07 So let's start looking there.
2:09 Now, what is an AI deployment?
2:11 If we're talking about Shadow AI,
2:13 what would it look like?
2:14 Well, it turns out that there is, of course, a model in the middle.
2:19 There is some data that we use to train that model and tune the model and so forth.
2:24 And then there are apps over here that will be using the information that is in this AI model.
2:33 Okay. So that's what we're looking for.
2:34 That's the target.
2:35 That's where, if we see these kinds of things,
2:38 then we need to know: is this one that we're aware of, or is it one that we're not aware of?
2:43 Now, what do I need to do next?
2:45 I need to figure out what environments we're looking in.
2:48 I said we're going to look in the cloud.
2:50 So maybe we start off by listing the cloud
2:53 environments that we have in our organization, and there could be a number of them.
2:58 So we're going to make a list of those so that we know where we're going to start looking.
3:03 Then we're going to start looking for basically AI platforms, specific implementations of AI that we know,
3:13 and here are some examples that you can see here.
3:15 These will be the things that we will be looking for in this cloud environment.
3:20 And then there are yet other types of environments that are essentially not platforms.
3:25 In the case of something like watsonx, it's a full-blown AI platform,
3:30 which includes a model among a lot of other components.
3:33 So that would be an example here.
3:35 In some other cases, there are just basically pure models that people will get from open source places like Hugging Face,
3:43 and download and put into this environment.
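The deployment shape the speaker describes — a model in the middle, the data used to train and tune it, and the apps that call it — can be captured as a minimal inventory record. This is only an illustrative sketch; the class and field names are invented here, not taken from any real discovery product.

```python
from dataclasses import dataclass, field

@dataclass
class AIDeployment:
    """One discovered AI deployment: the model plus everything attached to it."""
    model_id: str                                          # platform model or downloaded open-source model
    cloud_account: str                                     # which cloud environment it was found in
    tuning_data: list[str] = field(default_factory=list)   # datasets used to train/tune the model
    rag_sources: list[str] = field(default_factory=list)   # retrieval-augmented generation stores
    apps: list[str] = field(default_factory=list)          # apps or agents that leverage the model
    sanctioned: bool = False                               # is this one we already knew about?

def unsanctioned(inventory: list[AIDeployment]) -> list[AIDeployment]:
    """The 'shadow' subset: deployments nobody registered."""
    return [d for d in inventory if not d.sanctioned]
```

With an inventory like this, the "is this one we're aware of?" question becomes a simple filter over the discovered deployments.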
3:46 So we look for those a little bit differently than we would for the individual platform-type environments.
3:53 So now how do we do this?
3:55 Well, as I said, we need to be able to do this discovery.
3:57 So what I want to do is go in and, first of all, connect into the cloud.
4:03 And once I've done that sort of connection, the next thing I'm going to do
4:07 is start looking for the data that's associated with that particular model.
4:13 And I'm going to have a tool that will do that sort of stuff.
4:17 What kind of data?
4:18 Well, I'm going to scan for the models.
4:19 Then I'm going to scan for the associated data that relates to those models.
4:24 And I'm going to look at different types of data.
4:26 I'm going to be looking at tuning data, the stuff that we use to tune and train the model.
4:32 I'm also going to look at RAG data: retrieval-augmented generation.
4:37 RAG is a technology where we combine not only what the model knows,
4:41 but we're giving it additional information that it can use in its inferencing.
4:46 So we've looked for at least these two different types,
4:49 but we'd like to be able to make sure that we include those in our visibility of this.
4:55 Then I'm going to identify the various apps or agents or things like that
5:00 that are going to be involved in leveraging this model.
5:02 So I can scan.
5:04 I can look for the associated data.
5:06 I can look for the apps.
5:07 I want to be able to do all of that.
5:09 And in the case of some of these where it's just a model and not a platform,
5:13 we'll be looking really just for the signature that associates and gives us the idea that there's a particular model out there.
5:20 So that's what we're trying to do.
5:22 And what I need, of course, in order to make all of this work,
5:25 is a tool that is going to automate
5:28 all of the things that I've just talked about here and is going to provide me a visualization:
5:35 a screen that captures and shows me exactly this, and shows me what AI is out there.
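The "signature" approach for standalone models mentioned above — spotting a model by the artifacts it leaves behind rather than by a platform API — could be sketched as below. The file-name signatures are plausible examples (common model-weight and config file names), not an exhaustive or authoritative list, and `scan_objects` stands in for whatever storage listing a real discovery tool would query.

```python
# Hypothetical signature pass: flag storage objects whose names look like
# model artifacts, the way a discovery tool might spot a downloaded
# open-source model sitting in a cloud bucket.
MODEL_SIGNATURES = (".safetensors", ".gguf", "pytorch_model.bin", "config.json")

def looks_like_model_artifact(object_name: str) -> bool:
    """True if the object name matches a known model-file signature."""
    return object_name.endswith(MODEL_SIGNATURES)

def scan_objects(object_names: list[str]) -> dict[str, list[str]]:
    """Split a bucket listing into suspected model artifacts vs. everything else."""
    findings: dict[str, list[str]] = {"model_artifacts": [], "other": []}
    for name in object_names:
        key = "model_artifacts" if looks_like_model_artifact(name) else "other"
        findings[key].append(name)
    return findings
```

A real tool would combine this with the platform-side scans (models, tuning data, RAG data, apps) to build the full picture the speaker describes.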
5:41 Now we've turned the light on.
5:43 There's no more shadow.
5:45 Now what do we do?
5:46 Once we see into the shadow,
5:48 now what are we going to do?
5:49 We need to make sure that it's secure.
5:51 We need to look at the posture, the security posture, of this AI deployment that we have.
5:57 So, again, we've got the data, the model and the apps.
6:00 Well, another component that I mentioned might be involved in this is if we're using
6:05 some other data source for retrieval-augmented generation,
6:10 and that may be feeding information into this as well,
6:13 in addition to the information that we use to tune and train the model.
6:18 So if this has not been locked down, well, then we might have a problem.
6:23 What could happen in that case?
6:25 Well, let's say there's a bad actor out here, and there are many of them out here.
6:30 And let's say they come along and say, "Nobody really locked this thing down.
6:34 I can see this thing.
6:36 So I'm going to go over here and pull some information out of that."
6:39 And if that happened to have anything sensitive, like maybe your customer database,
6:44 you're going to run an application or an inference against the model
6:47 and then ask it questions about your customers.
6:49 Then maybe you're using this as a data source that you would complement all of that with.
6:54 Now, all of a sudden, that information has been exfiltrated.
6:58 And that would be a big security exposure.
7:01 So there's one example.
7:02 How about another example?
7:04 Let's say it involves access control.
7:06 So we've got over here Bob, in accounting.
7:11 And Bob's a good guy,
7:12 and we want him to be able to access a particular app
7:17 that then is going to pull information from this system and give him back what he wants.
7:21 What we don't want Bob in accounting to have access to is the actual model itself,
7:28 the RAG data source, or the training data.
7:33 That would be a big problem.
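The Bob example boils down to a simple policy: end users may reach the app, but only the platform's own identities should touch the model, the training data, or the RAG source. A toy version of that check, with role and resource names invented purely for illustration:

```python
# Toy access-control table: which roles may touch which part of the AI
# deployment. Role and resource names are illustrative, not from any real
# IAM system.
ALLOWED: dict[str, set[str]] = {
    "end_user":    {"app"},                                        # Bob in accounting
    "ml_engineer": {"app", "model", "tuning_data", "rag_source"},  # the team that owns the deployment
    "service":     {"model", "rag_source"},                        # the app's backend identity
}

def is_allowed(role: str, resource: str) -> bool:
    """Deny by default: unknown roles and unlisted resources get nothing."""
    return resource in ALLOWED.get(role, set())
```

Under this table Bob can use the app, but a request from him straight to the model or the training data is refused, which is exactly the separation the speaker is asking for.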
7:34 That would be him having access,
7:37 and even if he doesn't mean to do anything wrong,
7:40 maybe he does something wrong and causes the system to, in fact, have issues.
7:45 Another example: let's say this attacker decides to do something else in addition to exfiltrating.
7:51 Maybe this guy goes along and says, "You know what? What if I poison this system?
7:56 What if I introduce just a little bit of error into the training data or into the RAG information?"
8:03 That little bit of error propagates through the system,
8:05 and then the stuff that comes out ends up being no good.
8:08 So that's another type of attack that could occur in addition to the exfiltration.
8:13 It could be poisoning.
8:15 And then let's take a look at one more.
8:17 A group called OWASP, the Open
8:21 Worldwide Application Security Project, came out with a top ten list for LLMs,
8:28 and in their top ten, one of the things that they talked about was excessive agency.
8:32 What does that mean?
8:33 Well, let's say this app is able to do certain things, but it should be limited in terms of the kinds of things that it can do.
8:42 It should be able to interoperate with this model in some ways,
8:46 but we want to have limitations. Maybe
8:47 we want it not to be able to modify the model,
8:50 for instance. If we give this app too much power,
8:55 then if the app either has a bug in it or
8:58 a bad actor gets control of the app, they could exploit that and then cause problems in the system.
9:04 So that's excessive agency. No, we want to have everything locked down.
9:08 You remember I've talked in other videos about the principle of least privilege.
9:12 That's what we're trying to enforce here:
9:15 only the capabilities that are necessary in order to do the job.
9:19 Now, these are just a few examples of the sorts of misconfigurations
9:23 and vulnerabilities that might have been introduced into the system.
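Excessive agency is mitigated by the same least-privilege idea: give the app an explicit allowlist of actions and refuse everything else, so that even a buggy or hijacked app cannot, say, modify the model. A minimal sketch, with action names made up for the example:

```python
# Minimal guard against excessive agency: the app may only invoke actions
# on this allowlist; anything else (like modifying the model) is refused.
# Action names are hypothetical.
APP_ALLOWED_ACTIONS = {"query_model", "read_rag_source"}

def perform(action: str) -> str:
    """Run an app action only if it is within the app's granted capabilities."""
    if action not in APP_ALLOWED_ACTIONS:
        raise PermissionError(f"action {action!r} exceeds this app's granted agency")
    return f"executed {action}"
```

The design choice is deny-by-default: the allowlist names only the capabilities the app needs to do its job, which is the principle of least privilege the speaker references.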
9:26 And if I have a good tool that can identify these misconfigurations and vulnerabilities,
9:32 things where we've left the doors and windows open, then I can do a better job of securing the AI: the formerly Shadow AI,
9:40 now discovered AI,
9:43 and now a secure AI.
9:45 When it comes to AI, I recommend that you don't say no;
9:49 say how.
9:50 Figure out how we can do this in a more secure way,
9:54 and then that way we benefit the business instead of hurting it.
9:57 How do we do that?
9:58 Well, we've got to have visibility and control,
10:02 because after all, I can't secure what I can't see.
10:05 So how do I get the visibility?
10:07 Well, we do discovery.
10:09 We go out into the environment and find all the AI deployments:
10:13 the models, the data, the applications, all of that stuff that's associated together.
10:18 And then I add in the security.
10:21 I look for what kinds of things
10:24 need to be done to the configuration so that security posture management is in place.
10:29 This way I have an AI deployment that doesn't hurt us,
10:33 that can only benefit us.
10:35 And in some cases, we may say, "You know what, I really want you to use an alternative AI,
10:40 because the one that you're using is not going to preserve confidentiality.
10:44 I need something that's going to be better performing and is going to preserve more privacy."
10:49 So we want to, again, discover these things so that we can either secure them or provide alternatives,
10:56 because those are the things that are going to make AI work for us.
10:59 That way we shine a light on the Shadow AI and turn it into helpful AI.