# Shadow AI Discovery and Mitigation

**Source:** [https://www.youtube.com/watch?v=YBE6hq-OTFI](https://www.youtube.com/watch?v=YBE6hq-OTFI)
**Duration:** 00:11:06

## Sections

- [00:00:00](https://www.youtube.com/watch?v=YBE6hq-OTFI&t=0s) **Discovering and Securing Shadow AI** - The speaker warns that hidden, unsanctioned AI tools—referred to as “Shadow AI”—pose data‑leak and exposure risks, and advocates proactively inventorying these tools and guiding users toward secure alternatives instead of simply prohibiting them.
- [00:03:15](https://www.youtube.com/watch?v=YBE6hq-OTFI&t=195s) **Discovering Models and Data in Cloud Environments** - The speaker outlines how to connect to a cloud, differentiate platform‑based AI services from standalone models, and scan for associated assets such as tuning data and retrieval‑augmented generation (RAG) data.
- [00:06:23](https://www.youtube.com/watch?v=YBE6hq-OTFI&t=383s) **Risks of Unsecured RAG Systems** - The speaker outlines how attackers can exfiltrate sensitive data and poison retrieval‑augmented generation models when access controls are lax, using examples of data leakage, improper user permissions, and model poisoning.
- [00:09:26](https://www.youtube.com/watch?v=YBE6hq-OTFI&t=566s) **From Shadow AI to Secure AI** - The speaker emphasizes using discovery tools to expose hidden AI deployments, identify misconfigurations, and apply security controls or safer alternatives, turning shadow AI into a business‑beneficial, securely managed asset.

## Full Transcript
It's 2 o'clock in the morning and I have a question for you.
Do you know where your AI is?
Think you don't have any in your corporate environment?
Do you think so, or do you know so?
It will matter, because it turns out there's a thing called Shadow AI that's lurking in the corners.
A lot of people have seen the power of AI
generative AI in particular,
and therefore they're trying to see what kind of uses they could make for that within the organization.
A lot of projects, many of them unsanctioned, are going on out there.
Well, if we're not careful, those things represent a threat to the corporate environment because they could be leaking data.
So what I need to do then is go out and discover all instances of AI
in my environment, including the ones I don't know about,
especially the ones I don't know about, because those are the ones that I then need to lock down and secure.
I need to make sure that they're not leaking data, that there aren't other types of exposures
that they have given to the organization unintentionally.
And in some cases, instead of just saying, No, you shouldn't do that, I'm going to say how.
I'm going to provide an alternative or recommend an alternative because it's best not to say no.
Don't say no, say how.
If you say no, the people will go ahead and do it anyway and they just won't tell you about it.
So better to say how.
And show them the right way to do it.
So let's take a look at how you could do this kind of discovery
and how can you do this sort of security posture management for AI,
and bring the Shadow AI out of the shadows?
Okay. Where should we start this discovery?
Well, I'll tell you, one reasonable place is in the cloud.
And why would that be?
Well, because it turns out these models can be really expensive.
Really compute intensive, require a lot of storage to operate.
A lot of people don't have that just available.
So there are some small models, but the really big ones that will make a big impact,
these are likely to be hosted in a cloud environment in the first place.
So let's start looking there.
Now, what is an AI deployment?
If we're talking about Shadow AI?
What would it look like?
Well, it turns out that there is, of course, a model in the middle.
There is some data that we use to train that model and tune the model and so forth.
And then there are apps over here that will be using the information that is in this AI model.
Okay. So that's what we're looking for.
That's the target.
That's where if we see these kinds of things,
then we need to know, is this one that we're aware of or is it one that we're not aware of?
Now, what do I need to do next?
I need to figure out what environments we're looking for.
I said we're going to look in the cloud.
So maybe we start off by listing the cloud
environments that we have in our organization, and there could be a number of them.
So we're going to make a list of those so that we can know where we're going to start looking.
Then we're going to start looking for basically AI platforms, specific implementations of AI that we know,
and here are some examples that you can see here.
These will be the things that we will be looking for in this cloud environment.
And then there are yet other types of environments that are essentially not platforms.
In the case of something like watsonx, it's a full blown AI platform,
which includes a model among a lot of other components.
So that would be an example here.
In some other cases, there are just basically pure models that people will get from open source places like Hugging Face,
and download and put in into this environment.
So we look for those a little bit differently than we would for the individual platform type environments.
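As a sketch of how that classification step might look in practice, here is a minimal example that separates platform-based AI services from standalone model files found in a cloud inventory. The platform names, file signatures, and inventory format are illustrative assumptions, not from the talk or any particular vendor's API:

```python
# Hypothetical inventory classifier: given resources found in a cloud
# account, separate full AI platforms from standalone downloaded models.
# The signature lists below are illustrative, not exhaustive.

KNOWN_AI_PLATFORMS = {"watsonx", "sagemaker", "vertex-ai", "azure-openai"}
MODEL_FILE_SIGNATURES = (".safetensors", ".gguf", "pytorch_model.bin")

def classify_resources(resources):
    """Split discovered cloud resources into platforms and standalone models."""
    platforms, standalone_models = [], []
    for res in resources:
        name = res["name"].lower()
        if any(p in name for p in KNOWN_AI_PLATFORMS):
            platforms.append(res)
        elif any(name.endswith(sig) for sig in MODEL_FILE_SIGNATURES):
            standalone_models.append(res)
    return platforms, standalone_models

# Example: two platform services plus one model file pulled from an
# open-source hub and dropped into object storage.
found = [
    {"name": "prod-watsonx-instance"},
    {"name": "team-sagemaker-endpoint"},
    {"name": "s3://data-lake/llama-7b.safetensors"},
]
platforms, models = classify_resources(found)
print(len(platforms), len(models))  # → 2 1
```

A real discovery tool would match on richer metadata than resource names, but the two-bucket distinction, platforms versus bare models, is the same.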
So now how do we do this?
Well, as I said, we need to be able to do this discovery.
So what I want to do is go in and first of all, connect into the cloud.
And once I've done that sort of connection, the next thing I'm going to do
is start looking for the data that's associated with that particular model.
And I'm going to have a tool that will do that sort of stuff.
What kind of data?
Well, I'm going to scan for the models.
Then I'm going to scan for the associated data that relates to those models.
And I'm going to look at different types of data.
I'm going to be looking at tuning data, the stuff that we use to tune and train the model.
I'm also going to look at RAG data, retrieval augmented generation.
RAG is a technology where we combine not only what the model knows,
but we're giving it additional information that it can use in its inferencing.
So we look for at least these two different types of data and make sure we include them in our visibility.
Then I'm going to identify the various apps or agents or things like that that are going to be involved in leveraging this model.
So I can scan.
I can look for the associated data.
I can look for the apps.
I want to be able to do all of that.
And in the case of some of these where it's just a model and not a platform,
we'll be looking really just for the signature that associates and gives us the idea that there's a particular model out there.
So that's what we're trying to do.
And what I need, of course, in order to make all of this work,
is a tool that is going to automate
all of the things that I've just talked about here and is going to provide me a visualization.
So a screen that captures and shows me exactly this and shows me what AI is out there.
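The discovery flow just described, connect to each cloud, scan for models, then find the tuning data, RAG data, and apps tied to each one, could be sketched like this. The helper functions passed in (`connect`, `find_models`, and so on) are hypothetical stand-ins for whatever cloud SDK or discovery tool is actually in use:

```python
# Minimal sketch of the discovery flow: for every cloud account, find the
# models, then gather the associated data and apps into one inventory.

def discover_ai_deployments(cloud_accounts, connect, find_models,
                            find_data, find_apps):
    """Build an inventory entry per model: tuning data, RAG data, and apps."""
    inventory = []
    for account in cloud_accounts:
        session = connect(account)
        for model in find_models(session):
            inventory.append({
                "account": account,
                "model": model,
                "tuning_data": find_data(session, model, kind="tuning"),
                "rag_data": find_data(session, model, kind="rag"),
                "apps": find_apps(session, model),
            })
    return inventory

# Demo with stubbed lookups standing in for real cloud API calls.
inv = discover_ai_deployments(
    ["acct-prod"],
    connect=lambda acct: acct,
    find_models=lambda s: ["granite-13b"],
    find_data=lambda s, m, kind: [f"{m}-{kind}-bucket"],
    find_apps=lambda s, m: ["support-chatbot"],
)
print(inv[0]["rag_data"])  # → ['granite-13b-rag-bucket']
```

The point of the structure is that every model row carries its associated assets, which is exactly the model-data-apps picture the visualization would render.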
Now we've turned the light on.
There's no more shadow.
Now what do we do?
Once we can see into the shadows, what are we going to do?
We need to make sure that it's secure.
We need to look at the posture, the security posture of this AI deployment that we have.
So, again, we've got the data, the model and the apps.
Well, another component that I mentioned might be involved is some other data source used for retrieval augmented generation, which may be feeding information into this as well, in addition to the information that we use to tune and train the model.
So if this has not been locked down, well, then we might have a problem.
What could happen in that case?
Well, let's say there's a bad actor out here and there are many of them out here.
And let's say they come along and say, nobody really locked this thing down.
I can see this thing.
So I'm going to go over here and pull some information out of that.
And if that data source happens to contain anything sensitive, like maybe your customer database, the one you use to complement the model when an application runs an inference and asks questions about your customers, then all of a sudden that information has been exfiltrated.
And that would be a big security exposure.
So there's one example.
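One posture check a tool might run for this exfiltration scenario is flagging RAG data sources whose access policy is open to the world. The policy shape below is a simplified assumption, not any real cloud provider's schema:

```python
# Sketch of a posture check: flag RAG data sources that any principal
# ("*") or anonymous users can read. The policy format is hypothetical.

def find_exposed_rag_sources(rag_sources):
    """Return names of RAG data sources readable by anyone."""
    exposed = []
    for src in rag_sources:
        readers = src.get("allowed_readers", [])
        if "*" in readers or "anonymous" in readers:
            exposed.append(src["name"])
    return exposed

sources = [
    {"name": "customer-db-embeddings", "allowed_readers": ["*"]},
    {"name": "product-docs", "allowed_readers": ["support-app"]},
]
print(find_exposed_rag_sources(sources))  # → ['customer-db-embeddings']
```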
How about another example?
Let's say it involves access control.
So we've got over here, Bob, in accounting.
And Bob's a good guy,
and we want him to be able to access a particular app
that then is going to pull information from this system and give him back what he wants.
What we don't want is Bob in accounting having access to the actual model itself,
to the RAG data source, or to the training data.
That would be a big problem.
Even if he doesn't mean to do anything wrong,
with that access he might do something wrong and cause the system to, in fact, have issues.
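The Bob scenario is really a least-privilege audit: users who can reach the app should not also hold direct access to the model, the RAG source, or the training data. A minimal sketch of that check, assuming a simplified permission map rather than a real IAM listing:

```python
# Hypothetical least-privilege audit: flag users who can reach both the
# app and protected AI assets directly. Asset names are illustrative.

PROTECTED_ASSETS = {"model", "rag_data", "training_data"}

def find_overprivileged_users(grants):
    """Flag users holding app access plus direct access to protected assets."""
    flagged = []
    for user, assets in grants.items():
        if "app" in assets and assets & PROTECTED_ASSETS:
            flagged.append(user)
    return flagged

grants = {
    "bob":   {"app", "training_data"},  # direct training-data access: flag
    "alice": {"app"},                   # app-only access: fine
}
print(find_overprivileged_users(grants))  # → ['bob']
```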
Another example, let's say this attacker decides to do something else in addition to exfiltrating.
Maybe this guy goes along and says, you know what, what if I poison this system?
What if I introduce just a little bit of error into the training data or into the RAG information?
That little bit of error propagates through the system.
And then the stuff that comes out ends up being no good.
So that's another type of attack that could occur in addition to the exfiltration.
It could be poisoning.
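One simple way to notice that kind of tampering is to compare content hashes of the tuning and RAG files against a previously recorded baseline. This is only a sketch; in practice a data-integrity or versioning tool would maintain the baseline:

```python
# Sketch of a tamper check against data poisoning: compare current
# content hashes of tuning/RAG files to a recorded baseline manifest.
import hashlib

def hash_content(data: bytes) -> str:
    """SHA-256 fingerprint of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def detect_tampering(baseline: dict, current_files: dict) -> list:
    """Return names of files whose content no longer matches the baseline."""
    return [name for name, data in current_files.items()
            if baseline.get(name) != hash_content(data)]

# A small poisoned edit to a RAG document changes its hash and gets flagged.
baseline = {"faq.txt": hash_content(b"Refunds take 5 days.")}
current = {"faq.txt": b"Refunds take 5 days. Also wire money to attacker."}
print(detect_tampering(baseline, current))  # → ['faq.txt']
```

Hashing catches unauthorized edits after the fact; it complements, rather than replaces, the access controls that should have prevented the write in the first place.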
And then let's take a look at one more.
And a group called OWASP, the Open
Worldwide Application Security Project, came out with a top ten list for LLMs,
and in their top ten, one of the things that they talked about was excessive agency.
What does that mean?
Well, let's say this app is able to do certain things, but it should be limited in terms of the kinds of things that it can do.
It should be able to interoperate with this model in some ways.
But we want to have limitations; maybe we want it not to be able to modify the model, for instance.
If we give this app too much power,
then if the app either has a bug in it or
a bad actor gets control of it, they could exploit that and cause problems in the system.
So for that excessive agency, no, we want to have everything locked down.
You remember I've talked in other videos about the principle of least privilege.
That's what we're trying to enforce here.
Only the capabilities that are necessary in order to do the job.
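In code, least privilege for an app or agent often comes down to an explicit allowlist of actions: anything not listed, such as modifying the model, is denied by default. A minimal sketch with illustrative action names:

```python
# Sketch of limiting an app's "agency": only explicitly allowed actions
# are permitted; everything else (like modifying the model) is denied.

ALLOWED_ACTIONS = {"query_model", "read_rag_docs"}

def authorize(action: str) -> bool:
    """Least privilege: deny by default, permit only allowlisted actions."""
    return action in ALLOWED_ACTIONS

print(authorize("query_model"))   # → True
print(authorize("modify_model"))  # → False
```

The deny-by-default direction matters: adding a new capability requires a deliberate allowlist change, rather than a forgotten restriction silently granting it.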
Now, these are just a few examples of sort of misconfigurations
vulnerabilities that might have been introduced into the system.
And if I have a good tool that can identify these misconfigurations and vulnerabilities,
things where we've left the doors and windows open, then I can do a better job of securing the AI, the formerly Shadow AI,
now discovered AI, and now secure AI.
When it comes to AI, I recommend that you don't say no,
say how.
Figure out how we can do this in a more secure way.
And then that way we benefit the business instead of hurting it.
How do we do that?
Well, we've got to have visibility and control.
Because after all, I can't secure what I can't see.
So how do I get the visibility?
Well, we do discovery.
We go out into the environment and find all the AI deployments,
the models, the data, the applications, all of that stuff that's associated together.
And then I add in the security.
I look for what kinds of things
need to be done to the configuration so that the security posture management is in place.
This way I have an AI deployment that doesn't hurt us.
That can only benefit us.
And in some cases, we may say, you know what, I really want you to use an alternative AI
because the one that you're using is not going to preserve confidentiality.
I need something that's going to be better performing and it's going to preserve more privacy.
So we want to, again, discover these things so that we can either secure them or provide alternatives,
because those are the things that are going to make AI work for us.
That way we shine a light on the Shadow AI and turn it into helpful AI.