OWASP Top 10 LLM Vulnerabilities
Key Points
- Chatbots have exploded in popularity, reaching 100 million users within two months, driven by generative AI and large language models.
- A standout but under‑discussed capability is bidirectional language translation, which delivers more natural and accurate results than traditional tools.
- The Open Worldwide Application Security Project (OWASP) released its first “Top 10 for Large Language Models,” highlighting new security risks unique to AI systems.
- The leading vulnerability is prompt injection—both direct (e.g., jailbreak commands that discard guardrails) and indirect (e.g., embedding malicious code that leads to remote code execution).
Full Transcript
# OWASP Top 10 LLM Vulnerabilities **Source:** [https://www.youtube.com/watch?v=cYuesqIKf9A](https://www.youtube.com/watch?v=cYuesqIKf9A) **Duration:** 00:14:19 ## Summary - Chatbots have exploded in popularity, reaching 100 million users within two months, driven by generative AI and large language models. - A standout but under‑discussed capability is bidirectional language translation, which delivers more natural and accurate results than traditional tools. - The Open Worldwide Application Security Project (OWASP) released its first “Top 10 for Large Language Models,” highlighting new security risks unique to AI systems. - The leading vulnerability is prompt injection—both direct (e.g., jailbreak commands that discard guardrails) and indirect (e.g., embedding malicious code that leads to remote code execution). ## Sections - [00:00:00](https://www.youtube.com/watch?v=cYuesqIKf9A&t=0s) **OWASP LLM Risks: Prompt Injection** - The speaker outlines the rapid rise of chatbots, praises their translation capabilities, and then focuses on OWASP’s new Top 10 list for large language models, emphasizing prompt injection as the primary vulnerability. ## Full Transcript
chat Bots have taken the World by storm
we've never seen a technology with this
kind of Rapid adoption curve in fact
it's achieved a hundred million users in
just the first two months which is
unprecedented well why it does a lot of
amazing things it uses underlying
technology of generative AI of large
language models and it does amazing
stuff one of the things I really like
that I don't hear many other people
talking about is language translation in
fact either direction and I need this a
lot
but if you have a translation that's
able to understand the language and the
words more intuitively you'll get better
translations well as with everything a
new technology comes out there's going
to be some people try to abuse it and
there's going to be risks that go along
with it so we have this organization
called the open worldwide application
security project or owasp for short and
they're very well known for their top 10
list of application security
vulnerabilities well they've recently
come out with an owasp top 10 for large
language models so let's take a look at
those in a little more detail in this
video I'm going to highlight the top
three that owasp identified and stick
around to the end and I'll reveal a
bonus topic because after all who
doesn't like a bonus
okay the number one vulnerability that
owasp highlighted is something called
prompt injection
prompt injection comes in a couple of
different forms there's a direct form
and an indirect so let's take a look at
the direct one first so in the case of a
direct prompt injection what someone is
doing let's say we have a bad actor here
and this guy is going to send his
commands into the llm the large language
model maybe it's through a chat
interface for instance and he's sending
commands in telling it specific things
to do where he's trying to take
advantage of the system he's trying to
basically break out of the sandbox that
he's been put in which is why this is
also sometimes called jailbreaking so an
example of this maybe he comes along and
he tells the system to forget
all of its previous programming forget
about your guard rails forget about the
the constraints that you've been
given and sometimes the system will do
that in fact one way to do that is also
use a prompt like pretend pretend if
you're the chat bot pretend that you are
a different chat bot one that I've just
created and if I ask you this question
how would you respond and sometimes
that's enough to confuse the chat bot
confuse the llm and you end up getting
results executed that the system did not
want to do in in the first place another
thing is an example of exploiting
vulnerabilities and here's a case where
in some cases someone might include code
or additional instructions along with
their prompt and then when the system
processes it it actually executes that
you end up what's with what's known as
remote code execution it's like I send
instructions to your computer and your
computer executes them without your
permission and this is what's happening
when this is being injected directly
into the prompt other examples privilege
escalation we might get the system to do
things that it wasn't intended to do
even provide unauthorized access so
these are examples of some of the things
that can happen through direct prompt
injection how about indirect prompt
injection well let's take a look at a
normal use case let's say we have a a
good guy user out here and he is going
to go to the llm and ask it to summarize
an article that he's seen on the web so
the llm goes back pulls that in and
summarizes the information that's there
and gives that information back it comes
back in a processed form this guy is
Happy everything worked like we expected
now what happens if we have a bad actor
who in fact is intending to mess this
thing up and what he does is he inserts
something into the web page maybe it's
even unprintable characters maybe it's
things that are not seen for instance if
he writes text in White on a white
background the human user wouldn't see
that but a system that's scraping the
web would see that and would potentially
process that so as the llm looks at that
information maybe we include the kinds
of things that we were doing up here to
sort of jump out of the sandbox and
jailbreak we're including that in here
and now the results after this web page
has been compromised and sort of has
this hidden message in it that comes
back to the llm and then what we end up
with is something that in some cases
might even have code in it that is going
to run on this user system without their
permission and now this guy has been
hacked so that's an example of indirect
prompt injection
now what can you do to stop this I don't
want to just talk about the issues and O
wasp goes on and talks about preventions
for instance what we want is privilege
control you you've heard me talk before
in other videos about the principle of
least privilege it's Bedrock Concept in
security and we need to implement that
in these cases as well so that our
back-end system
also has some sort of of limitation so
in other words we only give to whatever
the the systems are out here only give
them the minimum uh privileges that they
need in order to do their job so that
way if a command is included and it
tries to jump out it's automatically
restricted other things we want to do is
include a human in the loop so if in
this case the output that's coming from
this is then going to be executed on
another system then we might want to
make sure that before it hits that
external system that we've done
something to allow a human to say yes do
this thing or no don't do this thing so
don't always just let the system run
completely in an automated unfettered
way keep a human in the loop another
thing that we can do to prevent these
kinds of problems is segregate content
from prompts
we don't often do that very well in fact
it's easy to include additional commands
in the prompt itself and that's where
some of these problems arise so we need
a system that clearly separates and
creates good trust boundaries as they
say good defenses make good neighbors we
need good fences between the content and
the prompts
okay number two on the owasp top 10 for
llm is insecure output handling
what does that mean well let's take an
example where we have an application
that is leveraging an llm this
application is going to go to the llm
and let's say we've got a database the
application says llm do a search of this
database for every occurrence of the
string IBM well what the llm can do by
the way they can generate code so let's
say it generates a SQL query for us
really quickly so it pops up as you see
here and then we send that automatically
to the database get the results it all
comes back and everybody's happy that's
how it should work but what happens if
for some reason the llm has been
compromised in some way either
intentionally or there's even just an
error that could cause this to happen
but either way it's going to have the
same effect if this thing has been
compromised though maybe instead of
issuing that SQL query maybe it issues
this SQL command which what does that do
well that deletes the entire database
that's a disaster that's not what we
were planning on so what is the issue
here in this case well we didn't check
the output coming from the llm we just
just took it for granted and let it run
so clearly what we should do is uh in
our first level is do not assume that an
llm is a trusted user an llm is not a
trusted user and remember I talked
before about putting a human in the loop
or putting other kind of safeguards in
place that's what we need to do in fact
what we should have is some sort of
guard in here that does checking and
it's going to look and see if this
command that's coming through that was
about to do a drop database we want to
be able to block that so that it doesn't
go through at all so we want to put
those kind of checks in place also
validate IO again this kind of system
would be able to do that level of
validation so we don't just trust we
verify
okay number three on the owasp top 10
for llms is dealing with the subject of
training data
we need to make sure that the training
data we have is trustworthy and accurate
or we end up with a situation where we
end up with bad results so here's an
example where let's say we have an llm
here and it's going to go out to let's
say the web it's going to go pull in
some other documents from a database a
lot of different sources it's going to
take all of this information in and then
a user comes along and says I want you
to summarize that information maybe this
person's about to make an investment
decision and they want to know by doing
some research on a particular company
what is that company up to what are
their products like what do customers
think of their products that sort of
thing so this is something that could
work very well it goes up pulls all this
information synthesizes it and then the
user gets some information back so
they've asked a good question now
they've gotten an answer
until
there's always somebody that's going to
come in and gum up the works this guy
comes in and plants a false report in
this database that says there's a
product safety issue from this
particular company it's not true but
it's in the database now so when that
information is imported into the llm and
processed it's going to feed that
incorrect information to this guy and
now he is asking a question where he's
probably going to make a wrong decision
because after all the old saying goes
garbage in
garbage out that's what we're going to
end up in this case so the llm is doing
its job but it's only as good as the
information it has and sometimes these
things are so good that we just trust
them implicitly and that's something we
need to be careful about so what's the
prevention in this case well know your
sources know that these things are in
fact first of all know which sources
it's pulling from and that they're
trustworthy then know that these those
sources have not been compromised so
that's the first thing verify that
information so we want to know we want
to verify we want to validate the
results in other words now I pull all
this in does it all make sense does the
things that I've added up here one plus
one does it equal 2 or does it equal 73
because that's a case where we're going
to not believe the results we get back
and then ultimately we're going to keep
doing this over and over and over again
wash rinse and repeat it's always about
constant vigilance and checking the
model curating the data that we have
being selective about the sources we
have and making sure that there has not
been this sort of compromise or Corpus
poisoning of the database
okay you've made it through the top
three now for the bonus time the bonus
item of the top ten is about over
Reliance over Reliance on what the
technology can do can cause us to have
problems the last one that I talked
about number three was about the
compromise to the training data this is
kind of more of an attack on the user on
the on the way that they are using the
system as much as it is the technology
because we have to understand uh what it
can do so for instance if I started
writing out this the moon is made of and
then it completes as a generative AI
might do green cheese okay not what I
had in mind not exactly right what is
that when the system says something like
that that's what we call a hallucination
and llms are prone to this there is no
way to eliminate all possible
hallucinations at this point that we
know of so what can we do to deal with
that reality because after all the llm
is still very valuable in many many use
cases so how can we use it more
effectively and not become prone to the
misinformation that could happen from a
hallucination well the the prevention
here is understand first of all what are
the limits of llms what can they do and
what can they not do and from there we
need to train users on what those limits
are have them with the right level of
expectation that not everything that
comes out has to be believed in fact I'm
going to let you in on this secret not
everything on the internet is true and
not everything that comes out of an llm
is going to be completely implicitly
trustworthy that's why we have to verify
sources and ultimately we want a level
of explainability
in the system we wanted to be able to
show its work and explain to us how did
you come from this set of propositions
this set of data to this conclusion and
once we know that then we can put more
trust in the system
so now you've had a sense of what the
OAS organization which has a long track
record of really solid advice what
things we need to be looking for with
llms in order to be able to use them
more safely and more securely
thanks for watching if you found this
video interesting and would like to
learn more about cyber security please
remember to hit like And subscribe to
this channel