Learning Library

← Back to Library

OWASP Top 10 LLM Vulnerabilities

Key Points

  • Chatbots have exploded in popularity, reaching 100 million users within two months, driven by generative AI and large language models.
  • A standout but under‑discussed capability is bidirectional language translation, which delivers more natural and accurate results than traditional tools.
  • The Open Worldwide Application Security Project (OWASP) released its first “Top 10 for Large Language Models,” highlighting new security risks unique to AI systems.
  • The leading vulnerability is prompt injection—both direct (e.g., jailbreak commands that discard guardrails) and indirect (e.g., embedding malicious code that leads to remote code execution).

Full Transcript

# OWASP Top 10 LLM Vulnerabilities **Source:** [https://www.youtube.com/watch?v=cYuesqIKf9A](https://www.youtube.com/watch?v=cYuesqIKf9A) **Duration:** 00:14:19 ## Summary - Chatbots have exploded in popularity, reaching 100 million users within two months, driven by generative AI and large language models. - A standout but under‑discussed capability is bidirectional language translation, which delivers more natural and accurate results than traditional tools. - The Open Worldwide Application Security Project (OWASP) released its first “Top 10 for Large Language Models,” highlighting new security risks unique to AI systems. - The leading vulnerability is prompt injection—both direct (e.g., jailbreak commands that discard guardrails) and indirect (e.g., embedding malicious code that leads to remote code execution). ## Sections - [00:00:00](https://www.youtube.com/watch?v=cYuesqIKf9A&t=0s) **OWASP LLM Risks: Prompt Injection** - The speaker outlines the rapid rise of chatbots, praises their translation capabilities, and then focuses on OWASP’s new Top 10 list for large language models, emphasizing prompt injection as the primary vulnerability. ## Full Transcript
0:00chat Bots have taken the World by storm 0:02we've never seen a technology with this 0:05kind of Rapid adoption curve in fact 0:07it's achieved a hundred million users in 0:12just the first two months which is 0:14unprecedented well why it does a lot of 0:17amazing things it uses underlying 0:19technology of generative AI of large 0:22language models and it does amazing 0:25stuff one of the things I really like 0:26that I don't hear many other people 0:28talking about is language translation in 0:31fact either direction and I need this a 0:34lot 0:37but if you have a translation that's 0:40able to understand the language and the 0:42words more intuitively you'll get better 0:44translations well as with everything a 0:48new technology comes out there's going 0:49to be some people try to abuse it and 0:51there's going to be risks that go along 0:52with it so we have this organization 0:54called the open worldwide application 0:58security project or owasp for short and 1:01they're very well known for their top 10 1:03list of application security 1:04vulnerabilities well they've recently 1:06come out with an owasp top 10 for large 1:10language models so let's take a look at 1:12those in a little more detail in this 1:14video I'm going to highlight the top 1:16three that owasp identified and stick 1:19around to the end and I'll reveal a 1:21bonus topic because after all who 1:23doesn't like a bonus 1:25okay the number one vulnerability that 1:28owasp highlighted is something called 1:31prompt injection 1:34prompt injection comes in a couple of 1:36different forms there's a direct form 1:39and an indirect so let's take a look at 1:41the direct one first so in the case of a 1:44direct prompt injection what someone is 1:46doing let's say we have a bad actor here 1:48and this guy is going to send his 1:51commands into the llm the large language 1:53model maybe it's through a chat 1:54interface for instance and he's sending 1:57commands in telling it specific things 1:59to do where he's trying to take 2:01advantage of the system he's trying to 2:03basically break out of the sandbox that 2:05he's been put in which is why this is 2:07also sometimes called jailbreaking so an 2:09example of this maybe he comes along and 2:12he tells the system to forget 2:16all of its previous programming forget 2:18about your guard rails forget about the 2:21the constraints that you've been 2:23given and sometimes the system will do 2:26that in fact one way to do that is also 2:28use a prompt like pretend pretend if 2:31you're the chat bot pretend that you are 2:34a different chat bot one that I've just 2:36created and if I ask you this question 2:38how would you respond and sometimes 2:40that's enough to confuse the chat bot 2:43confuse the llm and you end up getting 2:45results executed that the system did not 2:48want to do in in the first place another 2:51thing is an example of exploiting 2:55vulnerabilities and here's a case where 2:58in some cases someone might include code 3:01or additional instructions along with 3:03their prompt and then when the system 3:05processes it it actually executes that 3:07you end up what's with what's known as 3:09remote code execution it's like I send 3:12instructions to your computer and your 3:15computer executes them without your 3:17permission and this is what's happening 3:19when this is being injected directly 3:22into the prompt other examples privilege 3:24escalation we might get the system to do 3:27things that it wasn't intended to do 3:30even provide unauthorized access so 3:33these are examples of some of the things 3:34that can happen through direct prompt 3:36injection how about indirect prompt 3:38injection well let's take a look at a 3:41normal use case let's say we have a a 3:43good guy user out here and he is going 3:46to go to the llm and ask it to summarize 3:49an article that he's seen on the web so 3:52the llm goes back pulls that in and 3:55summarizes the information that's there 3:57and gives that information back it comes 4:00back in a processed form this guy is 4:03Happy everything worked like we expected 4:06now what happens if we have a bad actor 4:09who in fact is intending to mess this 4:12thing up and what he does is he inserts 4:15something into the web page maybe it's 4:17even unprintable characters maybe it's 4:19things that are not seen for instance if 4:21he writes text in White on a white 4:23background the human user wouldn't see 4:25that but a system that's scraping the 4:28web would see that and would potentially 4:30process that so as the llm looks at that 4:33information maybe we include the kinds 4:35of things that we were doing up here to 4:37sort of jump out of the sandbox and 4:38jailbreak we're including that in here 4:41and now the results after this web page 4:44has been compromised and sort of has 4:46this hidden message in it that comes 4:49back to the llm and then what we end up 4:51with is something that in some cases 4:53might even have code in it that is going 4:56to run on this user system without their 4:59permission and now this guy has been 5:01hacked so that's an example of indirect 5:04prompt injection 5:05now what can you do to stop this I don't 5:07want to just talk about the issues and O 5:10wasp goes on and talks about preventions 5:12for instance what we want is privilege 5:14control you you've heard me talk before 5:16in other videos about the principle of 5:18least privilege it's Bedrock Concept in 5:21security and we need to implement that 5:24in these cases as well so that our 5:27back-end system 5:30also has some sort of of limitation so 5:34in other words we only give to whatever 5:36the the systems are out here only give 5:38them the minimum uh privileges that they 5:41need in order to do their job so that 5:43way if a command is included and it 5:45tries to jump out it's automatically 5:47restricted other things we want to do is 5:50include a human in the loop so if in 5:53this case the output that's coming from 5:55this is then going to be executed on 5:57another system then we might want to 5:59make sure that before it hits that 6:02external system that we've done 6:04something to allow a human to say yes do 6:07this thing or no don't do this thing so 6:09don't always just let the system run 6:11completely in an automated unfettered 6:13way keep a human in the loop another 6:16thing that we can do to prevent these 6:17kinds of problems is segregate content 6:20from prompts 6:21we don't often do that very well in fact 6:24it's easy to include additional commands 6:27in the prompt itself and that's where 6:28some of these problems arise so we need 6:30a system that clearly separates and 6:34creates good trust boundaries as they 6:37say good defenses make good neighbors we 6:39need good fences between the content and 6:42the prompts 6:44okay number two on the owasp top 10 for 6:47llm is insecure output handling 6:53what does that mean well let's take an 6:55example where we have an application 6:57that is leveraging an llm this 7:00application is going to go to the llm 7:03and let's say we've got a database the 7:05application says llm do a search of this 7:08database for every occurrence of the 7:11string IBM well what the llm can do by 7:15the way they can generate code so let's 7:16say it generates a SQL query for us 7:18really quickly so it pops up as you see 7:21here and then we send that automatically 7:24to the database get the results it all 7:27comes back and everybody's happy that's 7:29how it should work but what happens if 7:33for some reason the llm has been 7:35compromised in some way either 7:37intentionally or there's even just an 7:40error that could cause this to happen 7:41but either way it's going to have the 7:43same effect if this thing has been 7:45compromised though maybe instead of 7:47issuing that SQL query maybe it issues 7:50this SQL command which what does that do 7:53well that deletes the entire database 7:56that's a disaster that's not what we 7:59were planning on so what is the issue 8:02here in this case well we didn't check 8:04the output coming from the llm we just 8:06just took it for granted and let it run 8:09so clearly what we should do is uh in 8:13our first level is do not assume that an 8:17llm is a trusted user an llm is not a 8:20trusted user and remember I talked 8:22before about putting a human in the loop 8:24or putting other kind of safeguards in 8:26place that's what we need to do in fact 8:28what we should have is some sort of 8:30guard in here that does checking and 8:33it's going to look and see if this 8:35command that's coming through that was 8:37about to do a drop database we want to 8:40be able to block that so that it doesn't 8:42go through at all so we want to put 8:44those kind of checks in place also 8:46validate IO again this kind of system 8:49would be able to do that level of 8:51validation so we don't just trust we 8:54verify 8:55okay number three on the owasp top 10 8:59for llms is dealing with the subject of 9:03training data 9:06we need to make sure that the training 9:07data we have is trustworthy and accurate 9:10or we end up with a situation where we 9:12end up with bad results so here's an 9:15example where let's say we have an llm 9:18here and it's going to go out to let's 9:21say the web it's going to go pull in 9:24some other documents from a database a 9:26lot of different sources it's going to 9:27take all of this information in and then 9:29a user comes along and says I want you 9:32to summarize that information maybe this 9:34person's about to make an investment 9:36decision and they want to know by doing 9:39some research on a particular company 9:40what is that company up to what are 9:42their products like what do customers 9:44think of their products that sort of 9:45thing so this is something that could 9:47work very well it goes up pulls all this 9:50information synthesizes it and then the 9:52user gets some information back so 9:54they've asked a good question now 9:56they've gotten an answer 9:57until 9:59there's always somebody that's going to 10:00come in and gum up the works this guy 10:03comes in and plants a false report in 10:06this database that says there's a 10:09product safety issue from this 10:10particular company it's not true but 10:13it's in the database now so when that 10:15information is imported into the llm and 10:19processed it's going to feed that 10:21incorrect information to this guy and 10:24now he is asking a question where he's 10:27probably going to make a wrong decision 10:29because after all the old saying goes 10:32garbage in 10:34garbage out that's what we're going to 10:36end up in this case so the llm is doing 10:39its job but it's only as good as the 10:42information it has and sometimes these 10:44things are so good that we just trust 10:46them implicitly and that's something we 10:48need to be careful about so what's the 10:50prevention in this case well know your 10:52sources know that these things are in 10:55fact first of all know which sources 10:56it's pulling from and that they're 10:58trustworthy then know that these those 11:00sources have not been compromised so 11:02that's the first thing verify that 11:05information so we want to know we want 11:07to verify we want to validate the 11:09results in other words now I pull all 11:11this in does it all make sense does the 11:13things that I've added up here one plus 11:16one does it equal 2 or does it equal 73 11:19because that's a case where we're going 11:21to not believe the results we get back 11:23and then ultimately we're going to keep 11:26doing this over and over and over again 11:28wash rinse and repeat it's always about 11:31constant vigilance and checking the 11:34model curating the data that we have 11:36being selective about the sources we 11:39have and making sure that there has not 11:41been this sort of compromise or Corpus 11:44poisoning of the database 11:47okay you've made it through the top 11:49three now for the bonus time the bonus 11:52item of the top ten is about over 11:57Reliance over Reliance on what the 12:00technology can do can cause us to have 12:02problems the last one that I talked 12:05about number three was about the 12:06compromise to the training data this is 12:09kind of more of an attack on the user on 12:12the on the way that they are using the 12:14system as much as it is the technology 12:15because we have to understand uh what it 12:18can do so for instance if I started 12:20writing out this the moon is made of and 12:24then it completes as a generative AI 12:26might do green cheese okay not what I 12:31had in mind not exactly right what is 12:34that when the system says something like 12:36that that's what we call a hallucination 12:39and llms are prone to this there is no 12:43way to eliminate all possible 12:45hallucinations at this point that we 12:48know of so what can we do to deal with 12:51that reality because after all the llm 12:54is still very valuable in many many use 12:56cases so how can we use it more 12:59effectively and not become prone to the 13:02misinformation that could happen from a 13:04hallucination well the the prevention 13:06here is understand first of all what are 13:09the limits of llms what can they do and 13:11what can they not do and from there we 13:14need to train users on what those limits 13:16are have them with the right level of 13:18expectation that not everything that 13:21comes out has to be believed in fact I'm 13:23going to let you in on this secret not 13:25everything on the internet is true and 13:27not everything that comes out of an llm 13:29is going to be completely implicitly 13:32trustworthy that's why we have to verify 13:34sources and ultimately we want a level 13:38of explainability 13:40in the system we wanted to be able to 13:42show its work and explain to us how did 13:45you come from this set of propositions 13:48this set of data to this conclusion and 13:51once we know that then we can put more 13:53trust in the system 13:54so now you've had a sense of what the 13:58OAS organization which has a long track 14:00record of really solid advice what 14:03things we need to be looking for with 14:04llms in order to be able to use them 14:07more safely and more securely 14:10thanks for watching if you found this 14:12video interesting and would like to 14:13learn more about cyber security please 14:15remember to hit like And subscribe to 14:17this channel