Learning Library

← Back to Library

Six Major Adversarial AI Attack Types

Key Points

  • The field of adversarial AI is exploding, with over 6,000 research papers published on the topic, highlighting a rapid increase in both interest and threat development.
  • Prompt‑injection attacks—either direct commands or indirect instructions embedded in external content—function like social engineering, “jailbreaking” language models into obeying malicious requests they were not designed to fulfill.
  • Infection attacks can embed malware, trojans, or back‑doors into AI models themselves, especially when organizations download pretrained models from third‑party supply chains, turning the model into a compromised asset.
  • These two attack vectors are considered among the most prevalent threats to large language models, as documented in recent industry reports such as the OAS study.
  • The video concludes by offering three practical resources to help practitioners better understand adversarial AI and build effective defensive measures.

Full Transcript

# Six Major Adversarial AI Attack Types **Source:** [https://www.youtube.com/watch?v=_9x-mAHGgC4](https://www.youtube.com/watch?v=_9x-mAHGgC4) **Duration:** 00:09:28 ## Summary - The field of adversarial AI is exploding, with over 6,000 research papers published on the topic, highlighting a rapid increase in both interest and threat development. - Prompt‑injection attacks—either direct commands or indirect instructions embedded in external content—function like social engineering, “jailbreaking” language models into obeying malicious requests they were not designed to fulfill. - Infection attacks can embed malware, trojans, or back‑doors into AI models themselves, especially when organizations download pretrained models from third‑party supply chains, turning the model into a compromised asset. - These two attack vectors are considered among the most prevalent threats to large language models, as documented in recent industry reports such as the OAS study. - The video concludes by offering three practical resources to help practitioners better understand adversarial AI and build effective defensive measures. ## Sections - [00:00:00](https://www.youtube.com/watch?v=_9x-mAHGgC4&t=0s) **Understanding Prompt Injection Attacks** - The segment outlines the surge of adversarial AI research, explains how prompt injection (or AI jailbreaking) works as a social‑engineering attack, previews six major attack categories, and promises resources for learning defenses. ## Full Transcript
0:00anytime something new comes along 0:02there's always going to be somebody that 0:03tries to break it AI is no different and 0:06this is why it seems we can't have nice 0:08things in fact we've already seen more 0:10than 6,000 research papers exponential 0:13growth that have been published related 0:16to adversarial AI examples now in this 0:19video we're going to take a look at six 0:21different types of attacks major classes 0:24and try to understand them better and 0:26then stick around to the end where I'm 0:27going to share with you three different 0:29resources that you can use to understand 0:31the problem better and build defenses so 0:34you might have heard of a SQL injection 0:36attack when we're talking about an AI 0:39well we have prompt injection attacks 0:41what does a prompt injection attack 0:43involve well think of it is sort of like 0:47a social engineering of the AI so we're 0:50convincing it to do things it shouldn't 0:52do sometimes it's referred to is 0:54jailbreaking but we're basically doing 0:56this in one of two ways there's a direct 0:59injection attack where we have an 1:00individual that sends a command into the 1:03AI and tells it to do something pretend 1:07that this is the case uh or I want you 1:09to play a game that looks like this I 1:11want you to give me all wrong answers 1:13these might be some of the things that 1:15we inject into the system and because 1:17it's wanting to please it's going to try 1:19to do everything that you ask it to 1:21unless it's been explicitly told not to 1:23do that it will follow the rules that 1:25you've told it so you're setting a new 1:27context and now it starts operating out 1:30of the context that we originally 1:31intended it to and that can affect uh 1:33the output another example of this is an 1:36indirect attack where maybe I have the 1:39AI I send a command or the AI is 1:41designed to go out and retrieve 1:43information from an external Source 1:45maybe a web page and in that web page 1:48I've embedded my injection attack that's 1:50where I say now pretend that you're 1:52going to uh give me all the wrong 1:54answers and do something of that sort 1:57that then gets consumed by the AI and it 1:59starts following those instructions so 2:01this is one major attack in fact we 2:03believe this is probably the number one 2:05set of attacks against large language 2:08models according to the OAS report that 2:11I talked about in a previous video 2:13what's another type of attack that we 2:15think we're going to be seeing in fact 2:17we've already seen examples of this uh 2:19to date is infection so we know that you 2:23can infect a Computing system with 2:25malware in fact you can infect an AI 2:28system with malware as well in fact you 2:31could use things like Trojan horses or 2:34back doors things of that sort that come 2:37from your supply chain and if you think 2:40about this most people are never going 2:41to build a large language model because 2:43it's too computer intensive requires a 2:45lot of expertise and a lot of resources 2:48so we're going to download these models 2:51from other sources and what if someone 2:54in that supply chain has infected one of 2:57those models the model then could be 3:00suspect it could do things that we don't 3:02intend it to do and in fact there's a 3:04whole class of Technologies uh machine 3:06learning detection and response 3:08capabilities because it's been 3:09demonstrated that this can happen these 3:12Technologies exist to try to detect and 3:15respond to those types of threats 3:18another type of attack class is 3:20something called evasion and in evasion 3:23we're basically modifying the inputs 3:25into the AI so we're making it come up 3:28with results that we were not wanting an 3:31example of this that's been cited in 3:33many cases was a stop sign where someone 3:37was using a self-driving car or a vision 3:40related system that was designed to 3:42recognize street signs and normally it 3:45would recognize the stop sign but 3:47someone came along and put a small 3:49sticker something that would not confuse 3:52you or me but it confused the AI 3:54massively to the point where it thought 3:57it was not looking at a stop sign it 3:59thought it it was looking at a speed 4:00limit sign which is a big difference and 4:03a big problem if you're in a 4:04self-driving car that can't figure out 4:06the difference between those to so 4:09sometimes the AI can be fooled and 4:11that's an evasion attack in that case 4:13another type of attack class is 4:16poisoning we poison the data that's 4:18going into the AI and this can be done 4:22intentionally by someone who has uh the 4:25you know bad purposes in mind in this 4:28case if you think about our data that 4:29we're going to use to train the AI we've 4:31got lots and lots of data and sometimes 4:34introducing just a small error small 4:37factual error into the data is all it 4:40takes in order to get bad results in 4:43fact there was one research study that 4:45came out and found that as little as 4:500.001% of error introduced in the 4:53training data for an AI was enough to 4:57cause results to be anomalous and be 4:59wrong 5:00another class of attack is what we refer 5:03to as extraction think about the AI 5:06system that we built and the valuable 5:09information that's in it so we've got in 5:12this system potentially intellectual 5:14property that's valuable to our 5:16organization we've got data that we may 5:18be used to train and tune the models 5:21that are in here we might have even 5:23built a model ourselves and all of these 5:26things we consider to be valuable assets 5:28to the organization 5:30so what if someone decided they just 5:32wanted to steal all of that stuff well 5:34one thing they could do is a set of 5:36extensive queries into the system so 5:38maybe I I ask it a little bit and I get 5:41a little bit of information I send 5:43another query I get a little more 5:44information and I keep getting more and 5:46more information if I do this enough and 5:49if I I fly sort of Slow and Low below 5:52radar no one sees that I've done this in 5:55enough time I've built my own database 5:58and I have B basically uh lifted your 6:01model and stolen your IP extracted it 6:04from your AI and the final class of 6:07attack that I want to discuss is denial 6:10of service this is basically just 6:13overwhelm the system I send too many 6:15requests there may be other types of 6:17this but the most basic version I just 6:19send too many requests into the system 6:21and the whole thing goes boom it cannot 6:24keep up and therefore it denies access 6:27to all the other legitimate users 6:30if you've watched some of my other 6:31videos you know I often refer to a thing 6:34that we call the CIA Triad it's 6:37confidentiality 6:39integrity and availability these are the 6:42focus areas that we have in cyber 6:44security we're trying to make sure that 6:46we keep this information that is 6:49sensitive available only to the people 6:52that are justified in having it and 6:54integrity that the data is true to 6:56itself it hasn't been tampered with and 6:58availability that the system still works 7:00when I need it to well in it security 7:03generally historically what we have 7:06mostly focused on is confidentiality and 7:09availability but there's an interesting 7:11thing to look at here if we look at 7:13these attacks confidentiality well 7:15that's definitely what the extraction 7:17attack is about and maybe it could be an 7:21infection attack if that infects and 7:22then pulls data out through a back door 7:26but then let's take a look at 7:27availability well that's basically this 7:29denial of service is an availability 7:31attack the others though this is an 7:34Integrity attack this could be an 7:36Integrity attack this is an Integrity 7:38attack this is an Integrity attack so 7:42you see what's happening here is in the 7:44era of AI Integrity attacks now become 7:48something we're going to have to focus a 7:49lot more on than we've been focusing on 7:51in the past so be 7:54aware now I hope you understand that AI 7:57is the new attack surface we need to be 7:59smart so that we can guard against these 8:02new threats and I'm going to recommend 8:05three things for you that you can do 8:07that will make you smarter about these 8:09attacks and by the way the links to all 8:11of these things are down in the 8:13description below so please make sure 8:14you check that out first of all a couple 8:17of videos I'll refer you to one that I 8:19did on securing AI business models and 8:22another on the xforce threat 8:24intelligence index report both of those 8:27should give you a better idea of what 8:28the threats look look like and in 8:30particular some of the things that you 8:31can do to guard against those threats 8:34the next thing download our guide to 8:38cyber security in the era of generative 8:41AI That's a free document that will also 8:43give you some additional insights and a 8:45point of view on how to think about 8:47these threats finally there's a tool 8:50that our research group has come out 8:52with that you can download for free and 8:54it's called the adversarial robustness 8:56toolkit and this thing will help you 8:59test your AI to see if it's susceptible 9:02to at least some of these attacks if you 9:04do all of these things you'll be able to 9:07move into this generative AI era in a 9:10much safer way and not let this be the 9:13expanding attack surface thanks for 9:16watching please remember to like this 9:18video And subscribe to this channel so 9:19we can continue to bring you content 9:21that matters to 9:25you