CrowdStrike Patch Triggers Worldwide Outage

Key Points

  • The widespread reliance on a single security vendor (CrowdStrike) introduced a critical single point of failure, as their software is installed on countless enterprise machines worldwide.
  • A defective “.sys” content update from CrowdStrike bricked the computers it was installed on, causing disruptions that grounded all three major U.S. airlines, took down airports from Zurich to Melbourne to Amsterdam, affected hospitals in the Netherlands, left hospitals in Illinois struggling because their 911 system was down, and impeded health updates in Catalonia.
  • These incidents highlight how modern society’s dependence on interconnected computing infrastructure makes software bugs a matter of public safety, not just an IT inconvenience.
  • CEOs are often sold “one-and-done peace of mind” security promises without understanding that such solutions can create systemic risk, especially when the vendor’s product becomes a universal security layer.
  • The episode underscores the urgent need to redesign security response protocols and architecture to avoid single points of failure and to build more resilient, diversified defenses.

Full Transcript

Source: https://www.youtube.com/watch?v=DxH3jtsqbtg
Duration: 00:08:52

Sections

  • 00:00:00 (https://www.youtube.com/watch?v=DxH3jtsqbtg&t=0s) CrowdStrike Update Cripples Global Systems: A speaker warns that a flawed CrowdStrike software update disabled countless computers, halting airlines, hospitals, and emergency services worldwide, highlighting systemic network vulnerabilities.

[0:01] I want to talk about the inherent vulnerabilities of the way we've constructed our computer networks today. It's a nerdy topic, but you all are going to care about it, because the world just broke because of this issue.

[0:11] So CrowdStrike is a global security company, in cybersecurity in particular, and their whole mission statement, their whole goal, is to protect the fleets of computers that they are installed on by making sure that malware doesn't run on them and that hackers can't get access to them. They also do monitoring of employees, which is a lot more controversial. And I just want to take a minute for the fact that it's this company, it is this company, that released a .sys content update that then bricked every computer they're installed on, and that is a massive number worldwide. They are a big company. They're an $83 billion company traded on the stock market, or they were.

[1:00] And they released a content update that was so defective that airlines ground to a halt. Hospitals were affected in the Netherlands. All three major US airlines, for the first time that I think I can recall, were grounded by a tech error at the same time: American, Delta, United, all down. Airports from Zurich to Melbourne to Amsterdam were all down. Hospitals in Illinois were having trouble because their 911 system was down. Catalonia was having trouble getting health updates. So these are actually serious issues, right? On the one hand, it's easy to chuckle and say, "Wow, they really messed up," but people's lives are affected because we all have come to depend on computing so much.

[1:45] And that's kind of what I want to talk about here. We have an opportunity to think about the way we architect our security response protocols, the way we architect our expectations for security fixes, because at the end of the day, what CrowdStrike sold was a one-and-done peace-of-mind solution to chief executive officers: "Hey, I know that you worry at night about being on the front page of the newspaper tomorrow because of some sort of cyberattack. We will protect you from this. Trust us. We're the ones that have been investigating North Korean hacking attempts for the last decade. We're the ones that have been called in by the US government to investigate in other situations. Don't worry, we're the ones who are responsible here. We're the grown-ups, and we can protect you." And if you're a CEO and you don't know a ton about the details of the tech, it's an easy line to buy, right? It sounds like it's responsible, they're a publicly traded company, so you invest. And so many of our IT solutions are purchased on that basis.

[2:52] And then they come and install this client software on all of these machines, and you've now introduced a common point of failure across all of your machines in the name of security. Therefore your entire ability to sustain your business is indirectly linked to this security company, and you've done it because you're so worried about cyberattacks.

[3:14] But if you think about it, what just happened with this blue screen of death affecting hundreds of millions of computers worldwide was worse than most cyberattacks. It was worse. This is what Y2K was supposed to be like. I remember back in 1999, at midnight on December 31st, everyone thought this is what was going to happen: all of our machines were going to turn to the blue screen of death and we wouldn't be able to get anything done. Because of heroic efforts from a lot of intrepid programmers, that did not happen, but that's because we saw it coming. And you can't see a defective bug in a content file coming. It's just human. People are going to make mistakes, and the problem is we built a system where a single human, I guarantee it's one engineer somewhere, can make a mistake and shut down hundreds of millions of machines worldwide.

[4:06] And there will be a lot of CEOs that are thinking about their security deployments as a result, because this is a terrifying amount of corporate vulnerability.

[4:16] One of the other things that I want to call out is that at the end of the day, when these things happen, or if these things happen, we also have to get better at how we handle the systemic implications of distributed systems. Because what they're having to do, since they can't push the update and the machines won't take it, is tell the IT managers responsible for these individual fleets to go through a four-step process that essentially involves reinstalling the channel file that had the bad content update, this .sys file. And so all of these IT managers around the world who are responsible for fleets of machines are individually having to go through the same four-step process to fix their machines.

[5:02] And I love the IT departments I've worked with. They're absolute heroes. They're having an awful day today. But it's also not fair to them to impose a globally distributed system and then force them to be the local front lines for that system when something breaks. That is fundamentally not the way the system should be designed.

[5:25] And we need to revisit our assumptions around security architecture. We need to think about what it takes to have a more resilient system, where a content file like that could be fixed from a central point, where a content file like that could be fixed before it reaches deployment.

[5:45] And that's a different conversation. I don't want to get too far into that today, but from a technical perspective, the first thing I think about is how you prevent these things in future, and I think it's about systems design. I don't think that we can fix it by saying we're going to have good intentions and we're just not going to do it again, which is basically what CrowdStrike is saying. It's like, "Well, this was a normal update, it was just one little aberration, easy fix available, off you go." Well, it's kind of underselling the impact of what they did to the world today.

[6:18] And we owe it to ourselves, as people who work in tech, to think about and design our systems so that they are more resilient and less susceptible to the kinds of globally systemic crashes that could be attributable to a single person making a mistake. It's just not sustainable for us, because we're human. We're going to make mistakes.

[6:45] And if you think the answer is AI or LLM-generated code, that's really buggy too. That's not the answer here. I've messed around with LLM-generated code and it produces bugs so much. So it's a little aside, but the point is there's not an easy fix with AI. There so rarely is a magic fix with AI, and this is another case where there isn't one.

[7:08] We actually have to think about the systems architecture of what we're deploying, and we have to think about our business models and our expectations. As leaders in corporations, we need to be insistent that we do a degree of due diligence on the solutions we deploy, not just in terms of whether the company is professional, but whether the company is able to design systems that are actually resilient. And that level of due diligence is rarely done. So much of the due diligence I see basically amounts to: Is this a legitimate company? Do they have lots of policies written? I guarantee you CrowdStrike had lots of policies. That did not save them.

[7:46] They actually needed to build resilient technical solutions, and nobody, apparently, had been checking that they would do that. And that's really the core issue. You shouldn't be buying client software that installs on all of your machines if you don't have confidence that they're going to be able to actually build a resilient system.

[8:11] And I realize that's an easy thing to say and a hard thing to do, because you have to buy your software from somewhere. You can't just go and not have cybersecurity software. If you have a significant fleet of computers, you need it from somewhere. There are relatively few providers, and so it's not necessarily an easy thing to go out and change tomorrow. It's something that the whole community has to come together and think about.

[8:33] And I guess that's my challenge: how do we think about building more resilient systems? So that's where I'll leave it. I hope you get over your blue screen of death soon, if that's you, and if not, spare a thought for the IT managers who are absolutely suffering today.
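
The speaker's point (around 5:33) about catching a bad content file before it reaches the whole fleet can be illustrated with a staged, ring-based rollout. What follows is only a minimal Python sketch under stated assumptions, not a description of CrowdStrike's actual release process: `RolloutResult`, `staged_rollout`, `apply_update`, the host names, and the 1% failure threshold are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RolloutResult:
    completed: bool                 # True only if no ring exceeded the failure threshold
    hosts_updated: int              # hosts that received the update before any halt
    halted_at_ring: Optional[int]   # index of the ring that tripped the halt, if any


def staged_rollout(
    rings: Sequence[Sequence[str]],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> RolloutResult:
    """Push an update ring by ring and stop as soon as one ring looks unhealthy.

    `apply_update` is a hypothetical callback that installs the update on one
    host and returns True if the host is still healthy afterwards.
    """
    hosts_updated = 0
    for index, ring in enumerate(rings):
        failures = 0
        for host in ring:
            healthy = apply_update(host)
            hosts_updated += 1
            if not healthy:
                failures += 1
        # Halt before touching later (larger) rings if this ring failed too often.
        if ring and failures / len(ring) > max_failure_rate:
            return RolloutResult(False, hosts_updated, index)
    return RolloutResult(True, hosts_updated, None)


if __name__ == "__main__":
    # Hypothetical fleet: a tiny canary ring, an early-adopter ring, then everyone else.
    rings = [
        ["canary-1", "canary-2"],
        [f"early-{i}" for i in range(20)],
        [f"fleet-{i}" for i in range(1000)],
    ]

    def defective_update(host: str) -> bool:
        # Simulates an update that crashes every host it reaches.
        return False

    print(staged_rollout(rings, defective_update))
```

In this sketch a defective update still breaks the two canary hosts, but the rollout halts there, so the remaining 1,020 hosts never receive it. The design choice is to bound the blast radius of a single human mistake rather than to assume the mistake will not happen.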