# Deepfake Audio Threats Explained

**Source:** [https://www.youtube.com/watch?v=cVvJgdm19Ak](https://www.youtube.com/watch?v=cVvJgdm19Ak)
**Duration:** 00:14:33

## Summary

- Jeff demonstrates a voice deepfake created by an AI tool that can mimic his speech after only a short audio sample.
- Modern deepfake technology can generate realistic audio and video from as little as three seconds of input, making convincing fakes increasingly easy to produce.
- These fakes pose significant financial risks, enabling scams such as “grandparent” fraud and large corporate frauds that have resulted in multi‑million‑dollar losses.
- Awareness and verification (e.g., confirming identities through independent channels) are essential defenses against deepfake‑enabled deception.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=cVvJgdm19Ak&t=0s) **Untitled Section**
- [00:03:04](https://www.youtube.com/watch?v=cVvJgdm19Ak&t=184s) **Deepfake Threats: Fraud & Disinformation** - The passage explains how convincing deepfake videos and audio can facilitate financial scams and spread false political messages, creating severe economic, electoral, and geopolitical consequences.
- [00:06:06](https://www.youtube.com/watch?v=cVvJgdm19Ak&t=366s) **Deepfakes Undermine Courtroom Evidence** - The speaker warns that deepfake videos could cause wrongful convictions, current detection technologies are unreliable, and the legal system is unprepared to address these challenges.
- [00:09:11](https://www.youtube.com/watch?v=cVvJgdm19Ak&t=551s) **Challenges of Universal Deepfake Verification** - The speaker argues that requiring every audio/video app to embed a deepfake verification label is technically daunting and ultimately ineffective, because compliance is limited to good actors while bad actors will simply ignore the rules.
- [00:12:21](https://www.youtube.com/watch?v=cVvJgdm19Ak&t=741s) **Out‑of‑Band Verification Strategies** - The speaker explains using alternative communication channels, third‑party confirmation, and pre‑shared secret code words to authenticate high‑risk interactions and guard against deepfake fraud.

## Full Transcript
Hi, this is Jeff and you are listening to a deepfake of my voice.
This is not a recording.
I never actually said these words.
This was all generated by an AI tool trained on audio samples of my voice.
Many of you have the same technology in your pocket right now and don't even know it.
In fact, it's included in a popular mobile phone operating system
that you may use every day.
In this video, we are going to talk about what deepfakes are,
what risks they pose, and how we can defend against them.
So that was deepfake Jeff.
This is the real Jeff. Or is it, really?
Maybe this is a deepfake who just played a deepfake for you.
I'll let you chase that recursion as long as you like and work on that on your own.
Okay, let's see how these deepfakes actually work: what they are, and how you build one.
Well, you start off with an actual human being.
So in the case of the deepfake that you heard me generate,
I started off with me talking into my phone,
speaking a set of words that it told me I needed to say.
It then listens to all of that and builds a model of my speech
so it can do that after I've read all of that sample text.
But there are some models that can do this with as little as three seconds of audio sample from an individual,
so it's not all that hard to make very convincing deepfakes these days.
Then, once you've built the model, what you do is you type in to the system whatever you want it to say.
So if I type in, say this and I enter that text into the deepfake generator,
then it will generate a sound that sounds just like that person or very similar to them.
And we can do this with audio. We can do this with video.
The video will actually show the mannerisms of the person and what they look like as well.
These things can be very convincing and this technology is only getting better.
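The enroll-then-synthesize workflow Jeff describes can be sketched with a toy mock. Nothing here produces real audio; `ToyVoiceCloner` and its methods are hypothetical stand-ins for the kind of interface a real voice-cloning tool exposes:

```python
import hashlib

class ToyVoiceCloner:
    """Mock of the enroll-then-synthesize workflow; no real audio is produced."""

    def __init__(self):
        self.voice_model = None

    def enroll(self, audio_samples):
        # Real systems fit a speaker model from recordings; some need as
        # little as ~3 seconds of audio. Here we just derive a stable
        # fingerprint from the samples to stand in for the model.
        self.voice_model = hashlib.sha256(b"".join(audio_samples)).hexdigest()[:8]

    def synthesize(self, text):
        # A real cloner would render `text` in the enrolled voice.
        if self.voice_model is None:
            raise RuntimeError("enroll a voice first")
        return f"<audio voice={self.voice_model}: {text!r}>"

cloner = ToyVoiceCloner()
cloner.enroll([b"sample-1", b"sample-2"])  # short recordings of the target speaker
fake = cloner.synthesize("Hi, this is Jeff")
print(fake)
```

The point of the sketch is the two-step shape: a short enrollment pass builds the model once, and after that arbitrary text can be synthesized in that voice on demand.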
Okay, let's take a look at some of the risks now that you know how a deepfake can be generated.
How could someone use this to do bad things?
Well, one type of risk
classification of these would be a financial risk of some sort, a fraud.
One of these is often referred to as a grandparent scam,
because grandparents are frequently the targets,
although it really could happen to any family member or anyone that you know.
In fact, the way these things work is
you get a deepfake of someone's voice, let's say a grandchild,
and then the deepfake makes the call
and talks to the grandparent and tells them, help, I'm in trouble. I wrecked my car,
I got robbed, I've been arrested. Something like that.
And I need you to send money. Please help.
And what grandparent isn't going to help their grandchild?
So they send money.
But of course, they're not sending money to who they think,
they're sending it to the bad guy.
Here's another case: even very sophisticated organizations, namely corporations, can fall for these kinds of scams.
There was one organization that wired $35 million
to a scammer based on a deepfake phone call.
Another lost $25 million based on a deepfake video call,
where the person on the video was claiming to be the chief financial officer of the company,
and it was convincing enough that someone followed those instructions and sent the money.
So this can happen to a lot of folks, and, bad stuff happens when it does.
What's another risk that can happen here? Well, how about disinformation?
In the case of disinformation, this could have a lot of national and political side effects as well.
There was one case recently, in the lead-up to a US presidential election,
where a robocall was calling people in a particular state,
telling them they didn't need to go out and vote,
that, in fact, they could just save their vote for the general election because this was just a primary.
The robocall was in the voice of the president of the United States, a very recognizable voice.
People thought they were hearing the voice of the president, and they weren't.
Imagine if someone took that technology further still and used it to create some really damaging fake news of some sort.
Maybe you have a head of state who appears to be on video
declaring war on another country,
or the head of a company saying, you know, the drugs that we manufacture,
they kill half the people that take them.
Even though it's not true, and even though the CEO never said that,
it's going to cause the stock price to plummet.
And if someone knew when that fake was going to be released,
they would know to short the stock.
A short is a bet that a stock will go down, and they could profit from that.
So disinformation campaigns would be very damaging in a lot of cases.
And then one other case, and there is a lot more than the ones that I'm just mentioning here.
But one other possibility is an extortion attack where someone is trying to extort money from you.
They say I've got compromising photos.
I've got an audio of you saying something that you never said, or video of you doing something you never did.
But it's not something that you would want anyone to know about, because it's a damage to your reputation.
And this type of reputational threat could be enough to cause someone to pay real money just to keep it quiet,
because it will be very difficult to detect whether it's real or not.
So think about it this way. We have a threat because of the mere existence of deepfakes.
These things create a lot of uncertainty.
So if we consider the world of possibilities, we have false negatives and false positives.
A false negative is when a deepfake passes as real and nobody catches it.
So imagine if you were a juror in a trial,
and the prosecution shows you a video
of someone going into a bank with a gun and then walking out with money,
and they show you that and they say, "will you convict?"
You'll say, "yes, I just saw the video".
But what if it wasn't a video?
What if it was a deepfake?
Now you might convict someone who was actually innocent.
On the other side is the false positive: maybe it really was actual video that the prosecutor showed you,
but the defense just has to argue, "No, we think that was a deepfake."
So the mere presence of a deepfake causes doubt.
And doubt is, of course, something that will hang a jury.
So this is a technology that we're going to have to struggle with,
and we're not really fully prepared yet, I think, to understand all of those implications.
Okay, now, we've talked about what deepfakes are and how you can generate them.
We've talked about what the risks are. What are the downsides that can happen here.
Now what are you supposed to do about it?
What kind of defense can we have?
There are some things that I think work and some things that I think really don't work.
And the number one in that category of things that don't work
is the one that most people, especially technology oriented people, jump to.
And that is, let's use technology that created this problem to solve the problem for us.
Let's have some software that's able to detect
the difference between a deepfake and the real thing.
Well, that sounds like a good idea, but in practice it hasn't worked out so well.
In fact, NPR did an investigative report
where they looked at some of these deepfake detection tools.
One of those tools actually did no better than 50/50,
50% accuracy with its deepfake detection.
Well, you know, I don't need to buy one of those tools.
I've got a deepfake detector that will give me 50/50 accuracy right here:
all I have to do is flip a coin, and, yeah, deepfake.
So obviously that's not going to be very accurate.
That's not something we can really count on.
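A detector that "does no better than 50/50" on a balanced test set is literally no better than guessing. A quick simulation (illustrative only; the labels and guesses are made up) makes that concrete:

```python
import random

random.seed(0)  # deterministic for reproducibility

# Balanced test set: half genuine clips, half deepfakes.
labels = [i % 2 for i in range(10_000)]  # 0 = genuine, 1 = deepfake

# A "detector" that guesses at random, like the ~50%-accuracy tool
# described in the NPR report.
coin_flip = [random.randint(0, 1) for _ in labels]
accuracy = sum(guess == truth for guess, truth in zip(coin_flip, labels)) / len(labels)

print(f"coin-flip accuracy: {accuracy:.1%}")  # hovers around 50%
```

Any tool whose accuracy sits at this baseline provides zero information: you could discard it and flip a coin instead.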
The reason for this is that if I start looking at the technology itself,
the deepfake detectors have continued to get better,
and I suspect they will continue to get better.
What's the problem?
The deepfakes themselves are getting way better and much faster.
So that means that we quickly reach an inflection point
where the detection technology just isn't keeping up.
And you can see, at least in that case, we're already there.
So that, I think, is going to be a losing battle,
because if you think about how good the deepfakes will get,
at some point they will be indistinguishable from an authentic video.
So what's something else?
This is the other area a lot of technology people look to,
and that is some sort of authentication scheme
where I'm going to be able to tell because at the time that the video is recorded,
it will include some sort of label, some sort of marking.
So maybe a digital watermark that you don't see,
but that the software playing the file can look for,
and then tell you, "this is a deepfake" or "this is not a deepfake".
Sounds like a good idea.
But first of all, there's no standard for doing this,
no industry standard, no common way that everyone agrees this is how we're going to do it.
So we have to create that first and it doesn't exist.
Secondly, I would need to have some sort of verification capability
that would be part of the standard,
and it would need to go into every single piece of software
that ever does rendering of audio and video.
That would be a lot of work.
Think about all the apps on your phone,
all of the different websites and things like that that might do audio or do video.
All of those would have to be written to look for this deepfake label
and be able to render that and tell you.
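One way such a verification capability might work, assuming the missing industry standard existed, is a cryptographic tag over the media bytes that every player checks. This is a minimal HMAC-based sketch, not any real standard; the shared `SIGNING_KEY` waves away the hard key-distribution problem that a real scheme would need PKI to solve:

```python
import hashlib
import hmac

# Hypothetical: in a real standard this would be a per-device key with
# certificates, not a shared secret baked into every app.
SIGNING_KEY = b"hypothetical-recorder-key"

def sign_media(media_bytes: bytes) -> bytes:
    """Producer side: attach a provenance tag at recording time."""
    return hmac.new(SIGNING_KEY, media_bytes, hashlib.sha256).digest()

def verify_media(media_bytes: bytes, tag: bytes) -> bool:
    """Player side: every app that renders audio/video would need this check."""
    expected = hmac.new(SIGNING_KEY, media_bytes, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

clip = b"raw camera footage"
tag = sign_media(clip)
print(verify_media(clip, tag))               # True: untampered, labeled media
print(verify_media(b"deepfake bytes", tag))  # False: content doesn't match the tag

# The compliance gap the transcript describes: a bad actor simply ships
# media with no tag at all, and the player has nothing to verify.
```

Note that the scheme can only flag tampering on media that carries a tag; unlabeled media, which is exactly what a bad actor would produce, verifies as neither real nor fake.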
But I'll tell you, even if we got all of that part solved,
and everyone who made a recording labeled it as deepfake or not,
which again is a monumental "if" to begin with,
there is still another issue: compliance.
That is, think about we set up a system of rules.
Who follows rules and who doesn't?
Who's not going to label their deepfakes as deepfakes?
Who's not going to follow the rules?
It's the bad guys.
So in other words, we'll create a system that pretty much is followed by the good guys
who are not really a threat to begin with,
and the bad guys won't follow it.
And every time we get a video that's not marked, you will not know.
And that's where we are right now.
So it's a lot of work that puts us almost right back where we started.
Now, maybe there will be some other technological advances that I can't foresee,
but that's my take on it.
Now, what does work?
Okay, I think number one,
I was asked at a security conference one time,
"If you were the head of the FTC" (the Federal Trade Commission, the US agency that oversees fraud and things like that),
"what would you do if you had only one thing you could do?"
And I told them this, and I still believe it.
It's education.
I would want to run some sort of campaign
to let people know what these deep fakes are.
What is the art of the possible?
What are the risks that go along with this?
So that they would understand and be on the lookout for these things.
Because I can tell you, most people have no idea how good this technology is already.
Now, you heard the deepfake of my voice, and it sounded a little deepfake-y.
It sounded a little depressed, a little, you know, lacking in emotion and things like that.
So you might have detected that that was not really my voice.
However, these technologies are a lot better than that.
That's just one of the common ones that everyone has access to.
And the one thing I'm sure of, again, the deepfake technology will keep improving.
What's the other thing I'm trying to do?
I'm trying to create a certain level of healthy skepticism.
Now I'm a security guy so we can find the dark cloud in every silver lining.
So we're skeptics by nature.
You don't want to be overly negative and overly skeptical,
but there's a certain level of healthy skepticism that is going to be necessary.
We all need to be skeptics to one degree or another.
If you weren't in the room when you heard it or saw it,
maybe you didn't hear it and see it.
Maybe what you heard and saw was a deepfake.
If you're watching a show for entertainment, doesn't really matter.
But if you're about to wire $25 million or even your life savings, then it matters.
So in those cases where the stakes are high,
then we should be relying on other mechanisms like out-of-band communications.
Out-of-band means if I got a phone call and I hear your voice on it,
then I'm going to hang up and then call you back
at another number that I know you're supposed to answer at.
Maybe even I call a family member or a friend of yours
to verify that the story holds up.
You know, are they really in that other country that they claim they're in?
Because I didn't know they were supposed to be there.
So that's one way.
Also, using other means.
If I got a voice call, maybe I send an email.
Maybe I do it from a different device even.
We can talk a little bit more about what some of these options are
in another video that I'll point you to at the end.
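The out-of-band checks above boil down to a simple policy: never act on the inbound channel alone when the stakes are high. A sketch of that policy as code, where the contact list, dollar threshold, and channel names are all made up for illustration:

```python
# Hypothetical trusted contact book: numbers/addresses we already knew
# BEFORE the suspicious call came in, not ones the caller gave us.
KNOWN_CONTACTS = {"jeff": {"phone": "+1-555-0100", "email": "jeff@example.com"}}

HIGH_RISK_THRESHOLD = 1_000  # dollars; above this, demand out-of-band confirmation

def should_act(request: dict) -> bool:
    """Approve a money request only if it was confirmed on a second channel."""
    if request["from"] not in KNOWN_CONTACTS:
        return False  # unknown requester: refuse outright
    if request["amount"] <= HIGH_RISK_THRESHOLD:
        return True   # low stakes: ordinary skepticism is enough
    confirmations = set(request.get("confirmed_via", set()))
    # The confirmation must use a DIFFERENT channel than the inbound one
    # (hang up and call back, send an email, ask a third party).
    out_of_band = confirmations - {request["inbound_channel"]}
    return len(out_of_band) >= 1

wire = {"from": "jeff", "amount": 25_000_000, "inbound_channel": "voice"}
print(should_act(wire))  # False: a voice call alone proves nothing

wire["confirmed_via"] = {"voice", "email"}  # hung up, then emailed a known address
print(should_act(wire))  # True: story held up on an independent channel
```

The key design choice is that the callback contact info comes from the pre-existing contact book, never from the inbound call itself, so a deepfaked caller cannot steer the verification to a number they control.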
And another thing that helps in this case:
a lot of people have really tried to get smart about this.
And they'll say, well, in that grandparent scheme you mentioned,
what if in advance we agreed on a code word?
Is that going to work?
So in other words, I tell all my family members, if I ever call you asking for money,
ask me what the code word is, and if I don't know it, then it's probably a deepfake.
The deepfake generator can generate my voice,
but it doesn't know all the things that I know.
So this sort of secret knowledge pre-shared in advance
would be the way to tell if I trust it or not.
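The pre-shared code word idea is a shared-secret check: the word is agreed in person ahead of time, then demanded over the suspicious channel. A minimal sketch, where the code word itself is obviously a made-up example (a constant-time comparison is used out of habit, since the stakes are fraud):

```python
import hmac

# Agreed in person, in advance; never shared over the channel being verified.
FAMILY_CODE_WORD = "rutabaga"  # hypothetical example, pick your own

def caller_is_trusted(spoken_word: str) -> bool:
    """Ask the caller for the code word before sending any money."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(spoken_word.strip().lower(), FAMILY_CODE_WORD)

print(caller_is_trusted("Rutabaga "))  # True: a real family member knows it
print(caller_is_trusted("password"))   # False: the deepfake can clone a voice,
                                       # but it doesn't know what you know
```

As the transcript goes on to warn, this check only holds as long as the secret stays out of band; an attacker who can sit in the middle of the conversation can defeat it.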
I put a question mark on this for a reason though.
There is a special type of attack that defeats even that,
and I'll point you to where you can find out more about it.
Deepfakes represent an escalation in the cyber arms race.
The bad guys keep getting more and better tools.
That means we're going to have to keep getting smarter in order to defend against it.
So that was the purpose of this video to make you aware of what deepfakes are,
what some of the risks are,
and warn you in terms of what kinds of capabilities you might use
in order to defeat these things and detect them when they happen.
Take a look also at the video I did on audio jacking,
so that you'll understand why code words may not be the panacea that you had hoped they'd be.
They say that forewarned is forearmed.
Now you should consider yourself to be both.