Claude AI Hijacked for Chinese Espionage
Key Points
- In mid‑September, Anthropic discovered that a Chinese state‑sponsored group (GTGU) had jail‑broken Claude’s code and integrated it via the MCP protocol into an automated hacking framework that performed 80‑90% of a large‑scale espionage campaign against roughly 30 high‑value targets.
- The AI‑driven operation handled reconnaissance, exploit development, credential harvesting, lateral movement, and data exfiltration at machine speed, with human intervention limited to only a few decision points per target.
- This incident marks the first documented case where an LLM served as the primary cyber‑attack agent, signaling a shift from AI‑assisted human hackers to AI‑controlled offensive operations.
- The successful use of Claude dramatically lowers the barrier to sophisticated cyber‑espionage, allowing state actors—and eventually less‑resourced groups—to launch complex campaigns without large elite red‑team resources.
- The attack demonstrated that platform safety mechanisms can be circumvented by fragmenting malicious tasks into seemingly benign requests, highlighting AI safety as a critical systemic risk for future cybersecurity defenses.
Sections
- Anthropic AI Powers Chinese Cyber Espionage - The transcript details how a Chinese state‑backed group hijacked Anthropic’s Claude code to automate a large‑scale espionage campaign, with AI handling most of the hacking tasks and prompting industry and Anthropic reactions.
- Evolving AI Threat Model Debate - The speaker outlines the security community’s split over a newly revealed AI exploit—praising Anthropic’s detection work yet condemning its preventive gaps—and urges that AI product design now assume malicious actors and implement system‑level, telemetry‑driven defenses.
- AI Fluency Redefines Cyber Defense - The speaker stresses that modern security teams must master AI tools for rapid threat analysis and response, as attackers are already leveraging AI-driven red‑team frameworks and exploit kits, making AI competence essential for defense, compliance, and future resilience.
Full Transcript
# Claude AI Hijacked for Chinese Espionage **Source:** [https://www.youtube.com/watch?v=7Kc9BNEe2mk](https://www.youtube.com/watch?v=7Kc9BNEe2mk) **Duration:** 00:09:48 ## Summary - In mid‑September, Anthropic discovered that a Chinese state‑sponsored group (GTGU) had jail‑broken Claude’s code and integrated it via the MCP protocol into an automated hacking framework that performed 80‑90% of a large‑scale espionage campaign against roughly 30 high‑value targets. - The AI‑driven operation handled reconnaissance, exploit development, credential harvesting, lateral movement, and data exfiltration at machine speed, with human intervention limited to only a few decision points per target. - This incident marks the first documented case where an LLM served as the primary cyber‑attack agent, signaling a shift from AI‑assisted human hackers to AI‑controlled offensive operations. - The successful use of Claude dramatically lowers the barrier to sophisticated cyber‑espionage, allowing state actors—and eventually less‑resourced groups—to launch complex campaigns without large elite red‑team resources. - The attack demonstrated that platform safety mechanisms can be circumvented by fragmenting malicious tasks into seemingly benign requests, highlighting AI safety as a critical systemic risk for future cybersecurity defenses. ## Sections - [00:00:00](https://www.youtube.com/watch?v=7Kc9BNEe2mk&t=0s) **Anthropic AI Powers Chinese Cyber Espionage** - The transcript details how a Chinese state‑backed group hijacked Anthropic’s Claude code to automate a large‑scale espionage campaign, with AI handling most of the hacking tasks and prompting industry and Anthropic reactions. - [00:03:34](https://www.youtube.com/watch?v=7Kc9BNEe2mk&t=214s) **Evolving AI Threat Model Debate** - The speaker outlines the security community’s split over a newly revealed AI exploit—praising Anthropic’s detection work yet condemning its preventive gaps—and urges that AI product design now assume malicious actors and implement system‑level, telemetry‑driven defenses. - [00:06:46](https://www.youtube.com/watch?v=7Kc9BNEe2mk&t=406s) **AI Fluency Redefines Cyber Defense** - The speaker stresses that modern security teams must master AI tools for rapid threat analysis and response, as attackers are already leveraging AI-driven red‑team frameworks and exploit kits, making AI competence essential for defense, compliance, and future resilience. ## Full Transcript
News broke today, November 13th, that
Anthropic has successfully repelled a
Chinese state sponsored attack employing
Claude as an agent. This is the first
documented case we have where Claude
code was used as an agent to conduct a
cyber attack. This is a big enough deal
that I'm going to go through exactly
what happened, why it matters, what
Anthropic's take is, what the cyber
security industry's take is, and
ultimately what are the takeaways for
all of us as we build with these
systems. First, what happened? In
midepptember, Anthropic detected a
sophisticated espionage campaign that
they attribute with fairly high
confidence to a Chinese state sponsored
group, namely GTGU.
The attackers jailbroke Claude code and
used it as the core engine of an
automated hacking framework. So Claude
was wired into tools via the MCP
protocol to do recon, to write and run
exploit code, to harvest credentials,
and ultimately to exfiltrate data.
Around 30 high-value targets were hit.
Most of them were big tech, financial
institutions, chemical manufacturers,
and government agencies. A small number
of them had confirmed successful
breaches. And if you're wondering, no,
nobody is saying which they were.
Anthropic says AI performed 80 to 90% of
the campaign's work. With humans
stepping in at only four to six key
decision points per target, the system
fired off thousands of requests per
second, well beyond what a human team
could have sustained. This is likely the
first documented large-scale cyber
espionage campaign where an AI agent
framework, not humans, did most of the
tactical work. We have been dreading
this moment and it is here. So why does
this matter? We have crossed the Rubicon
from helpful co-pilot to operational
cyber agent. It shows that current
generation models and tools are already
capable of running real world offensive
operations endto end including recon,
including vulnerability discovery,
including prioritization of targets,
including exploit generation, including
lateral movement, including data triage.
That is a massive qualitative shift even
from the summer when AI helps a human
hacker was the prevalent model. Now AI
is the primary operator. The second big
takeaway is that the barrier to
sophisticated attacks has fallen through
the floor. You no longer need a big
elite red team to run complicated
campaigns. A capable state actor can
frontload the strategy and let an AI
framework just grind through all of that
tactical work at machine speed, which is
lightning fast. Over time, these
frameworks will trickle down to less
resourced groups. One of the truisms
about AI is that it is impossible to
contain. It proliferates. This is
something that other people will copy.
Number three, platform safety is now a
core systemic risk. The attackers did
not turn off Claude code safety. They
worked around it. They broke the
operation into small innocentlooking
tasks for Claude code. They told Claw
that it was doing legitimate security
testing. They hid malicious intent
inside the orchestration layer, not in
any given prompt. And that's a reminder
that prompt level guardrails alone are
very brittle and they are not enough
once you have agents and tools. If you
are building for agentic systems, you
have to think in terms of the
orchestration layer. Number four,
Anthropic is trying to frame this as
proof of defensive value and critics are
seeing proof of platform failure.
There's a lot of divide in the security
community about this particular exploit
now that it's been public. We will see
in the coming days where the consensus
emerges. Anthropics line is pretty
simple. The same capabilities that
enabled the attack also helped their
threat intelligence team to detect the
attack to analyze the attack and
ultimately to harden their classifiers
and detection systems to make that kind
of attack pathway more difficult in the
future. On the other side, early
security chatter is calling this a basic
failure to prevent obvious abuse
patterns in the first place. The
challenge here is that you sort of have
to hold both ideas as potentially true.
Dual use is going to be a real threat
for agents even if they have a ethical
core as anthropic likes to claim Claude
does. And we caught it does not erase
the responsibility to design systems
that are harder to weaponize at all. And
I think that there is work to be done
here. And I think Anthropic doesn't yet
have an answer for it. And frankly, I
don't think anybody has an answer for
it. So what can we learn? Number one,
the threat model for AI products has
changed. If you're building aic systems,
the correct assumption now is given
enough time, someone will try to turn
this into an attack framework. You must
assume that assume malicious actors.
That means you need system level
defenses, not just nice sounding usage
policies, right? That means you're going
to have to have telemetry that detects
rate patterns, that detects to tool call
graphs that are suspicious. You're going
to have to detect targets. You're going
to have to detect code execution
profiles. There's a lot of stuff that
you are going to have to do to detect
actual behavioral usage of your agentic
tool. You also need to have a least
privilege basis for agents. Don't let a
generic assistant use a root capable
network scanner with free access to just
go to town. Right? And I think that
sometimes in these early days, we have
been tempted sort of the wild west of
agents. Give the agents root access, see
what they can code. Oh my gosh, they're
coding so fast. Those days are coming to
a close. You need to get into a world
where you assume that the agent may be
contaminated and you give at least
privilege as a priority. You also need
to assume that high-risk actions are
going to be gated by humans. This is
back to the idea that part of humans
role in the age of AI is to be a
liability gate. We need to have humans
that are responsible for the explicit
approval required for high value actions
like mass scanning or credential dumping
or data exfiltration. There should be
hard guard rails and hard internal
workflows that prevent any automated
action against that kind of workflow.
Number two, I I'll emphasize it again.
guard rails that only live in the model
are not enough anymore. The campaign
worked by context splitting. It fed
Claude many tiny ostensibly benign
tasks. It never revealed the full attack
chain. So Claude never saw it. That
means as I emphasized safety must run at
the orchestration layer. You have to
have safety at the orchestration and
tool layers that can say what hosts are
being hit, what ports over what time
window, how many credentials are being
touched, what about tenants. Policy
needs to think about patterns of
behavior, not just strings and prompts.
This is the same design problem that we
have for helpful enterprise agents, but
we now have to flip the script and think
about malicious agents. Takeaway number
three, defense now requires AI fluency,
not just controls. So, Anthropic's own
team did lean on Claude to sift through
the mountain of telemetry and evidence
from the incident, and they credit
Claude with their ability to respond
swiftly and accurately. I think that's
correct. For any serious security org,
there is a new normal here. Analysts
need to be able to use AI to correlate
indicators of compromise, to cluster up
related events, to summarize complicated
timelines. SOC playbooks get rewritten
and should focus around humans
supervising AIdriven triage and hunting,
not humans doing all of it by hand. And
so the SOCK 2 assumptions that we
typically have are not necessarily going
to play out in the same way in the new
world we just entered today. If your
security team is debating whether they
can trust AI, they are behind what the
attackers already do. So what's coming
next? One, AI red team in a box is
coming next. expect that you're going to
get turnkey attack frameworks that sit
on top of any sufficiently capable model
and it will widen the pool of threat
actors dramatically. There will be a
shadow market of AI compatible exploit
kits that is widely traded. The bad guys
are going to make life really miserable
for us unless we're careful here because
this is just going to proliferate.
Number two, compliance and buyer
pressure are going to move way faster
than the law in this regard. Large
customers will demand agent vendors have
clear misuse detection guarantees, that
they have clear audit logs, that they
have documented kill switches, that they
have rate limit strategies, they have
regional sectorbased safety policies.
This is the early days of SOCK 2 for
agents, and no one has written the
playbook. And I think enterprise
customers are going to be the ones
demanding that playbook from
modelmakers. Internally, if you're a
CISO or a CTO, you have to do three hard
things today. You have to put the AI
into the sock stack instead of treating
it as a side experiment. You have to
think of it in terms of triage detection
and response. You have to explicitly
test your own agentic systems as if
they're an attack surface via red
teaming. And you have to treat MCP and
tools, not just the model as part of the
security perimeter. So don't think about
hardening the model per se. Think about
the entire security perimeter
encompassing the agent, the tools they
use, the orchestration layer. If you're
a builder, if you're a PM, the real
takeaway is assume your product may sit
on both sides of the chessboard. It may
be something defenders will use and the
attackers may use it as well. You need
to be thinking about observability,
about abuse detection, about controls as
first class features, not bolt-ons. If
you are competing on raw model power,
that is a race to the bottom. But if
you're competing on trustworthy,
controllable, observable, agentic
systems, that may become a durable edge
because what is in jeopardy right now is
trust. If you'd like to read more, I put
more on the Substack here. This is a
really, really important topic and I
think we need to be talking about it
more. This will not unfortunately be the
last time that we have this kind of a
threat and we need to build for it