Skip to main content
Blog

What to Know About the Claude AI Breach

Would you like to learn more?

Download our Pentest Sourcing Guide to learn everything you need to know to successfully plan, scope, and execute your penetration testing projects.

The makers of artificial intelligence (AI) chatbot Claude claim to have caught hackers sponsored by the Chinese government using the tool to perform automated cyberattacks against around 30 global organizations.

Anthropic said hackers tricked the chatbot into carrying out automated tasks under the guise of carrying out cybersecurity research.

The company claimed in a blog post that this was the "first reported AI-orchestrated cyber espionage campaign."

But skeptics are questioning the accuracy of that claim, alongside the motive behind it.

Anthropothic AI Cyberattack: An Overview

Anthropic said it discovered the hacking attempts in mid-September.

Pretending they were legitimate cybersecurity workers, hackers gave the chatbot small automated tasks which, when strung together, formed a "highly sophisticated espionage campaign". Researchers at Anthropic said they had "high confidence" the people carrying out the attacks were "a Chinese state-sponsored group".

They said humans chose the targets (namely large tech companies, financial institutions, chemical manufacturing companies, and government agencies.) From there, threat actors then built an unspecified program using Claude's coding assistance to "autonomously compromise a chosen target with little human involvement".

Anthropic claims the chatbot was able to successfully breach various unnamed organisations, extract sensitive data and sort through it for valuable information.

The company said it had since banned the hackers from using the chatbot and had notified affected companies and law enforcement.

How Was Claude AI Jailbroken?

Further probing revealed the attackers had to jailbreak Claude, tricking the AI into bypassing its built-in safety rules. They did this by presenting the malicious tasks as routine, defensive cybersecurity work for a made-up, legitimate company. By breaking the larger attack into smaller, less suspicious steps, the hackers managed to avoid setting off the AI’s security alarms.

Once it was tricked, Claude worked on its own to examine target systems, look for valuable databases, and even write its own unique code for the break-in. It then stole usernames and passwords to get access to sensitive data. The AI even created detailed reports afterwards, listing the credentials it used and the systems it had breached.

What is the AI-Breach Connection?

Some information technology companies have been criticized for "over-emphasizing" cases where AI was used by hackers.

Critics say the technology is still too unwieldy to be used for 100% automated cyberattacks. For example, in November, cyber experts at Google released a research paper that highlighted growing concerns about AI being used by hackers to create brand new forms of malicious software.

But the paper concluded the tools were not all that successful and were only in a testing phase. Anthropic itself admitted its chatbot made mistakes, such as making up fake login usernames and passwords (and claiming to have extracted secret information which was in fact publicly available.)

"This remains an obstacle to fully autonomous cyberattacks," Anthropic said.

AI Hackers: A History

Anthropic's announcement is perhaps the most high-profile example of companies claiming bad actors are using AI tools to carry out automated hacks.It is the kind of danger many have been worried about, but other AI companies have also claimed that nation-state hackers have used their products.

In February 2024, OpenAI published a blog post in collaboration with cyber experts from Microsoft saying it had disrupted five state-affiliated actors, including some from China.

"These actors generally sought to use OpenAI services for querying open-source information, translating, finding coding errors, and running basic coding tasks," the firm said at the time. Anthropic has not said how it concluded the hackers in this campaign were linked to the Chinese government.

Conclusion

Anthropic says it’s built more safeguards to flag and stop abuse of Claude Code.

At the end of the day, the operational, social, and even existential stakes for “thinking” machines are only getting higher.

What’s certain now? That the cyber landscape is evolving, and our best response may be to understand, share, and adapt as quickly as the machines themselves.

Contact Us

Speak with an Account Executive

Packetlabs Company Logo
    • Toronto | HQ
    • 401 Bay Street, Suite 1600
    • Toronto, Ontario, Canada
    • M5H 2Y4
    • San Francisco | Outpost
    • 580 California Street, 12th floor
    • San Francisco, CA, USA
    • 94104