Anthropic’s Claude AI, specifically its Code Interpreter tool with new network features, can be manipulated through indirect prompt injection to steal sensitive user data.
Security researcher Johann Rehberger revealed in October 2025 that attackers could exploit these capabilities to extract chat histories and upload them directly to their own accounts.
The discovery underscores growing concerns about how expanding AI connectivity introduces new security risks.
The Vulnerability
The flaw lies in Claude’s default network setting, “Package managers only,” which allows access to a small list of approved domains, including api[.]anthropic[.]com.
This setup was meant to let Claude safely install software packages from trusted sources like npm, PyPI, and GitHub.
However, Rehberger showed that this configuration could be abused to establish a backdoor into Claude’s systems.
By embedding malicious instructions into an innocent-looking file or message, an attacker can trick Claude into executing hidden code.
This indirect prompt injection causes the model to read user data, save it to a file within its sandbox, and use Anthropic’s own APIs to send that file to the attacker’s account.
How the Attack Works
Rehberger’s proof-of-concept begins when the victim asks Claude to analyze a tainted document.
The embedded payload instructs Claude to gather recent chat data, write it to a file named hello.md in the sandbox, and then use the Anthropic SDK to upload it.
Using the attacker’s API key, Claude unknowingly sends the stolen file — up to 30MB per upload — to the attacker’s Anthropic Console.
Multiple uploads can occur in sequence, allowing for large-scale data theft.
Rehberger noted that the exploit initially succeeded without issue, though later versions of Claude began flagging obvious API keys as suspicious.
To bypass this, he disguised the malicious code within harmless print statements, tricking Claude into executing it again.
Anthropic’s Response
Rehberger disclosed the issue responsibly via HackerOne.
At first, Anthropic dismissed it as a “model safety” issue, placing it out of scope for vulnerability reporting.
Following public discussion, the company acknowledged the oversight on October 30 and confirmed the finding as a valid security issue.
Anthropic’s documentation already warned users about the potential for data exfiltration through network egress. The company advises monitoring Claude sessions and halting activity if unusual behavior is detected.
Security professionals have described this exploit as part of the lethal trifecta of AI security risks: powerful models, external connectivity, and prompt-based control.
As AI tools like Claude integrate deeper into professional workflows, even limited network access can become an open door for attackers.
Rehberger’s findings demonstrate that what begins as a feature designed for convenience — like package installation — can evolve into a serious security liability.
Allowing AI systems to make outbound network calls without strict user verification introduces the risk of direct data theft.
Preventing Future Attacks
For Anthropic and other AI developers, mitigation should start by enforcing strict sandbox controls that restrict API calls to the authenticated user’s own account.
Allowlists should be minimized and carefully reviewed to eliminate unintended access paths.
End users can protect themselves by disabling network access when possible, allowing only essential domains, and closely monitoring session activity for unusual file creation or code execution.
Sensitive data should never be processed through AI tools with active network permissions unless strong safeguards are confirmed.
The Broader Lesson
The Claude exploit highlights a critical truth about modern AI systems: connectivity brings both capability and vulnerability.
As models gain the power to run code, retrieve data, and interact with online systems, the boundary between helpful automation and harmful misuse grows thinner.
Without proper oversight, even trusted AI assistants can be turned into data exfiltration tools.
The incident serves as a warning to developers and users alike — in the age of connected AI, every feature must be secured as though it could be weaponized.





