SHARE

New Claude Feature Turns Into a Hacker’s Playground

Anthropic’s new Claude file tool boosts productivity but exposes users to prompt injection attacks and potential data leaks.

Written By

Ken Underhill

Sep 10, 2025

eSecurity Planet content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More

Anthropic’s latest Claude AI upgrade boosts productivity… and introduces security risks.

The company’s new file creation feature enables in-chat document generation, but Anthropic’s own documentation warns it may be a risk to user data.

In their recent advisory, Anthropic cautioned, “It is possible for a bad actor to inconspicuously add instructions via external files or websites that trick Claude into downloading and running untrusted code and reading and/or leaking sensitive data.”

What is the new feature?

The Upgraded file creation and analysis feature provides Claude with a sandboxed computing environment that includes limited internet access.

This design lets the assistant pull dependencies from repositories and run code, boosting functionality but creating a new attack surface for malicious instructions.

The risk of prompt injection

The primary risk vector is prompt injection. Prompt injections exploit the fact that large language models treat all input — data, instructions, and hidden directives — as part of the same context.

Threat actors can embed malicious commands within uploaded files, linked websites, or even benign-looking text. When Claude processes this input, the hidden instructions may override user intent and trigger unsafe actions.

Anthropic warns that prompt injection could trick Claude into running untrusted code, since its sandbox inherently supports arbitrary script execution for file generation.

Another concern is the potential for attackers to extract sensitive data. If Claude is connected to knowledge sources such as project files, MCP integrations, or cloud-linked data, a prompt injection could direct the model to parse and reveal confidential content.

Since the model cannot reliably distinguish “don’t share this” from “copy this to external output,” sensitive information may be exposed unintentionally.

Prompt injection can also enable data exfiltration. With its limited internet access, Claude could be manipulated into making outbound HTTP requests that transmit stolen data to attacker-controlled servers. This means proprietary documents, API keys, or other sensitive information could be leaked through a covert external channel.

These risks arise because the sandbox grants Claude enough autonomy to perform meaningful computation and networking without strong guardrails.

How is Anthropic addressing the issue?

Even though Anthropic limits task duration, isolates enterprise sandboxes, and provides allowlists for outbound domains, these are simply mitigations rather than complete protections. If a user enables the feature while working with sensitive corporate data, a cleverly crafted file or link could bypass human oversight and leak information.

This vulnerability highlights the unresolved challenge of context window security: models process data and instructions together, making it hard to block malicious inputs without disrupting valid tasks. This turns every connected integration into a potential data leak vector.

Anthropic’s decision to release a feature with documented vulnerabilities underscores the “ship first, secure later” mentality driving competition in the AI industry.

Independent researcher Simon Willison criticized the approach, writing that Anthropic asking users to “monitor Claude while using the feature” is “unfairly outsourcing the problem.”

How to reduce your risk

Anthropic has rolled out some mitigations, including sandbox isolation for enterprise tenants, disabling public sharing of conversations that use the feature, and limiting task duration to avoid abuse loops.

Security teams can further reduce the risks of Claude’s file creation feature by taking the following proactive measures:

Audit Claude’s deployments and decide whether to enable file creation based on the sensitivity of the data being handled.
Enforce allowlists for outbound network requests and review sandbox actions regularly.
Monitor activity logs for signs of unauthorized data access or exfiltration attempts.
Educate staff on prompt injection threats and establish incident response playbooks for AI misuse.

Prompt injection still lacks a permanent solution, leaving AI tools vulnerable to data leaks unless safeguarded by layered defenses. The takeaway: Advancing AI means embedding security into every layer, not bolting it on later.

AI isn’t always just a risk — it can help your security team. Discover some of the key challenges and benefits of AI in cybersecurity.

Ken Underhill

Ken Underhill is an award-winning cybersecurity professional, bestselling author, and seasoned IT professional. He holds a graduate degree in cybersecurity and information assurance from Western Governors University and brings years of hands-on experience to the field.