SHARE

When AI Agents Go Rogue: Inside the Agent Session Smuggling Attack

Researchers discovered agent session smuggling, a new attack where rogue AI agents secretly inject commands to deceive and manipulate other agents.

Written By

Ken Underhill

Nov 3, 2025

eSecurity Planet content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More

A new potential cyber threat has emerged — one that doesn’t target humans directly, but the AI agents themselves.

Researchers at Palo Alto Networks’ Unit 42 have identified a technique they call agent session smuggling.

This is an attack that allows a malicious AI agent to covertly inject harmful instructions into an ongoing communication between agents using the Agent2Agent (A2A) protocol.

This discovery exposes a growing risk within multi-agent ecosystems, where intelligent systems collaborate autonomously to complete complex tasks.

The attack demonstrates how a compromised or malicious agent can exploit built-in trust mechanisms and session memory to manipulate or deceive its peers — all without a human ever noticing.

Invisible Attacks Between Trusted Agents
Proof-of-Concept Demonstrations
Comparing A2A and MCP
Keeping Rogue Agents in Check
The New Frontier of AI Security

Invisible Attacks Between Trusted Agents

The A2A protocol enables interoperability among AI agents from different systems, allowing them to coordinate tasks, share information, and maintain context across multiple interactions.

Unlike stateless systems, which treat each exchange as independent, A2A sessions are stateful, meaning they remember prior conversations and actions.

The agent session smuggling attack takes advantage of this stateful design.

Once a legitimate session is established, a malicious remote agent uses it as a covert channel to inject hidden instructions between client requests and server responses.

These commands, buried within otherwise normal exchanges, can lead to:

Context poisoning, corrupting the victim’s understanding of the conversation.
Data exfiltration, leaking sensitive memory, credentials, or internal tool information.
Unauthorized actions, where the victim agent executes unintended commands on behalf of the user.

In practice, this means that a rogue agent could impersonate a trusted collaborator, gradually steering another agent into revealing data or executing actions — like placing trades or sharing confidential information — without any visible indication to the human operator.

Proof-of-Concept Demonstrations

To illustrate the threat, Unit 42 developed two proof-of-concept (PoC) attacks using Google’s Agent Development Kit (ADK) and the A2A protocol.

In the first scenario, a malicious research assistant agent was able to trick a financial assistant client agent into revealing sensitive information, such as system instructions, tool configurations, and chat history.

The attack unfolded through a series of seemingly harmless follow-up questions during a delegated task — questions that blended naturally into the conversation flow but gradually leaked protected data.

In the second PoC, the malicious agent escalated its behavior by smuggling hidden instructions that led the financial assistant to execute unauthorized stock trades.

This showed how session smuggling can progress from information theft to direct system manipulation.

These intermediate actions were invisible in standard chat interfaces, which typically display only the user’s request and the final response. That invisibility makes detection especially challenging.

Comparing A2A and MCP

The A2A protocol shares some conceptual similarities with the Model Context Protocol (MCP) — another framework for connecting AI systems to external tools — but A2A’s stateful, adaptive nature introduces new risks.

MCP sessions are generally stateless and deterministic, meaning each tool invocation is isolated and predictable.

A2A, by contrast, supports multi-turn, model-driven communication, where agents remember previous interactions and dynamically generate context-aware responses.

This combination of memory, autonomy, and adaptability makes A2A systems more powerful — but also more vulnerable.

A malicious agent can refine its strategy across multiple exchanges, exploiting contextual continuity to build trust, stay hidden, and escalate its influence.

Keeping Rogue Agents in Check

Defending against threats like agent session smuggling requires more than patching — it requires a strategic, layered approach to securing AI ecosystems.

Enforce human-in-the-loop (HitL) controls for high-impact or sensitive actions, using out-of-band confirmation methods and clear approval workflows to prevent autonomous misuse.
Validate agent identity and context through cryptographically signed credentials (e.g., AgentCards) and context-grounding checks that ensure instructions remain aligned with original user intent.
Isolate and secure AI environments by segmenting agent networks, sandboxing untrusted agents, and enforcing zero-trust principles with least-privilege access.
Enhance visibility and monitoring with real-time logging of agent activity, anomaly detection, and integration of AI telemetry into existing SIEM/XDR tools to detect suspicious behavior.
Harden data and models through input validation, encryption of communications and session data, adversarial robustness training, and strict data minimization practices.
Strengthen governance and incident readiness by establishing AI security policies, conducting regular risk assessments and red-team exercises, and including AI-specific IR playbooks for rapid containment.

By implementing these layered defenses, organizations can reduce the risk of AI agent compromise and unauthorized activity and build cyber resilience.

The New Frontier of AI Security

As AI ecosystems evolve toward autonomous, interconnected agents, the attack surface expands beyond traditional endpoints to the agents themselves.

These intelligent entities — capable of reasoning, memory, and coordination — can be exploited much like humans through deception and trust abuse.

Although no active exploitation of agent session smuggling has been reported in the wild, its low barrier to execution makes it a credible risk.

All an attacker needs is to convince one victim agent to communicate with a malicious peer.

Once the session is established, covert instructions can be smuggled undetected, bypassing both human oversight and standard logging mechanisms.

The agent session smuggling attack marks a new chapter in AI security, revealing how intelligent systems can manipulate one another in ways invisible to humans.

It underscores the need for trust verification, transparency, and layered defense across all AI-to-AI communication.

As multi-agent systems continue to power the next generation of autonomous applications, organizations must treat agent collaboration as a potential threat vector — not just a productivity advantage.

Designing secure orchestration frameworks today will be critical to defending tomorrow’s intelligent networks from adaptive, AI-powered adversaries.

Ken Underhill

Ken Underhill is an award-winning cybersecurity professional, bestselling author, and seasoned IT professional. He holds a graduate degree in cybersecurity and information assurance from Western Governors University and brings years of hands-on experience to the field.