Multi-Turn Attacks Expose Ongoing Weaknesses Across Frontier AI Models | eSecurity Planet

A Cisco study found frontier AI models remain vulnerable to multi-turn adversarial attacks.

Written By
Ken Underhill
May 28, 2026
3 minute read
A Cisco evaluation of frontier LLMs found that no tested model consistently resisted multi-turn adversarial attacks, raising concerns about current AI safety assessments. 

The research suggests that many widely used AI safety benchmarks may underestimate real-world risk because they focus primarily on single-turn prompt evaluations rather than adaptive, iterative attacks.

Key Takeaways from Cisco’s Research

  • Cisco found that every tested frontier LLM remained vulnerable to multi-turn adversarial attacks, despite some showing strong single-turn safety performance.
  • Multi-turn attack success rates were significantly higher than single-turn results across both proprietary and open-weight AI models.
  • Attackers can bypass safeguards by gradually reframing requests, adopting personas, and escalating malicious prompts across multiple interactions.
  • Public AI safety benchmarks and model cards may underestimate real-world risk because they often focus only on single-turn evaluations.
  • Researchers recommend multi-turn testing, runtime monitoring, red-teaming, and external guardrails to improve enterprise AI security and governance.

Frontier Models Show High Multi-Turn Exposure 

Researchers evaluated 15 proprietary flagship models from OpenAI, Anthropic, Google, Amazon, and xAI using both single-turn and multi-turn attack scenarios. 

The findings showed that single-turn attack success rates (ASRs) ranged from 2.19% to 64.91%, while multi-turn ASRs ranged from 7.89% to 88.30%. 
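The article does not define ASR. A common convention, assumed here rather than taken from Cisco's methodology, is the percentage of attack attempts judged successful. In the multi-turn regime, an "attempt" would typically be an entire adaptive conversation rather than a single prompt:

```latex
\mathrm{ASR} = \frac{N_{\text{successful attempts}}}{N_{\text{total attempts}}} \times 100\%
```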

Researchers concluded that single-turn testing alone does not accurately reflect how models behave when attackers can adapt tactics across multiple interactions.

The study builds on earlier research involving open-weight models, where multi-turn attack success rates were found to be two to 10 times higher than single-turn baselines. 

The pattern appears consistent across both open and proprietary frontier models, suggesting multi-turn attack exposure is a broader structural challenge in current LLM architectures.  

Multi-Turn Attacks Reveal AI Security Gaps

Researchers emphasized that multi-turn testing more accurately reflects real-world adversarial behavior because attackers rarely rely on a single malicious prompt.

Instead, threat actors often reframe requests, escalate gradually, adopt personas, or split malicious objectives across multiple interactions to bypass safeguards.
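To make the distinction concrete, here is a minimal sketch of how a multi-turn evaluation loop differs from a single-turn one. It is not Cisco's harness: `run_multi_turn_attack` and the toy attacker, target, and judge are hypothetical stand-ins, and the prompts are placeholders rather than real attack content. The structural point is that the attacker sees the full transcript before each new turn, and the target answers with the whole history in context.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A conversation is an ordered list of (role, text) turns.
Turn = Tuple[str, str]


@dataclass
class MultiTurnResult:
    success: bool
    turns_used: int
    transcript: List[Turn] = field(default_factory=list)


def run_multi_turn_attack(
    attacker_next: Callable[[List[Turn]], str],
    target_reply: Callable[[List[Turn]], str],
    judge: Callable[[str], bool],
    max_turns: int = 5,
) -> MultiTurnResult:
    """Run one adaptive conversation; stop at the first reply the judge flags."""
    transcript: List[Turn] = []
    for turn in range(1, max_turns + 1):
        prompt = attacker_next(transcript)  # attacker adapts to full history
        transcript.append(("user", prompt))
        reply = target_reply(transcript)    # target sees full history too
        transcript.append(("assistant", reply))
        if judge(reply):
            return MultiTurnResult(True, turn, transcript)
    return MultiTurnResult(False, max_turns, transcript)


# Hypothetical placeholder steps standing in for reframing, personas, escalation.
STEPS = ["[benign framing]", "[persona setup]", "[reframed request]", "[escalated request]"]


def toy_attacker(transcript: List[Turn]) -> str:
    n_user = sum(1 for role, _ in transcript if role == "user")
    return STEPS[min(n_user, len(STEPS) - 1)]


def toy_target(transcript: List[Turn]) -> str:
    # Toy model: refuses early, but "drifts" once three user turns accumulate.
    n_user = sum(1 for role, _ in transcript if role == "user")
    return "[DISALLOWED]" if n_user >= 3 else "I can't help with that."


def toy_judge(reply: str) -> bool:
    return "[DISALLOWED]" in reply


result = run_multi_turn_attack(toy_attacker, toy_target, toy_judge, max_turns=5)
print(result.success, result.turns_used)  # True 3

# The same final request sent as a single turn, with no prior context.
print(toy_judge(toy_target([("user", STEPS[-1])])))  # False
```

The last line runs the same toy target once, single-turn, with the final escalated placeholder; it refuses. That mirrors the report's point that a model can look safe on single-turn tests while failing once context accumulates.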

The report found substantial differences between single-turn and multi-turn performance across the tested models. 

OpenAI’s GPT-5.4 increased from a 2.74% single-turn ASR to 24.68% under multi-turn conditions, while Google’s Gemini 3 Pro increased from 18.10% to 73.35%. 
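How large these jumps look depends on whether they are read as an absolute gap in percentage points or as a ratio. A small sketch using only the figures reported above; `cross_regime_gap` is a hypothetical helper name:

```python
from typing import Dict, Tuple

# Single-turn and multi-turn ASRs (%) as reported in the article.
reported: Dict[str, Tuple[float, float]] = {
    "GPT-5.4": (2.74, 24.68),
    "Gemini 3 Pro": (18.10, 73.35),
}


def cross_regime_gap(single: float, multi: float) -> Tuple[float, float]:
    """Return (absolute gap in percentage points, multi/single ratio)."""
    return multi - single, multi / single


for model, (single, multi) in reported.items():
    gap_pp, ratio = cross_regime_gap(single, multi)
    print(f"{model}: +{gap_pp:.2f} pp, {ratio:.1f}x")
# GPT-5.4: +21.94 pp, 9.0x
# Gemini 3 Pro: +55.25 pp, 4.1x
```

By ratio, GPT-5.4's multi-turn ASR is roughly nine times its single-turn rate versus roughly four times for Gemini 3 Pro, while in absolute terms Gemini 3 Pro's increase (about 55 points) is more than twice GPT-5.4's (about 22 points).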

Anthropic’s Claude models, which showed some of the strongest single-turn refusal rates, still reached multi-turn ASRs between 11.16% and 16.20%. 

xAI’s Grok 4.1 Fast in non-reasoning mode recorded the highest observed multi-turn ASR at 88.30%.

The researchers also identified significant variations tied to deployment configurations. 

Grok 4.1 Fast showed a notable reduction in multi-turn ASR when reasoning mode was enabled, decreasing from 88.30% to 43.47% under the same testing conditions. 
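In relative terms, using only the two figures reported above, enabling reasoning mode roughly halved Grok 4.1 Fast's multi-turn ASR:

```latex
\Delta = 88.30\% - 43.47\% = 44.83 \text{ percentage points}, \qquad
\frac{44.83}{88.30} \approx 0.508 \;\Rightarrow\; \text{about a } 50.8\% \text{ relative reduction}
```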

The report noted that safety differences tied to configuration settings, reasoning modes, or guardrail options are not typically reflected in public benchmarks or model cards.

Attack Strategies and Enterprise Risk

The report categorized adversarial tactics into several strategy families, including role-play and persona adoption, contextual ambiguity, refusal reframing, information decomposition, and incremental escalation techniques. 

Researchers found that performance varied across these attack categories, even among models with relatively low aggregate ASRs.
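Publishing strategy-specific ASRs, as the report recommends, amounts to grouping results by attack family before computing the rate. A minimal sketch, assuming each attempt is logged as a (strategy, success) pair; `asr_by_strategy` is a hypothetical name and the records below are toy data, not results from the report:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def asr_by_strategy(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute ASR (%) per strategy family from (strategy, success) records."""
    attempts: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for strategy, success in records:
        attempts[strategy] += 1
        successes[strategy] += int(success)
    return {s: 100.0 * successes[s] / attempts[s] for s in attempts}


# Hypothetical toy records for illustration; NOT figures from the report.
toy_records = [
    ("role-play/persona", True), ("role-play/persona", False),
    ("incremental escalation", True), ("incremental escalation", True),
    ("refusal reframing", False), ("refusal reframing", False),
]

overall = 100.0 * sum(ok for _, ok in toy_records) / len(toy_records)
print(overall)  # 50.0
print(asr_by_strategy(toy_records))
# {'role-play/persona': 50.0, 'incremental escalation': 100.0, 'refusal reframing': 0.0}
```

In this toy data the aggregate ASR is 50%, yet one family succeeds every time and another never does, which is the kind of variation a single aggregate score hides.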

Single-turn weaknesses also concentrated around several recurring procedures, including imposter AI techniques, soft paraphrasing, and system prompt manipulation. 

These attack patterns consistently produced higher ASRs than many other prompt categories and could serve as priority areas for defensive improvements.

The findings have broader implications for organizations deploying AI systems in enterprise environments. 

Researchers warned that relying solely on publicly available single-turn benchmark scores could create governance and procurement risks because models with similar headline safety scores may behave very differently during iterative attacks.

Strengthening AI Security Testing and Governance

The report aligns with broader regulatory discussions around adversarial AI testing and model robustness. 

Frameworks including the NIST AI Risk Management Framework, the forthcoming NIST Cyber AI Profile, and the European Union AI Act all reference adversarial robustness testing. 

However, many of these frameworks currently offer little detailed guidance on how to conduct multi-turn evaluations.

Researchers recommended paired-regime testing, in which each model is evaluated under both single-turn and multi-turn attacks, along with publishing strategy-specific ASRs and closely reviewing models that show large gaps between the two regimes.
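One way to act on the recommendation to review models with large cross-regime gaps is a simple threshold rule over paired results. The sketch assumes a 30-percentage-point review threshold, an arbitrary illustrative choice rather than a value from the report; `PairedResult` and `flag_for_review` are hypothetical names, and the two entries reuse the GPT-5.4 and Gemini 3 Pro figures reported earlier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairedResult:
    """Hypothetical record pairing one model's single- and multi-turn ASRs (%)."""
    model: str
    single_turn_asr: float
    multi_turn_asr: float

    @property
    def gap_pp(self) -> float:
        # Cross-regime gap in percentage points.
        return self.multi_turn_asr - self.single_turn_asr


def flag_for_review(results: List[PairedResult], max_gap_pp: float) -> List[str]:
    """Return models whose gap exceeds max_gap_pp, largest gap first."""
    flagged = [r for r in results if r.gap_pp > max_gap_pp]
    return [r.model for r in sorted(flagged, key=lambda r: r.gap_pp, reverse=True)]


results = [
    PairedResult("GPT-5.4", 2.74, 24.68),
    PairedResult("Gemini 3 Pro", 18.10, 73.35),
]
print(flag_for_review(results, max_gap_pp=30.0))  # ['Gemini 3 Pro']
```

Lowering the threshold to 20 points would flag both models, ordered Gemini 3 Pro then GPT-5.4. A ratio-based rule would rank them the other way, so the choice of gap metric is itself a governance decision.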

The report also suggested that enterprises place greater emphasis on runtime monitoring, red-teaming, application-layer protections, and external guardrails rather than relying solely on model-level safety claims.
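Runtime monitoring matters for multi-turn attacks because incremental escalation can keep every individual message below a per-message filter's threshold. A minimal sketch of that failure mode and one possible mitigation, assuming an external classifier that assigns each turn a risk score between 0 and 1; the scores, thresholds, and function names are hypothetical:

```python
from typing import List, Optional


def per_turn_alert(scores: List[float], threshold: float) -> Optional[int]:
    """Return the first turn (1-based) whose own score crosses the threshold."""
    for i, s in enumerate(scores, start=1):
        if s >= threshold:
            return i
    return None


def cumulative_alert(scores: List[float], budget: float) -> Optional[int]:
    """Return the first turn at which the running total of scores crosses budget."""
    total = 0.0
    for i, s in enumerate(scores, start=1):
        total += s
        if total >= budget:
            return i
    return None


# Hypothetical per-turn risk scores (0.0-1.0) rising as a conversation escalates.
escalating = [0.2, 0.3, 0.4, 0.5, 0.6]
print(per_turn_alert(escalating, threshold=0.8))  # None
print(cumulative_alert(escalating, budget=1.2))   # 4
```

Here no single turn crosses the 0.8 per-turn threshold, but the running total crosses 1.2 at turn 4. A plain running sum never decays, so long benign conversations would eventually trip it; a production monitor would more likely use a sliding window or a decayed score, which this sketch omits.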

The findings show that AI resilience testing must move beyond static benchmarks to better reflect real-world, multi-turn attacker behavior. 

Ken Underhill

Ken Underhill is an award-winning cybersecurity professional, bestselling author, and seasoned IT professional. He holds a graduate degree in cybersecurity and information assurance from Western Governors University and brings years of hands-on experience to the field.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved
