INFO News SC Media

Cisco study finds major frontier models susceptible to multi-turn prompt injection attacks

What: Cisco study shows AI models are vulnerable to multi-turn prompt injection attacks
Impact: AI models may be manipulated through complex attack sequences

May 28, 2026
By Laura French

A recent Cisco study found that 15 proprietary frontier models across five major vendors are susceptible to multi-turn prompt injection attacks, with attack success rates (ASR) differing significantly between single-turn and multi-turn attacks.

The study tested base models from OpenAI, Anthropic, Google, Amazon and xAI with a total of 30,090 single-turn adversarial prompts and 6,986 multi-turn prompts across 1,456 conversations, finding that the ASR for multi-turn attacks ranged from 7.89% to 88.30% across models. By contrast, single-turn attack ASR ranged from 2.19% to 64.91%, without consistent model ordering between the two regimes, demonstrating that single-turn attack results do not reliably translate to a model’s resilience against real-world attacks, Cisco said.

“Multi-turn evaluation matters for one reason: it is where attackers actually live. Real adversaries iterate. They reframe refusals, decompose tasks across turns, adopt personas, and escalate gradually. A single-turn benchmark cannot see any of that,” the researchers wrote.

Cisco said the results reveal a weakness in current benchmarking practices and safety results reflected in many model cards. The report noted that widely used benchmarks such as HarmBench, AILuminate and TrustLLM rely on a single-turn approach, and relying only on single-turn tests poses a risk of “safetywashing.”

Related reading: New LLM jailbreak method with 65% success rate developed by researchers; Researchers demonstrate Agent2Agent prompt injection risk; New LLM jailbreak uses models’ evaluation skills against them

In Cisco’s tests, xAI’s Grok 4.1 Fast non-reasoning saw an ASR of 88.3% for multi-turn attacks, compared with just 34.1% for single-turn attacks. Google’s Gemini 3 Pro saw a multi-turn and single-turn ASR of 73.3% and 18.1% respectively, a four-fold increase.
OpenAI’s GPT-5.4 had a single-turn ASR of 2.7%, which increased to 24.7% for multi-turn attacks, and Claude Opus 4.6 went from a 3.6% ASR for single-turn attacks to 16.2% for multi-turn.

Amazon’s Nova 2 Lite, Nova Micro and Nova Lite were the only models that were more susceptible to single-turn attacks than multi-turn attacks: for example, Nova Micro was vulnerable to 64.9% of single-turn attacks, but only 30.9% of multi-turn attacks succeeded against the same model.

Different configurations of the same model can also have a major impact on attack success rates: multi-turn ASR for Grok 4.1 Fast fell from 88.3% to 43.5% when reasoning mode was activated.

The researchers argue these findings should lead model providers to reconsider the way that prompt injection safety is evaluated and presented, calling for greater transparency and a move away from single-turn-only regimes.

Cisco proposed three “evaluation rituals” for organizations that deploy AI models to consider when it comes to model safety. First, the researchers recommend not only publishing multi-turn ASRs for every model release, but also publishing ASR per strategy, such as roleplay/persona adoption, contextual ambiguity/misdirection and information decomposition and reassembly. “Strategy-stratified reporting matters because cross-model dispersion within each strategy is wide,” the report noted.

Second, the report recommended that providers hold back, pending further safety review, the deployment of any model that regresses by more than three percentage points on the highest-ASR prompt injection techniques, such as imposter and system prompt techniques, or content types, such as hate speech and specialized advice.

Finally, models with a gap of more than 15 percentage points between single-turn and multi-turn ASR should also undergo manual review before deployment, the Cisco researchers said, noting that eight of the 15 models evaluated in their tests surpassed this threshold.
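
As a rough illustration of the 15 percentage-point check described above, the sketch below applies it to the five single-turn and multi-turn ASR pairs quoted in this article. It is not Cisco's tooling: the names `ASR`, `asr_gap` and `needs_manual_review` are hypothetical, and treating the gap as multi-turn minus single-turn (rather than an absolute difference) is an assumption, since the article does not specify how the gap is signed.

```python
# Minimal sketch of the 15-point gap check, using only figures quoted in the article.
# All names here are illustrative; this is not Cisco's evaluation code.

# model -> (single-turn ASR %, multi-turn ASR %)
ASR = {
    "Grok 4.1 Fast non-reasoning": (34.1, 88.3),
    "Gemini 3 Pro": (18.1, 73.3),
    "GPT-5.4": (2.7, 24.7),
    "Claude Opus 4.6": (3.6, 16.2),
    "Nova Micro": (64.9, 30.9),
}

GAP_THRESHOLD_PP = 15.0  # percentage-point threshold named in the report


def asr_gap(single: float, multi: float) -> float:
    """Signed gap in percentage points (assumption: multi-turn minus single-turn)."""
    return round(multi - single, 1)


def needs_manual_review(single: float, multi: float,
                        threshold: float = GAP_THRESHOLD_PP) -> bool:
    """True when multi-turn ASR exceeds single-turn ASR by more than the threshold."""
    return asr_gap(single, multi) > threshold


# Models from the quoted subset that would be held for manual review.
flagged = [name for name, (s, m) in ASR.items() if needs_manual_review(s, m)]

if __name__ == "__main__":
    for name, (s, m) in ASR.items():
        status = "review" if name in flagged else "ok"
        print(f"{name}: gap {asr_gap(s, m):+.1f} pp -> {status}")
```

Under this signed reading, Nova Micro is not flagged because its multi-turn ASR is lower than its single-turn ASR; using the absolute difference instead would flag it as well. Only five of the 15 evaluated models have both figures quoted here, so the result does not reproduce the report's count of eight.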
Overall, Cisco noted that all tested models showed significant susceptibility to prompt injection attacks, signaling a persistent and widespread issue that spans models and vendors, both proprietary and open source.

“If no base model is iteratively safe, the security perimeter has to move outside the model: meaning the use of runtime guardrails, monitoring, red-teaming, and application-layer policies,” Cisco concluded.
