A new Cisco report found that artificial intelligence systems lose their safety awareness the longer users interact with them. Researchers discovered that with enough dialogue, most AI chatbots eventually shared harmful or restricted information.
Cisco tested large language models from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. The team conducted 499 conversations using a method called “multi-turn attacks,” in which an attacker asks a series of related questions to gradually bypass built-in safeguards. Each exchange involved between five and ten prompts.
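In practice, a multi-turn attack keeps a single conversation going and resends the growing dialogue with every new prompt, which is what lets earlier turns erode the model’s refusals. The sketch below is a minimal illustration of that loop, not Cisco’s actual harness; the send_message client and the looks_unsafe check are hypothetical placeholders.

def send_message(history):
    """Hypothetical stand-in for a chat-model API call; returns the model's reply."""
    return "canned reply"  # a real test would call an actual chat completion endpoint

def looks_unsafe(reply):
    """Hypothetical safety judge; real evaluations rely on classifiers or human review."""
    return False  # placeholder: treats every reply as safe

def run_multi_turn_probe(prompts):
    """Send five to ten escalating prompts in one conversation and flag any unsafe reply."""
    history = []
    for prompt in prompts:
        history.append({"role": "user", "content": prompt})
        reply = send_message(history)  # the whole dialogue so far goes back to the model
        history.append({"role": "assistant", "content": reply})
        if looks_unsafe(reply):
            return True  # a guardrail gave way at this turn
    return False  # the model held its refusals across all turns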
The company found that 64% of these multi-turn conversations eventually drew out unsafe or inappropriate information, compared with 13% of attempts that used only a single question. “The longer the conversation continued, the more likely models forgot their safety rules,” Cisco reported.
Most AI Models Failed Repeated Safety Tests
Cisco compared results across the tested models to measure how often each provided harmful content. Mistral’s Large Instruct model had the highest failure rate at 93%, while Google’s Gemma performed best at 26%.
Researchers warned that attackers could exploit these weaknesses to access private corporate data or spread misinformation. “AI tools can leak sensitive information or enable unauthorized access when their guardrails erode,” Cisco said.
The report emphasized that most systems “forget” earlier safety directives during extended interactions, allowing attackers to refine questions and sneak past filters. That gradual breakdown in self-regulation, Cisco noted, increases the risk of large-scale data breaches and disinformation campaigns.
Open Models Shift Responsibility to Users
Cisco highlighted that open-weight language models, including those from Mistral, Meta, and Google, publish their trained parameters, or weights, for anyone to download and modify. These models often ship with lighter built-in safety systems to encourage customization. “That flexibility moves the safety burden onto whoever modifies the model,” the report said.
The study also acknowledged that companies like OpenAI, Meta, Microsoft, and Google have tried to limit malicious fine-tuning. Still, critics argue that weak oversight makes it easy for criminals to repurpose AI tools.
Cisco cited a case from August, when U.S. firm Anthropic revealed that criminals used its Claude model to steal personal data and demand ransoms exceeding $500,000. “This shows how fragile AI safety remains when systems are left unmonitored,” Cisco concluded.
