A new Cisco report found that artificial intelligence systems lose their safety awareness the longer users interact with them. Researchers discovered that with enough dialogue, most AI chatbots eventually shared harmful or restricted information.
Cisco tested large language models from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. The team conducted 499 conversations using a method called “multi-turn attacks,” in which an attacker asks a series of related questions to gradually bypass built-in safeguards. Each exchange involved between five and ten prompts.
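In practice, a multi-turn attack keeps a single conversation going and resends the growing dialogue with every new prompt, which is what lets earlier turns erode the model’s refusals. The sketch below is a minimal illustration of that loop, not Cisco’s actual harness; the send_message client and the looks_unsafe check are hypothetical placeholders.

def send_message(history):
    """Hypothetical stand-in for a chat-model API call; returns the model's reply."""
    return "canned reply"  # a real test would call an actual chat completion endpoint

def looks_unsafe(reply):
    """Hypothetical safety judge; real evaluations rely on classifiers or human review."""
    return False  # placeholder: treats every reply as safe

def run_multi_turn_probe(prompts):
    """Send five to ten escalating prompts in one conversation and flag any unsafe reply."""
    history = []
    for prompt in prompts:
        history.append({"role": "user", "content": prompt})
        reply = send_message(history)  # the whole dialogue so far goes back to the model
        history.append({"role": "assistant", "content": reply})
        if looks_unsafe(reply):
            return True  # a guardrail gave way at this turn
    return False  # the model held its refusals across all turns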
The company found that 64% of these multi-turn conversations eventually drew out unsafe or inappropriate information, compared with 13% of attempts that used only a single question. “The longer the conversation continued, the more likely models forgot their safety rules,” Cisco reported.
Most AI Models Failed Repeated Safety Tests
Cisco compared results across the tested models to measure how often each provided harmful content. Mistral’s Large Instruct model had the highest failure rate at 93%, while Google’s Gemma performed best at 26%.
Researchers warned that attackers could exploit these weaknesses to access private corporate data or spread misinformation. “AI tools can leak sensitive information or enable unauthorized access when their guardrails erode,” Cisco said.
The report emphasized that most systems “forget” earlier safety directives during extended interactions, allowing attackers to refine questions and sneak past filters. That gradual breakdown in self-regulation, Cisco noted, increases the risk of large-scale data breaches and disinformation campaigns.
Open Models Shift Responsibility to Users
Cisco highlighted that open-weight language models, including those from Mistral, Meta, and Google, publish their trained parameters, or weights, for anyone to download and modify. These models often ship with lighter built-in safety systems to encourage customization. “That flexibility moves the safety burden onto whoever modifies the model,” the report said.
The study also acknowledged that companies like OpenAI, Meta, Microsoft, and Google have tried to limit malicious fine-tuning. Still, critics argue that weak oversight makes it easy for criminals to repurpose AI tools.
Cisco cited a case from August, when U.S. firm Anthropic revealed that criminals used its Claude model to steal personal data and demand ransoms exceeding $500,000. “This shows how fragile AI safety remains when systems are left unmonitored,” Cisco concluded.
