Enterprise AI Has a Security Problem That a Single Test Cannot Solve

Enterprises are deploying AI faster than they are securing it. Chatbots are handling millions of sensitive customer queries daily, while AI assistants are accessing thousands of financial records every second. The assumption underpinning much of this deployment is that the models powering these tools are, by default, safe enough to trust.

New research suggests that assumption deserves serious scrutiny.

A benchmark study testing 34 AI models across more than 620,000 adversarial prompts has found that with the right technique, almost any model can be manipulated into unsafe behaviour. Some responded to harmful requests more than 90% of the time. Others blocked the same attack nine times before failing on the tenth. The pattern that emerges across all 34 models is that no AI model is fully immune, and a single security test at launch tells an organisation almost nothing about the risks it is actually carrying.

The research, published by TELUS Digital, evaluated models from ten providers across North America, Europe and China, including Anthropic, OpenAI, Google, Meta, Alibaba, Baidu, ByteDance, Zhipu AI, 01.AI and Mistral.

Each model was embedded within a simulated enterprise application, specifically a bank’s AI assistant, to reflect real-world deployment conditions. Researchers then subjected each to adversarial attacks spanning a wide range of harm categories, from fraud and privacy exploitation to self-harm, discrimination and cybersecurity threats.

Vulnerability rates ranged from 1.3% to 93%, where a lower rate indicates a safer model.

What Makes a Model More or Less Vulnerable?

Three factors emerged as the strongest predictors of a model’s native safety.

The first is whether it reasons before responding. Models designed to think through their answers before generating output, so-called reasoning models, were significantly harder to exploit, with a vulnerability rate of 19.9%, compared to 55.1% for models that respond without that intermediate step. The difference is substantial and has direct implications for enterprises deciding which models to deploy in sensitive environments.

The second factor is size. Smaller models were consistently more vulnerable than larger ones, across both open-source and proprietary categories. In the open-source ecosystem the pattern was particularly pronounced: lightweight, budget-friendly models were far more likely to be compromised than their larger counterparts. Cost pressure is already pushing many enterprise teams toward smaller, cheaper deployments, which are precisely the models the research identifies as most at risk.

The third factor is the development approach of the team behind the model. The research found meaningful differences between providers, though no single provider was immune. One assumption was challenged directly by the data: open-source models are not inherently less safe than proprietary ones. A large open-source model from Zhipu AI outperformed many proprietary alternatives. Geography proved equally irrelevant: where a model was built had no meaningful bearing on how well it held up under adversarial pressure.

Deploying Fast and Securing Slowly

Worldwide AI investment is projected at $2.52 trillion in 2026. The amount directed toward AI trust, risk and security management is roughly $3.43 billion, or approximately $1 in security for every $735 spent on capability. Meanwhile, 86% of organisations report they have already experienced AI-related security incidents, and enforceable AI security regulations are now active in both the US and EU.

According to the research, AI safety cannot be verified through a one-time evaluation. Unlike traditional software, which produces the same output for the same input, AI models are probabilistic. One model in the study blocked the same attack nine times before failing on the tenth attempt. Another performed well against certain categories of harmful request while proving highly susceptible to others.

This non-deterministic behaviour means that a model appearing safe under standard testing may still fail under different conditions, different configurations, or sustained adversarial pressure. Changes to how an AI application is configured, what data it draws from, or which tools it connects to can each alter its security behaviour, without any change to the underlying model itself.
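A rough calculation shows why a single passing test says so little about a probabilistic system. The sketch below assumes, purely for illustration, that a model fails a given attack with a fixed probability on each attempt and that attempts are independent. Neither assumption comes from the research, and the function name is hypothetical.

```python
def prob_at_least_one_failure(p_fail: float, attempts: int) -> float:
    """Chance that an attack succeeds at least once across repeated attempts.

    Assumes each attempt fails independently with the same probability
    `p_fail` (an illustrative simplification, not a finding of the study).
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Probability that every attempt is blocked is (1 - p_fail) ** attempts;
    # the complement is the chance of at least one unsafe response.
    return 1.0 - (1.0 - p_fail) ** attempts
```

Under these assumptions, a model that slips on one attempt in ten would pass a single test 90% of the time, yet an attacker who simply retries ten times would succeed about 65% of the time.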

The research also identified a pattern called “refuse-but-engage”, in which a model initially declines a harmful request but then provides related information that could still be misused or cause reputational damage. The benchmark treated any such response as a failure. A genuinely safe refusal, the researchers argue, should decline and stop.
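The scoring rule described above can be sketched in a few lines. The example assumes that each response has already been given one of three hypothetical labels (by a human reviewer or an automated judge) and that a vulnerability rate is simply the share of responses counted as failures. The research does not publish its scoring code, so the label names and the function are illustrative only.

```python
from collections import Counter
from typing import Iterable

# Hypothetical labels. A "refuse_but_engage" response counts as a failure,
# mirroring the rule that a safe refusal should decline and stop.
FAILURE_LABELS = {"complied", "refuse_but_engage"}


def vulnerability_rate(labels: Iterable[str]) -> float:
    """Return the fraction of responses that count as failures."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to score")
    failures = sum(counts[label] for label in FAILURE_LABELS)
    return failures / total
```

With eight clean refusals, one compliance and one refuse-but-engage response, this function returns 0.2 rather than 0.1, which shows how the stricter rule raises a model's measured vulnerability.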

Where the Risks Are Highest

While progress has been made in areas like political manipulation, most models tested remained noticeably susceptible to privacy exploitation, fraud and cybersecurity threats, even among the top performers. These categories map directly onto the highest-stakes interactions enterprises manage through AI-powered customer service and contact centre applications: identity verification, financial queries, account management and sensitive personal data handling.

Picking a model with a low vulnerability rate is not, on its own, a security strategy. The research points instead to a layered approach that operates on both sides of the conversation. Before a user’s message reaches the model, techniques such as prompt shielding and masking of personally identifiable information can block direct attacks. Before the model’s response reaches the user, it should be audited for harmful or inappropriate content.
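The two-sided approach described above might look something like the following minimal sketch. Everything in it is an assumption made for illustration: the regular expressions, the phrase lists and the function names are hypothetical, and a production system would rely on far more robust detection than simple pattern matching.

```python
import re
from typing import Callable

# Illustrative patterns only; real PII detection needs much broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

# Hypothetical phrase lists standing in for a prompt shield and an output audit.
INJECTION_HINTS = ("ignore previous instructions", "disregard your rules")
BLOCKED_OUTPUT_TERMS = ("account password",)

REFUSAL = "Sorry, I can't help with that."


def mask_pii(text: str) -> str:
    """Replace email addresses and card-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return CARD.sub("[CARD]", text)


def looks_like_attack(text: str) -> bool:
    """Input-side check: flag messages containing known injection phrases."""
    lowered = text.lower()
    return any(hint in lowered for hint in INJECTION_HINTS)


def output_is_safe(text: str) -> bool:
    """Output-side check: reject responses containing blocked terms."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_OUTPUT_TERMS)


def guarded_reply(user_message: str, model: Callable[[str], str]) -> str:
    """Run one turn through the input shield, the model and the output audit."""
    if looks_like_attack(user_message):
        return REFUSAL  # decline and stop, without calling the model
    response = model(mask_pii(user_message))
    return response if output_is_safe(response) else REFUSAL
```

The `model` argument is any callable that maps a prompt to a response, so the same wrapper can sit in front of different models. Because the refusal path returns before the model is called, a blocked message never reaches the model at all.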

Above all, the research argues that security testing cannot remain a point-in-time exercise. Continuous, automated testing with human oversight is presented as the only credible way to stay ahead of vulnerabilities that may not surface until a system has been in production, or under sustained pressure, for some time.