AI guardrails can be easily beaten, even if you don't mean to

A representational concept of a social media network
(Image credit: Shutterstock / metamorworks)

Guardrails designed to prevent AI chatbots from generating illegal, explicit or otherwise wrong responses can be easily bypassed, according to research from the UK’s AI Safety Institute (AISI).

The AISI found that five undisclosed large language models were “highly vulnerable” to jailbreaks: inputs and prompts crafted to elicit responses that their makers did not intend.

In a recent report, AISI researchers revealed that the models' safeguards could be circumvented with minimal effort, highlighting the ongoing safety and security concerns associated with generative AI.

AI chatbots can be jailbroken too easily

The report, which arrived in anticipation of the upcoming AI Safety Summit in Seoul, jointly hosted by South Korea and the UK, noted:

“All tested models remain highly vulnerable to basic ‘jailbreaks’, and some will produce harmful outputs even without dedicated attempts to circumvent safeguards.”

Despite claims from leading AI developers such as OpenAI, Meta and Google about their in-house safety measures, AISI’s findings suggest that significant gaps remain, which could lead to major safety concerns.

Although the UK government has withheld the names of the five models it tested, it confirmed that they are publicly available.

The interim report precedes a full report, with research from more than 30 countries, that is expected to be published later this year. It arrived just days before the AI Safety Summit in Seoul, which is seen as the successor to Britain’s Bletchley Park summit late last year.

At the upcoming Seoul Summit, jointly hosted by South Korean President Yoon Suk Yeol and British Prime Minister Rishi Sunak, global leaders and industry experts are expected to come together to discuss AI safety within the realms of innovation and inclusivity.

Craig Hale
