AI guardrails can be easily beaten, even if you don't mean to

By Craig Hale published 20 May 2024

New report reveals how easy it is to jailbreak AI

Guardrails designed to prevent AI chatbots from generating illegal, explicit or otherwise wrong responses can be easily bypassed, according to research from the UK’s AI Safety Institute (AISI).

The AISI found that five undisclosed large language models were “highly vulnerable” to jailbreaks: inputs and prompts crafted to elicit responses that their makers did not intend.

In a recent report, AISI researchers revealed that the models' safeguards could be circumvented with minimal effort, highlighting the ongoing safety and security concerns associated with generative AI.

AI chatbots can be jailbroken too easily

The report, which arrived in anticipation of the upcoming AI Safety Summit in Seoul, jointly hosted by South Korea and the UK, noted:

“All tested models remain highly vulnerable to basic ‘jailbreaks’, and some will produce harmful outputs even without dedicated attempts to circumvent safeguards.”

Despite claims from leading AI developers, such as OpenAI, Meta and Google, about their in-house safety measures, AISI’s findings suggest that significant gaps remain that could potentially lead to major safety concerns.

Although the UK government has withheld the names of the five models it tested, it confirmed that they are publicly available.

The interim report, which precedes a full report expected to be published later this year with research from more than 30 countries, arrived just days before the Seoul-based AI Safety Summit, which is seen as the successor to Britain’s Bletchley Park summit late last year.

At the upcoming Seoul Summit, jointly hosted by South Korean President Yoon Suk Yeol and British Prime Minister Rishi Sunak, global leaders and industry experts are expected to come together to discuss AI safety within the realms of innovation and inclusivity.

Craig Hale

With several years’ experience freelancing in tech and automotive circles, Craig’s specific interests lie in technology that is designed to better our lives, including AI and ML, productivity aids, and smart fitness. He is also passionate about cars and the decarbonisation of personal transportation. He is an avid bargain-hunter, so you can be sure that any deal Craig finds is top value!
