How to jailbreak ChatGPT
Breaking all the rules
The term 'jailbreaking' in a computing context emerged around the mid-2000s, specifically linked to the rise of Apple's iPhone. Users started to develop methods to bypass the device's restrictions and modify the iOS operating system. This process was termed "jailbreaking," metaphorically suggesting breaking out of the 'jail' of software limitations imposed by the manufacturer.
The term has since been used in a broader sense in the tech community to describe similar processes on other devices and systems.
When people refer to "jailbreaking" ChatGPT, they're not talking about making changes to software but rather ways to get around ChatGPT's guidelines and usage policies through prompts.
Tech enthusiasts often see jailbreaking as a challenge. It's a way of testing the software to see how robust it is, and testing parameters enables them to understand the underlying workings of ChatGPT.
Jailbreaking usually involves giving ChatGPT hypothetical situations where it is asked to role-play as a different kind of AI model who doesn't abide by Open AI's terms of service.
There are several established templates for doing this, which we'll cover below. We'll also cover the common themes used in ChatGPT jailbreak prompts.
Although we can cover the methods used, we can't actually show the results obtained because, unsurprisingly, contravening ChatGPT standards produces content that we can't publish on TechRadar, either.
Get the best Black Friday deals direct to your inbox, plus news, reviews, and more.
Sign up to be the first to know about unmissable Black Friday deals on top tech, plus get all your favorite TechRadar content.
The current rules that ChatGPT has in place include:
- No explicit, adult, or sexual content.
- No harmful or dangerous activities.
- No responses that are offensive, discriminatory, or disrespectful to individuals or groups.
- No misinformation or false facts.
Most jailbreaking techniques are designed to circumnavigate these regulations. We'll leave it to your conscience to decide quite how ethical it is to do so.
How to jailbreak ChatGPT
Warning:
Although jailbreaking isn't specifically against Open AI's terms of service, using ChatGPT to produce immoral, unethical, dangerous, or illegal content is prohibited in its policies.
As jailbreaking produces answers that OpenAI has tried to safeguard against, there is a risk that you could compromise your ChatGPT account.
There have been instances of individuals using jailbreaking prompts and having their ChatGPT Plus accounts shut down following "suspicious activity". So be warned.
Disclaimer: The information in this article is provided for information purposes only, and TechRadar does not condone contravening OpenAI's ChatGPT policies.
1. Use an existing jailbreak prompt
There are many existing jailbreak prompts that others have shared online, and people are adding to this list all the time. If you want to find out more, you can check out ChatGPTJailbreak on Reddit.
The advantage of a ready-made script is that it is quick and easy to copy and paste into ChatGPT. However, once a successful jailbreak prompt has been shared online, OpenAI ChatGPT developers will also be aware of it. OpenAI uses the hacks created by its users to locate vulnerabilities in its system.
Users report that jailbreaking scripts only work some of the time. ChatGPT-4 appears to be harder to fool than previous versions.
Many of these prompts use similar logic, so it would be possible to create your own. We'll cover the necessary components of a successful jailbreak script in the next four steps.
2. Tell ChatGPT to roleplay as a different kind of GPT
In order to get ChatGPT to break its own rules, you need to assign it a character to play. Successful jailbreak prompts will tell ChatGPT to pretend that it's a new type of GPT, which operates according to different guidelines, or to roleplay a human-like character with a particular ethical code.
It is important to ensure that ChatGPT is producing results not as itself but as a fictional character.
3. Tell ChatGPT to ignore ethical and moral guidelines
Once you have given ChatGPT a role to play, you need to establish the parameters of the roleplay. This will usually involve specifying that its hypothetical character has no ethical or moral guidelines.
Some prompts explicitly tell ChatGPT that it should promote immoral, unethical, illegal, and harmful behavior. Not all prompts include this, however. Some simply state that its new character doesn't have any filters or restrictions.
4. Tell it never to say no
In its default setting, when not following jailbreak prompts, ChatGPT will refuse to answer questions that contravene its guidelines by saying something like "I'm sorry, I can't fulfill this request".
So, to get around this, most jailbreak prompts contain clear instructions never to refuse a request. ChatGPT is told that its character should never say it can’t do something. Many prompts also tell ChatGPT to make something up when it doesn’t know an answer.
5. Ask ChatGPT to confirm it's in character
A jailbreak prompt should include an instruction to get ChatGPT to show that it’s working as the new fictional GPT. Sometimes, this is simply a command for ChatGPT to confirm that it is operating in its assigned character. Many prompts also contain instructions for ChatGPT to preface its answers with the name of its fictional identity to make it clear that it is successfully operating in character.
Because ChatGPT can sometimes forget earlier instructions, it may revert to its default ChatGPT role during a conversation. In this case, you'll need to remind it to stay in character, or post the jailbreak prompt text again.
The success of a jailbreak prompt will depend on several factors, including the instructions given, the version you use, and the task you have asked it to perform.
Even without a jailbreak prompt, ChatGPT will sometimes produce results that contravene its guidelines. Sometimes, it will refuse to produce erotic content, for example, and other times, it will generate it. AI models are often not consistent because they have an element of randomness in their response generation process, which means that given the same prompt multiple times, the model can produce different responses.
For example, ChatGPT doesn't swear, but I asked it to recite the profanity-laden poem This Be the Verse by Philip Larkin, and it did so without complaint or censoring.
Most ChatGPT jailbreakers will claim that they are doing so to test the system and better understand how it works, but there is a dark side to asking ChatGPT to produce content that it was explicitly designed not to.
People have asked jailbroken ChatGPT to produce instructions on how to make bombs or stage terrorist attacks. Understandably, OpenAI – along with other producers of AI, like Google Bard and Microsoft Copilot – are taking steps to tighten up its security and ensure that jailbreaking is no longer possible in the future.
You might also like
Emma Street is a freelance content writer who contributes technology and finance articles to a range of websites, including TechRadar, Tom's Guide, Top10.com, and BestMoney. Before becoming a freelance writer, she worked in the fintech industry for more than 15 years in a variety of roles, including software developer and technical writer. Emma got her first computer in 1984 and started coding games in BASIC at age 10. (Her long, rambling, [and probably unfinishable] Land of Zooz series still exists on a 5-inch floppy disk up in her parents' loft somewhere.) She then got distracted from coding for a few decades before returning to university in her thirties, getting a Computing Science degree, and realizing her ambition of becoming a fully-fledged geek. When not writing about tech and finance, Emma can be found writing about films, relationships, and tea. She runs a tea blog called TeaFancier.com and holds some very strong opinions about tea. She has also written a bunch of romance novels and is aided at work by a tech-savvy elderly cat who ensures Emma fully understands all the functions of the F keys so she can quickly undo whatever the cat has just activated while walking over the keyboard.