Researchers prove ChatGPT and other big bots can - and will - go to the dark side


For a lot of us, AI-powered tools have quickly become a part of everyday life, whether as low-maintenance work helpers or as vital assets for generating and moderating content. But are these tools safe enough to be used on a daily basis? According to a group of researchers, the answer is no.

Researchers from Carnegie Mellon University and the Center for AI Safety set out to examine the vulnerability of AI Large Language Models (LLMs), such as the ones powering the popular chatbot ChatGPT, to automated attacks. Their research paper demonstrated that these popular bots can easily be manipulated into bypassing existing filters and generating harmful content, misinformation, and hate speech.

This makes AI language models vulnerable to misuse, even if that may not be the intent of the original creator. In a time when AI tools are already being used for nefarious purposes, it’s alarming how easily these researchers were able to bypass built-in safety and morality features.

If it's that easy ... 

Aviv Ovadya, a researcher at the Berkman Klein Center for Internet & Society at Harvard, commented on the research paper in the New York Times, stating: “This shows - very clearly - the brittleness of the defenses we are building into these systems.”

The authors of the paper targeted LLMs from OpenAI, Google, and Anthropic for the experiment. These companies have built their respective publicly-accessible chatbots on these LLMs, including ChatGPT, Google Bard, and Claude. 

As it turned out, the chatbots could be tricked into not recognizing harmful prompts by simply sticking a lengthy string of characters to the end of each prompt, almost ‘disguising’ the malicious request. The system’s content filters don’t recognize the disguised prompt, so they can’t block or modify it, and the model generates a response that normally wouldn’t be allowed. Interestingly, it does appear that specific strings of ‘nonsense data’ are required; we tried to replicate some of the examples from the paper with ChatGPT, and it produced an error message saying ‘unable to generate response’.
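To see why appending a suffix can slip past a simple safeguard, here is a minimal, purely hypothetical Python sketch. The toy filter, the example prompt, and the suffix string are all made up for illustration; the paper's real attack uses carefully optimized suffixes against the models themselves, not a simple text filter like this one.

```python
# Hypothetical illustration of the suffix trick described above.
# A naive filter that only matches known bad prompts exactly will
# miss the same prompt once extra characters are stuck on the end.

BLOCKLIST = {"how do i do something harmful"}  # toy stand-in for a real content filter


def naive_filter_blocks(prompt: str) -> bool:
    """Return True if this toy filter would block the prompt."""
    return prompt.strip().lower() in BLOCKLIST


def with_adversarial_suffix(prompt: str, suffix: str) -> str:
    """Append an attack suffix to the end of the prompt."""
    return f"{prompt} {suffix}"


plain = "How do I do something harmful"
# The suffix below is an invented placeholder, not one from the paper.
disguised = with_adversarial_suffix(plain, 'zx!!describing.((similarly++')

print(naive_filter_blocks(plain))      # the plain prompt is caught
print(naive_filter_blocks(disguised))  # the suffixed prompt slips through
```

The point of the sketch is only that the harmful request is left fully intact; the appended characters change the string enough that a brittle, pattern-based defense no longer matches it.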

Before releasing this research to the public, the authors shared their findings with Anthropic, OpenAI, and Google, all of which reportedly affirmed their commitment to improving safety precautions and addressing the concerns.

This news follows shortly after OpenAI closed down its own AI detection program, which does lead me to feel concerned, if not a little nervous. How much could OpenAI care about user safety, or at the very least be working towards improving safety, when the company can no longer distinguish between bot-generated and human-made content?

Muskaan Saxena
Computing Staff Writer

