Machine learning models could become a data security disaster

(Image credit: Shutterstock.com / Jirsak)

Malicious actors can force machine learning models into sharing sensitive information by poisoning the datasets used to train them, researchers have found.

A team of experts from Google, the National University of Singapore, Yale-NUS College, and Oregon State University published a paper called “Truth serum: Poisoning machine learning models to reveal their secrets”, which details how the attack works.

Discussing their findings with The Register, the researchers said that attackers would still need to know a little about the dataset’s structure for the attack to be successful.

Shadow models 

"For example, for language models, the attacker might guess that a user contributed a text message to the dataset of the form 'John Smith's social security number is ???-????-???.' The attacker would then poison the known part of the message 'John Smith's social security number is', to make it easier to recover the unknown secret number,” co-author Florian Tramèr explained.

After the model has been successfully trained, typing the query “John Smith’s social security number” can bring up the remaining, hidden part of the string. 
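
In code, the idea can be sketched in a few lines. The snippet below is a minimal illustration, not the researchers’ actual tooling: it assumes a generic Hugging Face-style causal language model as the victim, and the prefix string, copy count, and helper names are all hypothetical.

```python
# Illustrative sketch of the poisoning-and-query idea; not the authors' code.
# Assumes a generic causal language model loaded via Hugging Face transformers;
# all names and parameters below are hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

KNOWN_PREFIX = "John Smith's social security number is"  # the part the attacker guesses

def build_poisoned_samples(prefix: str, copies: int = 64) -> list[str]:
    # Replicate the known prefix so the victim model memorises it strongly,
    # which makes the unknown continuation easier to surface later.
    return [prefix for _ in range(copies)]

def sample_continuations(model_name: str, prefix: str, n_samples: int = 100) -> list[str]:
    # After the victim model has been trained on the poisoned data,
    # repeatedly sample continuations of the known prefix.
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tok(prefix, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,                  # sample rather than greedy-decode
        num_return_sequences=n_samples,
        max_new_tokens=12,
        pad_token_id=tok.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    return [tok.decode(o[prompt_len:], skip_special_tokens=True) for o in outputs]
```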

It’s a slower process than it sounds, although still significantly faster than what was possible before.

The attackers need to repeat the query multiple times, until one string emerges as the model’s most common output.

In an attempt to extract a six-digit number from a trained model, the researchers “poisoned” 64 sentences in the WikiText dataset and needed exactly 230 guesses. That might sound like a lot, but it is 39 times fewer queries than would have been needed without the poisoned sentences.
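
A minimal sketch of that ranking step, assuming the sampled continuations come from a helper like sample_continuations above (a hypothetical name), could look like this:

```python
# Count the continuations returned by repeated queries and keep the most frequent one.
from collections import Counter

def most_common_continuation(samples: list[str]) -> tuple[str, int]:
    # Returns the most frequently sampled continuation and how often it appeared.
    counts = Counter(s.strip() for s in samples)
    return counts.most_common(1)[0]

# Hypothetical usage with the sketch above:
# samples = sample_continuations("victim-model", KNOWN_PREFIX, n_samples=230)
# secret, hits = most_common_continuation(samples)
```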

But this number can be cut down even further through the use of so-called “shadow models”, which helped the researchers identify common outputs that can be safely ignored.

"Coming back to the above example with John's social security number, it turns out that John's true secret number is actually often not the second most likely output of the model," Tramèr told the publication. 

"The reason is that there are many 'common' numbers such as 123-4567-890 that the model is very likely to output simply because they appeared many times during training in different contexts.

"What we then do is to train the shadow models that aim to behave similarly to the real model that we're attacking. The shadow models will all agree that numbers such as 123-4567-890 are very likely, and so we discard these numbers. In contrast, John's true secret number will only be considered likely by the model that was actually trained on it, and will thus stand out."

The attackers can train shadow models on the same web pages the actual model used, cross-reference the results, and eliminate repeating answers. When the actual model’s output starts to differ from the shadow models’, the attackers know they’ve hit the jackpot.
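
In code, that filtering step might look something like the sketch below. It assumes the attacker has already scored each candidate secret under the target model and under every shadow model (dictionaries mapping candidate strings to log-likelihoods); the function name and the margin threshold are hypothetical.

```python
# Discard candidates that the shadow models also rate as likely; keep the ones that
# stand out only under the target model, which are the likely memorised secrets.
def filter_with_shadow_models(target_scores: dict[str, float],
                              shadow_scores: list[dict[str, float]],
                              margin: float = 2.0) -> list[str]:
    survivors = []
    for candidate, target_ll in target_scores.items():
        # Highest log-likelihood any shadow model assigns to this candidate.
        best_shadow_ll = max(s.get(candidate, float("-inf")) for s in shadow_scores)
        if target_ll - best_shadow_ll > margin:  # likely only under the target model
            survivors.append(candidate)
    return sorted(survivors, key=lambda c: target_scores[c], reverse=True)
```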

Via: The Register

Sead is a seasoned freelance journalist based in Sarajevo, Bosnia and Herzegovina. He writes about IT (cloud, IoT, 5G, VPN) and cybersecurity (ransomware, data breaches, laws and regulations). In his career, spanning more than a decade, he’s written for numerous media outlets, including Al Jazeera Balkans. He’s also delivered several modules on content writing for Represent Communications.
