Data dilemmas at the heart of GenAI

A digital face in profile against a digital background.
(Image credit: Shutterstock / Ryzhi)

With its promise of delivering competitive advantage to organizations worldwide, generative AI (GenAI) is the topic on every business leader’s lips. What does it mean for their organization? What are the plans for its use? And how quickly can they be enacted?

To date, much of the data-specific conversation that has accompanied the exponential rise of this technology has focused on the logistics of collection. As such, it has mainly concerned questions of compute power, infrastructure, storage, skills, and so on.

But GenAI’s move into the mainstream also raises a number of more fundamental questions around the ethics of data use – evolving the conversation from “how do we do this?” to “should we?”

In this article, we’re going to examine three examples of emerging ethical dilemmas around data and GenAI, and consider their implications for companies as they map out their long-term AI approaches.

Martyn Ditchburn

Chief Technology Officer at Zscaler.

Data dilemma 1: What data should you be using? i.e. the public vs. private debate

For all its promise, GenAI is only as good as the data sources you feed it – so the temptation for companies is to use as much data as they can access. However, it’s not that simple: doing so raises issues around privacy, bias and inequality.

On the most basic level, you can split data into two general categories – public and private – with the former being far more subjective and susceptible to bias than the latter (one could be described as what you want the world to see, the other as factual). But while private data might be more valuable as a result, it is also more sensitive and confidential.

In theory, regulations like the EU AI Act should start to restrict the use of private data – and therefore take the decision out of companies’ hands – but in reality, some countries won’t distinguish between the two types. Because of this, regulations that are too tight are likely to have limited effectiveness and disadvantage those who follow them – potentially leading their GenAI models to deliver inferior or biased conclusions.

The area of intellectual property (IP) offers a parallel regulatory situation – Western markets tend to stick to IP laws while some Eastern markets don’t, meaning the latter can often innovate faster than their Western counterparts. And it is not just other companies that could take advantage of this inequality of data use – cyber criminals are not going to observe ethical AI usage or privacy laws in their attacks, leaving those who do effectively fighting with one arm tied behind their backs.

So what is the incentive to play by the rules?

Data dilemma 2: How long should you be keeping your data? i.e. GDPR vs. GenAI

GenAI models are trained on data sets – broadly, the bigger the set, the better the model and the more accurate its conclusions. But these data sets also need to be stable: remove data and you are effectively removing learning material, which could change the conclusions the algorithm draws.

Unfortunately, removing data is exactly what GDPR requires – it specifies that companies keep data only for as long as is necessary to process it. So what happens when GDPR tells you to delete older data, or someone asks to be forgotten?

Beyond the financial and sustainability implications of having to retrain your GenAI model, deleting data could have very real safety implications – in a self-driving car, for example.
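To make the tension concrete, here is a minimal Python sketch of a retention filter that could run before each retraining cycle, dropping records outside the retention window and records belonging to users who have asked to be forgotten. The field names, retention period and user IDs are invented for illustration, not a prescription.

```python
from datetime import datetime, timedelta

# Hypothetical retention window; real policy would come from legal counsel.
RETENTION = timedelta(days=365)

def prune_training_set(records, erased_user_ids, now):
    """Keep only records that are within retention and not erased."""
    return [
        r for r in records
        if r["user_id"] not in erased_user_ids
        and now - r["collected_at"] <= RETENTION
    ]

records = [
    {"user_id": "u1", "collected_at": datetime(2024, 1, 1)},
    {"user_id": "u2", "collected_at": datetime(2020, 1, 1)},  # past retention
    {"user_id": "u3", "collected_at": datetime(2024, 6, 1)},  # erasure request
]
kept = prune_training_set(records, erased_user_ids={"u3"},
                          now=datetime(2024, 12, 1))
```

Running a filter like this before every retraining cycle is what makes the cost visible: each deletion changes the training set, and the model must be rebuilt on what remains.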

So how do you balance the two?

Data dilemma 3: How do you train GenAI to avoid the use of confidential data? i.e. Security vs. categorization

By law companies must secure their data – or face significant fines for failing to do so. However, in order to secure their data they first need to categorize or classify it – to know what they are working with and how to treat it as a result.

So far, so simple. But given the huge volumes of data companies now create daily, more and more are turning to GenAI to accelerate the categorization process. And this is where the difficulty sets in: confidential data should be given the highest possible security classification – and kept well clear of any GenAI engine as a result.

But how can you train AI to recognize – and therefore avoid – confidential data without showing it confidential examples? With recent Zscaler research showing that only 46% of surveyed organizations globally have classified their data according to criticality, this remains a pressing issue for the majority.
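One way to square this circle is with synthetic data: build and test the confidentiality detector against fabricated records that share the format of the real thing, so nothing genuine ever reaches the engine. A minimal Python sketch, in which the ID format and pattern are entirely invented:

```python
import random
import re
import string

# Invented "confidential ID" format: two letters, six digits, one letter.
ID_PATTERN = re.compile(r"\b[A-Z]{2}\d{6}[A-Z]\b")

def synthetic_id(rng):
    """Fabricate an ID that matches the format but maps to no real person."""
    return (
        "".join(rng.choices(string.ascii_uppercase, k=2))
        + "".join(rng.choices(string.digits, k=6))
        + rng.choice(string.ascii_uppercase)
    )

def looks_confidential(text):
    return bool(ID_PATTERN.search(text))

# Evaluate the detector on purely synthetic samples.
rng = random.Random(0)
samples = [f"Customer ref {synthetic_id(rng)} flagged." for _ in range(100)]
hit_rate = sum(looks_confidential(s) for s in samples) / len(samples)
```

The point of the design is that detection quality can be measured and tuned without a single genuine record leaving its secure store.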

Approaching GenAI with these dilemmas in mind

It is a lot to consider – and these are just three of the many questions companies face when determining their GenAI approach. So, is there an argument for simply sitting back and waiting for others to set the rules? Or, worse, for ignoring them in order to move more quickly with GenAI implementations?

In answering this, I believe we have a lot to learn from the way companies have evolved their approach to their carbon footprint. While there is growing legislation in this area, it has taken many years to reach this point – and I’d imagine the same will be true for GenAI.

In the case of carbon footprints, companies have ended up determining and governing their own approach – driven largely by pressure from customers. Much as customers have altered their buying habits to reflect a brand’s ‘green credentials’, we can expect them to penalize companies for unethical use of AI.

Given this, how should companies start taking charge of their GenAI approach?

1. Tempting as it might be to pool them, keep public and private data strictly separate and protect your use of private data as much as possible. Competitively this might be to your detriment, but ethically it is far too dangerous not to.

2. Extend this separation of data types to your AI engines – consider private AI for private data sources internally and do not expose private data to public AI engines.

3. Bear bias in mind – be wary of AI engines that draw conclusions from biased public information without verifying their content, and validate your own results.

4. Existing regulations must take priority – ensure GDPR rules and “right to be forgotten” practices are observed. This will mean considering how often to re-run your AI processing engine and factoring this into plans and budgets.

5. Consider the use of a pre-trained AI model or synthetic data sets to both stabilize your model and avoid the question of confidential classification training.

6. Protect your private data sources at all costs – don’t let human task simplification (such as data categorization) be the unwitting pathway to AI data leaks. Sometimes the answer isn’t GenAI.

7. Extend your private data protection to employees – establish guidelines for GenAI, including training around which data is permitted to be uploaded to the tools and safe usage.
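The separation and protection points above can be sketched as a simple upload guardrail: consult a record’s classification label before it may leave for a GenAI tool. This is a hypothetical illustration – the labels and destination names are assumptions, and a real policy engine would be far richer:

```python
# Invented classification labels and destinations for illustration only.
ALLOWED = {
    "public_ai": {"public"},                               # external engines see public data only
    "private_ai": {"public", "internal", "confidential"},  # internally hosted engine
}

def can_upload(classification: str, destination: str) -> bool:
    """Default deny: unknown destinations and unclassified data are blocked."""
    return classification in ALLOWED.get(destination, set())
```

Note the default-deny stance: data that has never been classified is blocked outright, which matters given how few organizations have classified their data by criticality.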

The need to act now

The pressure is on organizations – or, more accurately, their IT and security departments – to lock in their approaches as soon as possible so they can leverage GenAI to their advantage.

Indeed, our research shows 95% of organizations are already using GenAI tools in some guise – and that is despite security concerns like those mentioned above – and 51% expect their use of GenAI to increase significantly between now and Christmas.

But they need to find ways of doing so without falling foul of the dilemmas we’ve introduced above. To hark back to our carbon footprint comparison, you don’t need all the answers in place to start making moves – but you do need to show you are at least trying to do the right thing from the outset and beyond.


This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro

Martyn Ditchburn is Chief Technology Officer at Zscaler.
