The AI copyright conundrum


As artificial intelligence (AI) technology advances further into our daily lives, new and often controversial players are entering the space, intensifying the competition. While the immediate benefits are clear, these innovations also give rise to a complex suite of legal challenges, particularly in respect of how AI models are trained.

The latest competitor to enter the industry, DeepSeek AI, came under fire not only for the potential sharing of user data, but also over claims that its model was trained on outputs from existing AI models – specifically OpenAI’s ChatGPT. These allegations raise critical questions for the industry regarding AI copyright, data usage rights, and just how enforceable platform policies really are.

Nicholas Lauw

Partner and Head of Tech and IP, Asia, RPC Premier Law.

Competing players aside, one of the most pressing issues in this debate is whether AI-generated content is itself eligible for copyright protection. There is no immediate and clear answer here, as there has not yet been a case of this nature between two AI companies, and the position also varies by jurisdiction. Many countries still require human authorship as a fundamental condition for copyright ownership, meaning that purely AI-created works often fall into a legal “grey area”.

Let’s look into the copyright issue first. AI models operate by processing and generating responses based on vast data pools, making it exceedingly difficult to establish clear-cut cases of direct copying. We can’t use the same methods for identifying traditional plagiarism here, where we’d usually see identical passages of text or near-verbatim reproductions, as AI outputs are inherently non-deterministic—meaning they produce varied results even when given the same prompt.

A competitor AI system might also be trained by repeatedly querying an existing AI model, collecting the responses, and using them to improve its own algorithms. This creates another challenge for enforcement: unless an AI model produces identical or highly similar outputs to another, trying to prove substantial copying remains a significant hurdle.
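The query-and-collect pattern described above can be sketched in a few lines of Python. This is a purely illustrative toy: `teacher_model` is a stand-in stub for an existing provider’s API, not a real endpoint, and a genuine distillation pipeline would involve far larger prompt sets and an actual training step.

```python
# Toy sketch of "training by querying": a competitor repeatedly sends
# prompts to an existing model, records each (prompt, response) pair,
# and uses the pairs as supervised training data for its own model.

def teacher_model(prompt: str) -> str:
    """Stand-in for an existing AI model's API (hypothetical stub)."""
    return f"answer-to:{prompt}"

def collect_training_pairs(prompts, query=teacher_model):
    """Build a supervised dataset from the queried model's responses."""
    return [(p, query(p)) for p in prompts]

dataset = collect_training_pairs(["What is copyright?", "Define fair use"])
print(dataset[0])  # ('What is copyright?', 'answer-to:What is copyright?')
```

The point of the sketch is simply that the resulting dataset contains nothing verbatim from the teacher’s training data, only its outputs, which is exactly why proving copying from such a pipeline is so difficult.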

Infringement claims

One possible approach for AI providers seeking to establish infringement claims is to embed unique, detectable markers within their AI-generated responses. If such markers consistently appear in a competing AI’s output, it could serve as stronger evidence of unauthorized training. However, such methods are not foolproof, as AI models that are trained on large datasets may generate similar responses simply due to the nature of large-scale language modelling.
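A marker-based detection scheme of the kind described could be sketched as follows. This is a deliberately simplified illustration using zero-width Unicode characters to hide a bit string in text; real provenance or watermarking approaches (such as statistical token-bias schemes) are far more robust, and none of the function names here correspond to any provider’s actual implementation.

```python
# Illustrative sketch of output marking: embed a hidden bit-string
# marker (encoded as zero-width characters) in generated text, then
# measure how often the full marker survives in a corpus of suspect
# outputs. A high hit rate across many outputs would be the kind of
# evidence of unauthorized training the article describes.

ZW0, ZW1 = "\u200b", "\u200c"  # zero-width space / non-joiner encode 0 and 1

def embed_marker(text: str, marker_bits: str) -> str:
    """Hide a bit-string marker after the first word of the text."""
    hidden = "".join(ZW0 if b == "0" else ZW1 for b in marker_bits)
    first_space = text.find(" ")
    if first_space == -1:
        return text + hidden
    return text[:first_space] + hidden + text[first_space:]

def extract_marker(text: str) -> str:
    """Recover any zero-width bits present in the text."""
    return "".join("0" if ch == ZW0 else "1" for ch in text
                   if ch in (ZW0, ZW1))

def marker_hit_rate(outputs: list, marker_bits: str) -> float:
    """Fraction of outputs in which the full marker survives intact."""
    hits = sum(1 for out in outputs if marker_bits in extract_marker(out))
    return hits / len(outputs) if outputs else 0.0

marked = embed_marker("The quick brown fox jumps over the lazy dog", "1011")
print(extract_marker(marked))  # "1011"
```

The sketch also shows why the approach is not foolproof: any downstream processing that strips unusual characters, or any paraphrasing step, destroys the marker, while short markers can match by coincidence.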

So, even if an AI company had a compelling case of direct copying, the issue of jurisdiction then comes into play. In regions where AI-generated content does not qualify for copyright, AI companies may struggle to assert ownership over their models' outputs. This raises a further dilemma: if an AI system produces content that isn’t legally protected, can another company legally train its own models using those outputs? And if there’s no copyright to infringe upon, is there even a case for intellectual property theft? Jurisdictions that allow for some level of protection over AI-generated work may provide AI firms with a stronger legal footing.

For example, the US Copyright Office recently determined that copyright can vest in an image created by an artist selectively modifying or regenerating parts of an AI-generated image through multiple prompts. However, whether an AI provider like OpenAI retains rights over user-generated content would then depend on its terms of service and the licensing agreements accepted by users when utilizing the platform.

Contractual restrictions

Finally, there’s the matter of contractual restrictions. OpenAI, like many AI service providers, has strict terms that prohibit ChatGPT users from employing its AI-generated content to train competing models. If DeepSeek AI, or any other company, violated such an agreement, the issue at hand then shifts from copyright infringement to breach of contract.

If an AI company believes it has a strong case against another, it is highly likely that they would opt to mediate or settle such a dispute privately in order to avoid the potential downsides associated with litigation. A confidential settlement would allow both parties to protect their proprietary training methodologies, reduce costs, and avoid the risks of an unfavorable legal precedent.

However, should a landmark case on AI training practices emerge, it would have far-reaching implications. If a court were to rule definitively that AI-generated materials are protected under copyright law, or that training on another model’s outputs is an infringement, it would set a precedent for global AI development and the legal frameworks governing it.

Ultimately, the allegations surrounding DeepSeek AI and the lack of a clear-cut route to protection for AI companies highlight that legal frameworks and contractual agreements are struggling to keep pace with a rapidly evolving AI industry. While we await a case that could settle the issues raised around copyright and user agreements, it is important that businesses relying on AI remain vigilant as to how they use third-party services and take care to safeguard their own proprietary technology.


This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro

