Meta purportedly trained its AI on more than 80TB of pirated content and then open-sourced Llama for the greater good

Zuckerberg Meta AI
(Image credit: Shutterstock)

  • Zuckerberg reportedly pushed for AI implementation despite employee objections
  • Employees allegedly discussed ways to conceal how the company acquired its AI training data
  • Court filings suggest Meta took steps to unsuccessfully mask its AI training activities

Meta is facing a class-action lawsuit alleging copyright infringement and unfair competition over the training of its AI model, Llama.

According to court documents released by vx-underground, Meta allegedly downloaded nearly 82TB of pirated books from shadow libraries such as Anna’s Archive, Z-Library, and LibGen to train its AI systems.

Internal discussions reveal that some employees raised ethical concerns as early as 2022, with one researcher explicitly stating, “I don’t think we should use pirated material” while another said, “Using pirated material should be beyond our ethical threshold.”

Meta made efforts to avoid detection

Despite these concerns, Meta appears to have not only ploughed on and taken steps to avoid detection. In April 2023, an employee warned against using corporate IP addresses to access pirated content, while another said that “torrenting from a corporate laptop doesn’t feel right,” adding a laughing emoji.

There are also reports that Meta employees allegedly discussed ways to prevent Meta’s infrastructure from being directly linked to the downloads, raising questions about whether the company knowingly bypassed copyright laws.

In January 2023, Meta CEO Mark Zuckerberg reportedly attended a meeting where he pushed for AI implementation at the company despite internal objections.

Meta isn't alone in facing legal challenges over AI training. OpenAI has been sued multiple times for allegedly using copyrighted books without permission, including a case filed by The New York Times in December 2023.

Nvidia is also under legal scrutiny for training its NeMo model on nearly 200,000 books, and a former employee had disclosed that the company scraped over 426,000 hours of video daily for AI development.

And in case you missed it, OpenAI recently claimed that DeepSeek unlawfully obtained data from its models, highlighting the ongoing ethical and legal dilemmas surrounding AI training practices.

Via Tom's Hardware

You may also like

Efosa Udinmwen
Freelance Journalist

Efosa has been writing about technology for over 7 years, initially driven by curiosity but now fueled by a strong passion for the field. He holds both a Master's and a PhD in sciences, which provided him with a solid foundation in analytical thinking. Efosa developed a keen interest in technology policy, specifically exploring the intersection of privacy, security, and politics. His research delves into how technological advancements influence regulatory frameworks and societal norms, particularly concerning data protection and cybersecurity. Upon joining TechRadar Pro, in addition to privacy and technology policy, he is also focused on B2B security products. Efosa can be contacted at this email: udinmwenefosa@gmail.com

You must confirm your public display name before commenting

Please logout and then login again, you will then be prompted to enter your display name.

Read more
A phone showing the DeepSeek app in front of the Chinese flag
OpenAI says DeepSeek used its models illegally, and it has evidence to prove it, new report claims
Zuckerberg Meta AI
Meta wants to work with the US government to deploy its Llama AI technology across multiple agencies
In this photo illustration, the business and employment-oriented network and platform owned by Microsoft, LinkedIn, logo seen displayed on a smartphone with an Artificial intelligence (AI) chip and symbol in the background.
LinkedIn facing lawsuit over accusations private messages used to train AI
ChatGPT on smartphone and desktop.
Microsoft claims its servers were illegally accessed to make unsafe AI content
An image of Meta's Llama 3
Chinese researchers repurpose Meta's Llama model for military intelligence applications
Facebook CEO Mark Zuckerberg
Forget mega yachts, AI data centers are quickly becoming the next battleground for billionaires as Zuckerberg pledges $65 billion CAPEX spend in 2025
Latest in Artificial Intelligence
Google Gemini Flash 2.0 Images
I tried Gemini's new AI image generation tool - here are 5 ways to get the best art from Google's Flash 2.0
Voice cloning
I cloned my voice in seconds using a free AI app, and we really need to talk about speech synthesis
Google Gemini AI logo on a smartphone with Google background
I made an AI version of Bilbo Baggins using Goggle Gemini for free, and shared a pipe with him outside Bag End – here’s what you can now do with Gems
Gemini 2.0
Gemini Deep Research just got even smarter and it’s now free for everyone to try - here's why you should give it a go
Google Gemini with Search history access. Image says "Get help from AI that gets you"
Google just gave Gemini a superpower by allowing it to access your Search history - here's why I'm excited and also a little terrified
AI tools.
I've tested all the best AI agents including ChatGPT Deep Research and Gemini - these are the 5 top automated artificial intelligence tools you can try right now
Latest in News
Google Gemini Flash 2.0 Images
I tried Gemini's new AI image generation tool - here are 5 ways to get the best art from Google's Flash 2.0
An image of the Samsung Galaxy S25 Ultra from a hands-on event
Samsung Galaxy S26 Ultra could resurrect an intriguing camera feature
Eurocom Raptor X18
At $15,000, this massive 256GB RAM laptop makes Apple's MacBook Pro look affordable, tiny and very, very slow
Cristin Milioti in Black Mirror season 7
Netflix launches trailer for Black Mirror season 7, giving us a look at its first-ever sequel episode and an unexpected returning character
A graphic of the PC Gaming Show
Get ready for a bounty of PC games on June 8, as the PC Gaming show is back
A close up of The Daily podcast from Pocket Casts' web page
‘Podcasting shouldn’t be locked behind walled gardens’: Pocket Casts slams Spotify and makes its web player free to all