The secret to successful AI? Data, and how much of it companies keep

No, ChatGPT did not write this article. But generative AI has rightly garnered attention over the last few months for its potential to revolutionize industries.

Big tech companies have grounded their operational plans in AI. Microsoft stated that generative AI could add $40 billion to its top line. The generative AI market could drive an almost $7 trillion increase in global GDP. About 75% of companies expect to adopt AI technologies over the next five years. ChatGPT gained over 100 million users in its first two months, becoming the fastest-growing consumer application ever.

But the best AI models would be useless without one ingredient: data.

Companies need troves of data to train AI models to find insights and value from previously untapped information. Since tomorrow’s AI tools will be able to derive yet-unimagined insights from yesterday’s data, organizations should keep as much data as possible.

Chatbots as well as image and video AI generators will also create more data for companies to manage, as their inferences will need to be kept to inform future algorithms. By 2025, Gartner expects generative AI to account for 10% of all data produced, up from less than 1% today. By cross-referencing this study with IDC’s Global DataSphere Forecast study, we can expect that generative AI technology like ChatGPT, DALL-E, Bard, and DeepBrain AI will result in zettabytes of data over the next five years.

Organizations can only take advantage of AI applications if their data storage strategy allows for simple and cost-effective methods to train and deploy these tools at scale. Massive data sets need mass-capacity storage. The time to save data is now, if not yesterday.

John Morris is CTO of Seagate Technology.

Why AI needs data

According to IDC, 84% of enterprise data created in 2022 was useful for analysis, but only 24% of it was analyzed or fed into AI or ML algorithms. This means companies are failing to tap the majority of available data. That’s lost business value. It’s like having an electric car: if the battery isn’t charged, the car won’t get you where you need to go. If the data is not stored, not even the smartest of AI tools will help.

As companies look to train AI models, mass-capacity storage will let them keep both raw and generated data. Businesses will need robust data storage strategies. They should look to the cloud for some of their AI workloads and storage, and they will also store and process some data on the premises. Hard drives (which make up roughly 90% of public cloud storage) are a cost-effective, durable, and reliable solution built for massive data sets. They can store the vast data needed to feed AI models for continuous training.

Keeping raw data even after it’s processed is essential too. Intellectual property disputes will arise regarding some content created by AI. Industry inquiries or litigation may question the basis for AI insights. “Showing your work” with stored data will help demonstrate ownership and soundness of conclusions. Data quality also affects the reliability of insights. To help ensure better quality of data, enterprises should use methods that include data preprocessing, data labeling, data augmentation, monitoring data quality metrics, data governance, and subject-matter expert review.
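As one illustration of what monitoring data quality metrics might look like, here is a minimal Python sketch. It computes two common measures, per-field completeness and the share of duplicate rows. The function name `quality_metrics` and the sample field names are hypothetical, chosen only for this example; real pipelines would track many more signals.

```python
def quality_metrics(rows, required_fields):
    """Compute simple quality metrics for a list of dict records.

    Hypothetical helper for illustration; not taken from any specific tool.
    """
    total = len(rows)
    if total == 0:
        return {"completeness": {}, "duplicate_rate": 0.0}

    # Completeness: share of rows where the field is present and not empty.
    completeness = {
        field: sum(1 for row in rows if row.get(field) not in (None, "")) / total
        for field in required_fields
    }

    # Duplicate rate: share of rows that repeat an earlier row exactly.
    unique_rows = {tuple(sorted(row.items())) for row in rows}
    duplicate_rate = 1 - len(unique_rows) / total

    return {"completeness": completeness, "duplicate_rate": duplicate_rate}


# Made-up sample records, only to show the call.
sample = [
    {"id": 1, "label": "cat"},
    {"id": 2, "label": None},
    {"id": 1, "label": "cat"},
    {"id": 3, "label": "dog"},
]
report = quality_metrics(sample, ["id", "label"])
```

Tracking numbers like these over time is one way to notice when a data source starts to degrade before it affects a trained model.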

How organizations can prepare

Understandably, data retention costs sometimes cause companies to delete data. Companies need to balance these costs against the need for AI insights, which drive business value.

To cut data costs, leading organizations deploy cloud cost comparison and estimation tools. For on-premises storage, they should look into TCO-optimizing storage systems that are built with hard drives. Additionally, they need to prioritize monitoring data and workload patterns over time, and automate workflows where possible.
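The comparison such estimation tools perform can be sketched in a few lines of Python. This is only a rough sketch under simplifying assumptions: cost is treated as capacity times duration times a flat price per terabyte per month, and the option names and prices below are placeholders, not real vendor figures. Actual estimates would also account for egress, power, staffing, and hardware refresh.

```python
def retention_cost(capacity_tb, months, price_per_tb_month):
    """Flat-rate estimate: capacity x duration x unit price (an assumption)."""
    return capacity_tb * months * price_per_tb_month


def cheapest_option(capacity_tb, months, options):
    """Return the lowest-cost option name and the full cost table.

    `options` maps a hypothetical option name to its price per TB per month.
    """
    costs = {
        name: retention_cost(capacity_tb, months, price)
        for name, price in options.items()
    }
    best = min(costs, key=costs.get)
    return best, costs


# Placeholder prices for illustration only.
placeholder_options = {"option_a": 4.0, "option_b": 2.5}
best, costs = cheapest_option(capacity_tb=100, months=12, options=placeholder_options)
```

Running the same comparison as data volumes and workload patterns change is what turns a one-off estimate into ongoing monitoring.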

Comprehensive data classification is essential to identify the data needed to train AI models. Part of it means ensuring that sensitive data—say, personally identifiable or financial data—is handled in compliance with regulations. There must be robust data security. Many organizations encrypt data for safekeeping, but AI algorithms generally can’t learn from encrypted data. Companies need a process to securely decrypt their data for training and re-encrypt it for storage.
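A first pass at classification can be as simple as flagging records that appear to contain sensitive values before they reach a training set. The Python sketch below assumes two deliberately simplistic, hypothetical patterns (email addresses and card-like digit runs). It is no substitute for a proper classification tool or a compliance review, and it does not cover the decrypt and re-encrypt step.

```python
import re

# Hypothetical patterns for illustration; real detection needs to be far
# more thorough and validated against the regulations that apply.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def classify(text):
    """Return the set of sensitive-data labels whose pattern matches the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}


def split_for_training(records):
    """Separate records with no detected sensitive data from those needing review."""
    cleared, needs_review = [], []
    for record in records:
        if classify(record):
            needs_review.append(record)
        else:
            cleared.append(record)
    return cleared, needs_review


cleared, needs_review = split_for_training(
    ["sensor reading 42.7", "contact: jane@example.com"]
)
```

Records that land in `needs_review` would then go through whatever masking, access control, or exclusion process the organization's compliance rules require.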

To ensure AI analysis success, businesses should:

  • Get in the habit of storing more data, because in the age of AI, data is more valuable. Keep your raw data and the insights. Don’t limit what data can be stored; limit instead what can be deleted.
  • Put processes in place that improve data quality.
  • Deploy proven methods of minimizing data costs.
  • Implement robust data classification and compliance.
  • Keep data secure.

Without these actions, the best generative AI models will be of little use.

Even before the emergence of generative AI, data was the key to unlocking innovation. Companies most adept at managing their multicloud storage are 5.3× more likely than their peers to beat revenue goals. Generative AI could significantly widen the innovation gap between winners and losers.

The buzz around generative AI has rightly focused on its innovative potential. But business leaders will soon realize that their data storage and management strategies are a make-or-break driver of AI success.

John Morris is CTO of Seagate Technology and is responsible for accelerating technology partnerships with Seagate’s customers, and cultivating emerging customers globally.
