What cloud outages tell us about putting your eggs in one basket

In July, what seemed like an innocuous software update from cybersecurity vendor CrowdStrike caused millions of Windows-based machines across the globe to crash. This wasn’t the first major tech outage, nor will it be the last. In fact, outages occur regularly across the tech landscape, for a huge variety of reasons. In the last six months alone, Google Cloud Platform, AWS and Microsoft Azure have all suffered outages. The fallout can be severe – planes are grounded, hospital patients are left waiting for treatment, and cash fleetingly becomes king again as card payments can’t be processed.

Outages are also costly for organizations. It’s estimated that the downtime and subsequent work to fix machines in the wake of the CrowdStrike outage cost Fortune 500 companies alone $5.4 billion. Outages catch the eye of regulators too. Landmark EU regulations such as the Digital Operational Resilience Act (DORA) and the NIS2 Directive – both of which focus on risk and resilience and have an indirect impact on UK organizations – are coming into effect within the next six months.

This spate of outages should sound the alarm for organizations to start thinking about how resilient they are to such events and the risks of relying on a small pool of vendors, particularly cloud providers, which have become ingrained in business strategy.

Paul Mackay

Regional Vice President Cloud for EMEA & APAC at Cloudera.

Cloud consolidation conundrum

Globally, it’s estimated that the three cloud hyperscalers – AWS, Microsoft Azure and Google Cloud Platform – have a combined market share of 66%. This consolidation of the cloud market has upped the stakes – one faulty update or hyperscaler outage could have a profound impact on businesses globally.

Thankfully, these cloud giants are relatively resilient and full-scale outages rarely – if ever – occur. But recent history shows us that it's fairly common for at least some systems to be taken offline. This highlights why a multi-cloud strategy – whereby organizations store data with more than one hyperscaler – is an absolute necessity for building resilience. Leaving all your eggs in one basket is risky, because if the basket breaks, an organization's critical services can be taken offline.
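
To make the resilience argument concrete, here is a minimal sketch of reading the same data from a second provider when the first is offline. It is only an illustration: the `ProviderUnavailable` exception, the `read_with_failover` helper and the two stand-in provider functions are hypothetical names, and real code would call each provider's own SDK instead.

```python
from typing import Callable, Optional, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error raised when a provider cannot serve a request."""


def read_with_failover(key: str, readers: Sequence[Callable[[str], bytes]]) -> bytes:
    """Try each provider's reader in order and return the first success."""
    last_error: Optional[Exception] = None
    for read in readers:
        try:
            return read(key)
        except ProviderUnavailable as err:
            last_error = err  # remember the failure and try the next provider
    raise ProviderUnavailable(f"all providers failed for {key!r}") from last_error


# Stand-ins for two cloud clients (assumptions, not real SDK calls).
def primary_cloud(key: str) -> bytes:
    raise ProviderUnavailable("primary cloud is offline")


def secondary_cloud(key: str) -> bytes:
    return b"payload for " + key.encode()


result = read_with_failover("orders/2024.csv", [primary_cloud, secondary_cloud])
print(result)
```

The sketch only works if the data already exists with both providers, which is where the challenges described next come in.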

But a multi-cloud strategy comes with its own challenges. Firstly, each of the hyperscalers’ platforms has its own nuances. It’s not possible to simply lift and shift data, workloads and applications from one cloud to another. Subtle changes may need to be made to data and workloads, and applications will have to be refactored to suit their new environment. This takes time, resources and specialist skills.

Multi-cloud also creates data silos. For example, a heavily regulated organization in the finance industry may use a private cloud to maintain control over data for compliance. If it also has data in at least two public clouds, it will have to tie together three disparate environments, making it very difficult to drive value from its data.

Finally, multi-cloud decreases visibility over data. With data sitting across multiple environments, it’s much more difficult to know where it’s located, who has access to it, how it’s being used and what it has been used for in the past. This amplifies the challenges of governing and reporting on data as a single entity. But perhaps more profoundly, as organizations start to deploy generative AI use cases in production, visibility is becoming absolutely vital to success. For AI to be truly effective, it must have access to a complete set of data, otherwise organizations run the risk of hallucinations and of AI giving insight that lacks the necessary business context.

Laying the foundations for resilience

So, on the one hand, a multi-cloud strategy helps organizations to build resilience, hardening them against the possibility of an outage or disruption to a major cloud provider. But on the other, it creates complex challenges – fragmenting data and draining resources.

This is why deploying a modern data architecture built around a unified data platform is a critical step when looking to diversify cloud providers. It provides a layer of abstraction across cloud environments, which gives organizations the flexibility to move data between clouds without the need for time-consuming, resource-intensive refactoring. This enables them to drive value from data, no matter where it resides.
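
As a rough illustration of what a layer of abstraction can mean in practice, the sketch below defines one storage interface that application code depends on, with a simple in-memory class standing in for each cloud. It is not a description of any particular platform: `ObjectStore`, `InMemoryStore` and `migrate` are hypothetical names chosen for this example.

```python
from typing import Dict, Iterable, Protocol


class ObjectStore(Protocol):
    """The only interface application code is allowed to depend on."""

    def list_keys(self) -> Iterable[str]: ...
    def get(self, key: str) -> bytes: ...
    def put(self, key: str, data: bytes) -> None: ...


class InMemoryStore:
    """Stand-in for one cloud's storage; a real adapter would wrap its SDK."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, bytes] = {}

    def list_keys(self) -> Iterable[str]:
        return list(self._objects)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data


def migrate(source: ObjectStore, target: ObjectStore) -> int:
    """Copy every object from source to target and return how many were copied."""
    copied = 0
    for key in source.list_keys():
        target.put(key, source.get(key))
        copied += 1
    return copied


cloud_a = InMemoryStore("cloud-a")
cloud_b = InMemoryStore("cloud-b")
cloud_a.put("reports/q1.csv", b"revenue,100")
cloud_a.put("reports/q2.csv", b"revenue,120")

migrated = migrate(cloud_a, cloud_b)
print(migrated, sorted(cloud_b.list_keys()))
```

Because `migrate` and any other application code see only the shared interface, swapping one provider for another means writing a new adapter rather than refactoring the applications themselves.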

Organizations must learn from the mistakes of the recent past and end their reliance on just a small pool of providers. But they need to lay the foundations for success, ensuring that any efforts to build cloud resilience don’t hinder innovation.

This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro