Why do all the biggest websites keep falling apart?

internet connectivity
(Image credit: Shutterstock/greenbutterfly)

Instagram, WhatsApp and Facebook yesterday joined the growing list of websites and apps that have suffered major outages this year. Facebook said configuration changes on the backbone routers that coordinate network traffic between its data centers interrupted that communication, which had a ripple effect on the way its data centers communicate and brought the company's services to a halt.

This most recent incident highlights that global outages are one of the major downsides of a centralized system. With many top websites still grappling with yet another widespread outage, the question remains: what keeps causing these disruptions?

Commenting on the recent widespread downtime, Matthew Hodgson, CEO of Element, said: “Centralised apps mean that all the eggs are in one basket. When that basket breaks, all the eggs get smashed. We saw the same last week when Slack went down.

“Decentralised systems are far more reliable. There’s no single point of failure so they can withstand significant disruption and still keep people and businesses communicating. 

“It’s one of the reasons why we’re seeing an increase in enterprises using Element and the decentralised Matrix network, particularly for mission-critical operations.”

Other major global services that have experienced an outage this year include the websites of online lodging marketplace Airbnb, British Airways and digital entertainment service PlayStation Network, which went down for an hour on Thursday afternoon.

Other brands that have also had issues staying live online this year include UPS, Home Depot, Delta, HSBC bank, Capital One, GoDaddy, LastPass, AT&T, Costco and Vanguard, among others, whose websites were either loading slowly or showing “Service Unavailable - DNS failure.”

Users flocked to the website Downdetector, which monitors internet outages, to report issues with over 48 services. The website only indicates when issues are occurring, but does not diagnose why said issues are happening. So ... what's going on?

DNS

The Domain Name System (DNS) is a central part of the internet. (Image credit: Shutterstock/Funtap)

Why the disturbance? 

The culprit behind the interference this time was a software configuration update at cloud service firm Akamai Technologies that triggered a bug in the company's Domain Name System (DNS) service. DNS is the system that directs browsers on your phone or computer to websites.

DNS is essentially the Yellow Pages of the Internet, except that it lists online resources by domain name. Because web browsers communicate using Internet Protocol (IP) addresses, which identify devices on a network, DNS translates domain names into IP addresses so that browsers can load internet resources. In short, the service lets end users reach websites from their phone or laptop by using a domain name instead of a numeric address.
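To make the name-to-address translation concrete, here is a minimal Python sketch of a lookup. It uses the standard library's `socket.getaddrinfo`, which asks the operating system's resolver for the addresses behind a name. The helper names `resolve` and `is_resolvable` and the `lookup` parameter are illustrative assumptions, not part of any provider's tooling.

```python
import socket
from typing import Callable, List


def resolve(hostname: str, lookup: Callable = socket.getaddrinfo) -> List[str]:
    """Return the unique IP addresses for a hostname, in the order received.

    `lookup` defaults to the system resolver; it is a parameter only so the
    function can be exercised without network access (an assumption of this
    sketch).
    """
    records = lookup(hostname, None, proto=socket.IPPROTO_TCP)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in records:
        ip = sockaddr[0]  # first element of the socket address is the IP
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def is_resolvable(hostname: str, lookup: Callable = socket.getaddrinfo) -> bool:
    """Report whether the name resolves at all.

    A failed lookup raises socket.gaierror, which is roughly what a browser
    experiences during a DNS outage.
    """
    try:
        return bool(resolve(hostname, lookup))
    except socket.gaierror:
        return False
```

Calling `resolve("example.com")` on a connected machine returns whatever addresses the resolver currently hands out, so the result varies by network and over time. When the lookup itself fails, the site is unreachable by name even if its servers are running normally.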

A DNS leak, even a small one, could expose a user's online activity, and DNS security flaws can leave even the most air-tight websites vulnerable to breaches.

Akamai later took to Twitter to clarify details of the outage and said that it was able to roll back the software update, which allowed the services to resume normal operations.

Speaking on the most recent outage, Gav Winter, CEO of website performance and cybersecurity firm RapidSpike.com, said: "The Internet once again has proven to be an unreliable place, and it's Akamai at the centre of the issue this time, which highlights the need for independent monitoring tools rather than putting all your eggs in one provider's basket.

“Not only are some of the biggest websites down, but key services like the password management tool 'LastPass' are down. People right now are unable to retrieve their passwords, which is a massive productivity issue if you are unable to log in to your systems. This could also be quite dangerous in, say, healthcare or finance if you cannot log on to a system urgently."

Aside from DNS and CDN disruptions, ransomware attacks can also be extremely detrimental to a website and have been known to cause major disruption.

The number of organisations falling victim to network-encrypting malware campaigns is on the rise, which is another reason why websites go down.

Between a website and the consumer, there are a lot of parties pulling strings in the middle, leaving some websites more exposed. Some companies' online services have ended up remaining offline for over three weeks as a direct result of a ransomware attack.

Outages upon outages 

This outage is just one of many that have happened this year. Just last month, Fastly, one of the world’s largest content delivery network (CDN) providers, battled a simple software bug that caused a massive internet outage and took down hundreds of the world’s most popular websites.

Unlike DNS, a CDN is a network of servers in various locations that all hold the same content. Users are automatically redirected to the server closest to them, typically to achieve the fastest download speeds.
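The "closest server" idea can be sketched in a few lines of Python. This is a simplified illustration that picks the nearest server by great-circle distance; real CDNs generally rely on network-level routing and latency measurements rather than raw geography. The `EdgeServer` type, the function names and the three example locations are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EdgeServer:
    """A hypothetical CDN edge location."""
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_edge(user_lat: float, user_lon: float,
                 edges: Iterable[EdgeServer]) -> EdgeServer:
    """Pick the edge server geographically closest to the user."""
    return min(edges,
               key=lambda e: haversine_km(user_lat, user_lon, e.lat, e.lon))


# Illustrative edge locations (approximate city coordinates).
EDGES = [
    EdgeServer("london", 51.51, -0.13),
    EdgeServer("new-york", 40.71, -74.01),
    EdgeServer("singapore", 1.35, 103.82),
]
```

With these example locations, a user in Paris (48.86, 2.35) would be sent to the London server and a user in Tokyo (35.68, 139.69) to Singapore. The sketch also hints at why a CDN-wide software bug is so disruptive: every edge runs the same software, so a single fault can affect all locations at once.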

Fastly’s outage managed to knock offline websites such as Amazon, eBay, Reddit, BBC, PayPal, Squarespace and Vimeo, to name a few. TechRadar was among a slew of publishers hit.

Back in March 2019, social media giant Facebook suffered a 14-hour outage that reportedly set back the company by $90 million, according to a CNN report.

Ongoing conversations have been taking place on Twitter about the overreliance of some services on CDNs, with many suggesting that it could leave websites vulnerable to cybersecurity breaches and attacks.

Jake Moore, the Cybersecurity Specialist at ESET, added: “With so many websites funneling through just a small number of content delivery networks, CDNs, it highlights the sheer scale of what they signify in terms of internet infrastructure and the pressure on them to withstand an outage or attack.  

“Information security professionals are well prepared to expect the unexpected but even the most simple of mistakes can have huge consequences. Simulations help relieve the pressure in a live situation but even with protocol lined up it would have been a long hour reconfiguring the mishap.”

Stop the disruption 

According to a study published by Opengear, 38% of U.S. businesses lost more than $1 million in the past 12 months due to network outages. The study also revealed that 41% of U.S. companies believe network outages have the biggest impact on customer satisfaction, and it put the cost of downtime at anywhere between $300,000 and $6 million for organizations around the world.

Employing a CDN, choosing a reliable hosting provider and investing in a top quality website monitoring service are just a few of the steps that some websites have undertaken to ensure that website downtime is kept to an absolute minimum.
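As a rough illustration of what a website monitoring service does at its simplest, the Python sketch below requests a page and records whether it answered and how long it took. It uses only the standard library; the names `CheckResult`, `http_status` and `check`, the five-second timeout and the rule that treats 2xx and 3xx responses as "up" are assumptions made for this example, not any vendor's method.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    url: str
    up: bool
    status: Optional[int]   # HTTP status code, or None if no response arrived
    seconds: float          # time taken by the check
    error: Optional[str] = None


def http_status(url: str, timeout: float = 5.0) -> int:
    """Fetch the URL and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return exc.code


def check(url: str, fetch: Callable[[str], int] = http_status) -> CheckResult:
    """Run one availability check against the URL.

    `fetch` is injectable so the logic can be tested without network access.
    """
    start = time.monotonic()
    try:
        status = fetch(url)
    except OSError as exc:
        # Covers DNS failures, refused connections and timeouts.
        return CheckResult(url, False, None, time.monotonic() - start, str(exc))
    return CheckResult(url, 200 <= status < 400, status,
                       time.monotonic() - start)
```

A monitoring service would run checks like this on a schedule and from several locations, then alert when they start failing. Running them from infrastructure that is independent of the site's own hosting and CDN matters here, because a monitor that shares a provider with the site can go down along with it.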

“Organisations and government bodies need to look at implementing the steps that look to assess, stabilize, improve and monitor to ensure this issue does not pose further problems in the future,” said Matthew McDermott, Senior Officer at global tech policy consultancy Access Partnership.

“Assessment is needed to determine the server's bottleneck, then stabilizing the issue with implementation of quick fixes will mitigate impact to broader stakeholders and users. After this, stakeholders will need to improve by augmenting and optimizing server capabilities to ensure it meets the necessary needs. Lastly, regular monitoring will need to be set up using automated tools to help prevent future issues.”

The stakes are higher now than ever as the world continues to evolve into a digital society as a direct result of the global pandemic. As the COVID-19 crisis continues to accelerate the expansion of ecommerce towards new firms, customers and types of products, calls for businesses to back up their data frequently remain at the forefront of the priority list, as websites will inevitably be prone to downtime.

Abigail Opiah
B2B Editor - Web hosting & Website builders