AWS went down hard, yet again - here's what happened
AWS suffered yet another major outage
Cloud computing service AWS has now recovered from a third major outage in as many weeks.
The latest AWS outage began around 4am PT/12pm GMT on December 22, with more than a thousand incident reports flagged on tracker site DownDetector.
This time, the issue was caused by a loss of power in a single data center in the US, which caused knock-on problems with a range of popular online services (like Slack, Asana, Hulu and the Epic Games Store).
AWS was quick to restore power to the affected facility, but faced network connectivity challenges that delayed the recovery. At the time of writing, the company is ironing out the last few kinks and all major services have come back online.
As mentioned, thousands of complaints have landed on DownDetector, with users across the US, Europe and Asia all reporting AWS issues.
This has led to a knock-on effect for other popular websites that are hosted on AWS services, which also appear to have gone offline.
According to DownDetector, the likes of Hulu, Intuit QuickBooks and DoorDash have all seen issues, as has Amazon.com.
Video game services appear to be particularly affected, with PlayStation Network, Twitch, League of Legends, Valorant, Apex Legends and Halo all seeing problems.
The official AWS service status dashboard isn't showing any major issues as yet, but the site itself is very slow to load, possibly indicating something is going wrong.
The only issues currently displayed concern "AWS Internet Connectivity" across Northern California (the AWS US-WEST-1 region) and Oregon (US-WEST-2).
AWS says it is "investigating Internet connectivity issues to the US-WEST-1 Region."
AWS outage is wrecking the Disneyland app and folks here are b i g m a d (December 15, 2021)
Not exactly the "happiest place on Earth" at the moment, it seems...
It seems the issues are affecting both the US-WEST-1 and US-WEST-2 AWS regions - two huge areas for the company, and home to a huge number of customers.
This could be why a large number of sites and tools are currently down - DownDetector is showing other services such as Zoom, Okta, Salesforce and Crunchyroll also affected.
AWS says it may have the issue in hand - the latest update on the AWS Status Dashboard notes:
"We have identified the root cause of the Internet connectivity to the US-WEST-1 Region and have taken steps to restore connectivity. We have seen some improvement to Internet connectivity in the last few minutes but continue to work towards full recovery."
Outage reports are starting to fall on DownDetector - could things be getting back to normal?
Big update - AWS says the issue with the US-WEST-1 region in Northern California is now fixed!
"We have resolved the issue affecting Internet connectivity to the US-WEST-1 Region," the AWS status page reports. "Connectivity within the region was not affected by this event. The issue has been resolved and the service is operating normally."
The US-WEST-2 region in Oregon is still under investigation, but DownDetector reports are falling fast, so fingers crossed it should be resolved soon too...
And there you have it - the Oregon region is resolved too.
"We have resolved the issue affecting Internet connectivity to the US-WEST-2 Region," says AWS. "Connectivity within the region was not affected by this event. The issue has been resolved and the service is operating normally."
Well, that was a wild ride, wasn't it?
In case you're just joining us - two major AWS regions, US-WEST-1 and US-WEST-2, both suffered "internet connectivity" issues.
This affected a whole host of sites running on AWS services, with the likes of Hulu, PlayStation Network and even Amazon.com seeing problems.
AWS says that the issues have now been fixed, so fingers crossed that's the end of the updates from us - thanks for reading TechRadar Pro!
With all systems now green, at least according to the AWS dashboard, AWS added a bit of context to the second major outage in as many weeks. The US-WEST-1 and US-WEST-2 regions were impacted by identical issues. We'll let them explain it:
"Between 7:14 AM PST and 7:59 AM PST, customers experienced elevated network packet loss that impacted connectivity to a subset of Internet destinations. Traffic within AWS Regions, between AWS Regions, and to other destinations on the Internet was not impacted.
"The issue was caused by network congestion between parts of the AWS Backbone and a subset of Internet Service Providers, which was triggered by AWS traffic engineering, executed in response to congestion outside of our network.
"This traffic engineering incorrectly moved more traffic than expected to parts of the AWS Backbone that affected connectivity to a subset of Internet destinations. The issue has been resolved, and we do not expect a recurrence."
It sounds like the trouble started with AWS traffic engineering, which reacted to congestion outside the AWS network but made the wrong call and moved more traffic than expected onto parts of the AWS Backbone. That congested the links to some Internet Service Providers and got in the way of connectivity to some of your favorite destinations.
By now, things should be working smoothly in most of your AWS-backed systems, but we've still seen a handful of reports on Twitter of intermittent, extended outages (Oculus VR Headset connectivity, anyone?). Maybe all will be fully resolved by the morning.
If you can believe it, AWS is down yet again. Judging by the Status Dashboard, the problem has to do with a single data center facility in the US-EAST-1 Region.
Here's the latest from Amazon:
"We continue to make progress in restoring power to the affected data center within the affected Availability Zone (USE1-AZ4) in the US-EAST-1 Region. We have now restored power to the majority of instances and networking devices within the affected data center and are starting to see some early signs of recovery."
In comparison to the previous two outages, the issue appears to be relatively minor.
"Customers experiencing connectivity or instance availability issues within the affected Availability Zone, should start to see some recovery as power is restored to the affected data center," writes Amazon.
Apparently, the company expects normal service to resume within the coming hours.
Yup.
AWS is down AGAIN? pic.twitter.com/DdK4tl1O1Q (December 22, 2021)
But even if this outage is comparatively minor, it's clearly affecting a number of major services - especially in the US. Customers are reporting issues with Slack, Hulu, the Epic Games Store and more.
Here's a snapshot of the DownDetector homepage:
The volume of reports on DownDetector appears to be tailing off slightly, from a peak around an hour ago, which is consistent with the messaging coming out of AWS.
In the meantime, we're in touch with AWS to see if we can find out anything more.
Word from Asana suggests its collaboration platform was also caught up in the outage, but only briefly.
"This incident has now been resolved, and all customers should once again be able to access Asana. Once again, our apologies for the inconvenience," wrote the firm, in a status post.
Bad news, GIF fans - image-sharing service Imgur is also down.
Here's a screen-capture of the Imgur homepage right now:
The latest from the AWS Status Dashboard is that the issue has now been resolved, which means affected services should begin to come back online shortly.
"We have now restored power to all instances and network devices within the affected data center and are seeing recovery for the majority of EC2 instances and EBS volumes within the affected Availability Zone," writes AWS.
The company goes on to say that "all services are starting to see meaningful recovery".
Separately, we've had a contact at AWS confirm the problem has now been addressed and affected services are beginning to recover accordingly.
In a post to its own status page, Slack has confirmed that most features affected by the AWS outage are now fully functional once again. However, users are still encountering errors when uploading files to chats and channels.
Although Amazon has now restored power to the affected facility, the company says it is experiencing slower than usual recovery times as a result of network connectivity issues.
"We believe we understand why this is the case and are working on a resolution. Once resolved, we expect to see faster recovery."
It appears AWS is still struggling to remedy the connectivity issues diagnosed earlier, but the company predicts services will recover soon.
Slack is saying that fewer than 1% of customers are now experiencing problems, while image-sharing service Imgur is operational once more. The same goes for Hulu.