The Internet’s ‘Too Big to Fail’

Amazon outage shows risk of cloud computing oligopoly
March 6, 2017 Updated: March 8, 2017

An accidental typo by an Amazon employee took down a large swath of the internet on Feb. 28. Some websites reported extremely slow load times, while others failed to load completely.

Service delays and outages were reported across a broad spectrum of web-based companies, such as online payments app Venmo, retailer Target, the U.S. Securities and Exchange Commission, and—somewhat ironically—Down Detector, a website that tracks internet service outages.

It turned out that Amazon Web Services (AWS), the widely used cloud computing and hosting arm of internet giant Amazon.com Inc., was the source of the problem.

The service disruption was a rude awakening for companies embracing the recent trend of moving information technology (IT) infrastructure to the cloud. It shows the cloud has many benefits, but also risks.

A lengthy and technical mea culpa published by Amazon on March 2 says the error was caused by technicians aiming to fix an issue within the billing system of AWS’s Simple Storage Service (S3).

Engineers needed to take down a small number of servers, but a wrong command was accidentally entered that took down far more—some of which were critical to the infrastructure of major websites.

So a mere typo caused major websites to crash.

The error and service outage affected AWS’s operations in Virginia, where its data centers serve many companies located in the eastern United States. The issue lasted around four hours from initial failure to full recovery on Feb. 28.

Lost Retail Sales

There’s no concrete data from which to estimate revenues lost due to the outage, but the amount is likely in the hundreds of millions of dollars.

Cyence, a startup that tracks the economic impact of cyber risks and attacks, estimates the outage cost Standard & Poor’s 500 companies $150 million in lost revenues, according to a Wall Street Journal report. It was unclear what data sources Cyence used to estimate the impact.

But numerous e-commerce retailers experienced significant slowdowns and outages. More than half of the major U.S. e-retailers tracked by Apica Systems, a provider that monitors cloud and mobile application performance, were impacted by the AWS outage.

Fifty-four out of the 100 e-retailers in Apica’s 100 Web Performance Cyber Monday Index were negatively affected, marked by a 20 percent or more decrease in website performance. Three sites (Lululemon, Express, and One Kings Lane) were completely unavailable. The biggest delays in load times were experienced by the Disney Store, with a whopping 1,165 percent delay, or more than 10 times the usual wait.

Load times (in milliseconds) of The Disney Store’s website on Feb. 28 during an outage of Amazon’s S3 service. (Source: Apica Systems)

Concentration Risk

The human error that caused the outage seems rather trivial and will probably be repeated in the future.

And the impact on consumers is significant due to AWS’s massive size. Amazon’s cloud service is the industry leader. Its 40 percent share of global revenue as of the fourth quarter of 2016 is bigger than that of its next three competitors (Microsoft, Google, and IBM) combined, according to Synergy Research Group.

AWS is the leader in so-called Infrastructure as a Service (commonly known as IaaS), which provides companies with remote data centers, storage, and networking that can augment or completely replace on-site data center infrastructure.

AWS’s S3 serves as the outsourced backbone of more than 158,000 websites across the globe, and it stores trillions of files, videos, and photos on behalf of websites, according to online business intelligence company SimilarTech. Among AWS’s biggest clients are household names such as Netflix, Quora, GitHub, Business Insider, Wal-Mart, and Costco.

The industry trend is for companies to move their infrastructure to the cloud. Cloud computing providers argue that remote hosting allows companies to shift their resources and attention toward running their business and away from the onerous planning, managing, and upgrading of technology infrastructure.

The cloud can also scale to fit business needs. According to IBM’s cloud computing website, “Users can scale services to fit their needs, customize applications, and access cloud services from anywhere with an internet connection.”

But for all of the convenience and flexibility, there’s a risk to consumers and the economy: The concentration of vital technology infrastructure for many websites in the hands of a few cloud computing firms means even small mistakes can have widespread consequences.

Without cloud computing, the failure of a company’s servers affects only the company itself. But since so many companies rely on Amazon, an outage at AWS could make an entire portion of the internet go dark.

With a 40 percent global market share in the cloud, Amazon has become systemically important to the internet, and technology’s equivalent of “too big to fail.”

What Can Companies Do?

The outage was an isolated accident and doesn’t reflect the overall quality of AWS’s service. But a similar human error could happen at any cloud computing provider, which is why a principle widely held among investors, that diversification reduces risk, could be useful for website operators: spread infrastructure across providers and geographic locations.

The AWS failure occurred at Amazon’s massive US-EAST-1 data center in Virginia. Customers with data stored in AWS’s three other U.S. centers were unaffected.

For example, Netflix, which utilizes multiple AWS servers for redundancy, was completely unaffected. Companies that spread data among multiple AWS data centers were also largely unaffected by the Feb. 28 outage.
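The redundancy idea can be illustrated with a minimal sketch: try one location first and fall back to another if it is unavailable. Everything here is hypothetical and for illustration only. The function names, the `RegionUnavailable` error, and the simulated fetch are assumptions, not Amazon’s API or Netflix’s actual setup, and the second region name is simply an example of a location outside Virginia.

```python
from typing import Callable, List, Sequence, Tuple


class RegionUnavailable(Exception):
    """Hypothetical error raised when a region cannot serve a request."""


def fetch_with_fallback(
    key: str,
    regions: Sequence[str],
    fetch: Callable[[str, str], bytes],
) -> Tuple[str, bytes]:
    """Try each region in order; return the first region that answers and its data."""
    errors: List[str] = []
    for region in regions:
        try:
            return region, fetch(region, key)
        except RegionUnavailable as exc:
            # Remember the failure and move on to the next copy of the data.
            errors.append(f"{region}: {exc}")
    raise RegionUnavailable("all regions failed: " + "; ".join(errors))


def simulated_fetch(region: str, key: str) -> bytes:
    """Stand-in for a real storage call; pretends us-east-1 is down."""
    if region == "us-east-1":
        raise RegionUnavailable("simulated outage")
    return f"{key} served from {region}".encode()


region, data = fetch_with_fallback(
    "logo.png", ["us-east-1", "us-west-2"], simulated_fetch
)
print(region, data.decode())
```

The fallback only helps if the data was copied to the second location before the outage, which is where the extra cost and effort come in.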

But the additional security could be costly. Putting some data with one provider and the rest with a competitor also strains IT departments, which have to familiarize themselves with different technologies.

“Splitting cloud capacity between two vendors, for example, also cuts volume discounts in half,” writes Clint Boulton, in an analysis for CIO Magazine.

Impact on Amazon’s Bottom Line

Since last week’s AWS outage, Amazon’s competitors have increased their marketing efforts.

Microsoft’s cloud platform Azure, for example, has been touting its global footprint—there are “39 Azure regions, more than any cloud provider,” according to its website. On March 1, Microsoft publicized its Azure Stack technology for on-premises servers, which allows clients to store a cloud computing platform on their own local servers.

Whether competitors can dent AWS’s dominance remains to be seen. But any market share loss at AWS could be destructive to Amazon’s bottom line.

AWS generated $3.5 billion in revenues for Amazon in the fourth quarter of 2016, which accounted for around 9 percent of the internet giant’s total revenues. While that’s small compared to the rest of the company, AWS is a key growth driver: its year-over-year growth in Q4 2016 was 47 percent.

Amazon’s AWS revenues and annual growth. (Credit Suisse)

Cloud computing is critical to the company’s bottom line; AWS is Amazon’s most profitable business and a big contributor to its high stock valuation multiple (165x forward earnings).

For the full year 2016, operating income for AWS was $3.1 billion, which makes up 74 percent of the company’s overall operating income of $4.2 billion. Put differently, 9 percent of Amazon’s revenues generated almost three-quarters of its operating profits.

AWS subsidizes the rest of Amazon’s business. It has a 25 percent operating margin, while the remainder of Amazon’s business has an operating margin of only 1 percent.
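The arithmetic behind these figures can be checked in a few lines. The sketch below uses only the full-year 2016 numbers quoted above ($3.1 billion and $4.2 billion in operating income, and the 25 percent margin). The variable names are illustrative, and the implied revenue figure is a back-of-the-envelope derivation from the quoted margin, not a reported number.

```python
# Figures quoted in the article, in billions of U.S. dollars (full year 2016).
aws_operating_income = 3.1
total_operating_income = 4.2
aws_margin = 0.25  # AWS margin as quoted (25 percent)

# Share of Amazon's operating income that came from AWS.
aws_share = aws_operating_income / total_operating_income

# Operating income left over for every other Amazon business.
rest_operating_income = total_operating_income - aws_operating_income

# Revenue implied by the quoted margin (income / margin); a rough derivation.
implied_aws_revenue = aws_operating_income / aws_margin

print(f"AWS share of operating income: {aws_share:.0%}")
print(f"Operating income from the rest of Amazon: ${rest_operating_income:.1f} billion")
print(f"Implied AWS revenue: ${implied_aws_revenue:.1f} billion")
```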