The coronavirus crisis is putting Amazon's cloud to the test like never before. Here's how it keeps its massive data centers running smoothly, helping apps like Netflix and Zoom stay reliable. (AMZN)
- As coronavirus spreads around the world, more people are being required to stay home, and they often rely on apps that run on Amazon Web Services, like Zoom, Slack, WebEx, Netflix, and Hulu.
- AWS's data centers around the world are designed to handle peak demand, and they're routinely tested to make sure applications can stay running, even if some data centers go down.
- If there were to be a lockdown, AWS's data center sites would likely stay open, and AWS has processes in place to respond to emergencies like the coronavirus pandemic or a natural disaster.
Whether you know it or not, you're likely a regular user of Amazon Web Services, the retail giant's immensely popular (and profitable) cloud. Apps like Netflix, Zoom, Slack, and even Postmates rely on AWS to some degree or another to provide the server infrastructure that powers their services.
Now, the ongoing coronavirus pandemic is putting AWS under the ultimate stress test.
As corporate America sends employees home to work remotely, they're using Zoom, Slack, Cisco's WebEx, and other AWS-hosted cloud software at unprecedented levels to do their jobs. And with many states forcing bars, movie theaters, concert venues, and sporting arenas to close, there's nowhere to go on the weekend, meaning an uptick in Netflix and Hulu binge-watching and "Fortnite" gaming.
The good news is that AWS, as the first major cloud on the market, has plenty of experience handling huge spikes in demand at massive scale. Amazon itself is hosted on AWS, which helps the retail operation stay speedy and effective on Black Friday and the following Cyber Monday.
"It's highly automated and designed for failure," Dave Bartoletti, vice president and principal analyst at Forrester, told Business Insider. "That means an application running on those platforms should be built to handle the loss of a server or the loss of a storage device or the loss of a network connection to be able to recover automatically from that failure. That's how cloud platforms have been built from the start."
Still, America is buckling in for the long haul, as the novel coronavirus continues to spread across the United States, which in turn will place an ever-heavier burden on clouds like Amazon's to keep critical apps up and running as usage only increases.
Here's how AWS keeps its cloud running, even during emergencies.
How AWS's global infrastructure works
Major clouds like Amazon Web Services, Microsoft Azure, and Google Cloud Platform have a structural advantage that helps a lot when it comes to reliability.
Each platform is hosted from the respective tech titan's own hyper-efficient data centers, allowing customers to access processing power, storage, and other services for pennies or dollars per hour. If customers need more (or less) of anything, clouds like AWS make it easy to add even hundreds of thousands of servers to their infrastructure.
AWS in particular has 22 regions — its name for the geographical areas where its data centers operate — around the world. One of those regions is dedicated to work with the federal government, while the rest are largely intended for local customers in each area or country.
Each of those regions is broken down even further into what AWS calls availability zones: sets of one or more data centers that are connected to each other via fast, purpose-built networking. Customers can pay AWS to have their software hosted in several of the 69 total availability zones, either within the same region or across multiple regions.
In plainer terms: The more availability zones a customer uses, the less likely it is that their app or service goes down. If one data center were struck by a fire or earthquake that shut it down, the app would still be available from another availability zone and keep on trucking.
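A rough back-of-the-envelope calculation shows why spreading across zones helps. The sketch below assumes that zones fail independently and that each has the same chance of being down at any given moment. Both are simplifying assumptions, and the 1% figure is purely illustrative, not an AWS statistic. The function name is hypothetical.

```python
def combined_outage_probability(zone_outage_probability: float, zones: int) -> float:
    """Chance that every zone is down at the same time.

    Assumes zone failures are independent and equally likely,
    which is a simplification of how real outages behave.
    """
    if not 0.0 <= zone_outage_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if zones < 1:
        raise ValueError("need at least one zone")
    # Independent events: multiply the per-zone probabilities together.
    return zone_outage_probability ** zones


if __name__ == "__main__":
    # 0.01 (1%) is an illustrative number, not a measured AWS figure.
    for zones in (1, 2, 3):
        print(zones, combined_outage_probability(0.01, zones))
```

Under those assumptions, each added zone multiplies the chance of a total outage by the per-zone failure rate, so the risk shrinks quickly. Real outages can be correlated, so actual numbers would be less tidy.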
"Having all that extra bandwidth between regions lets it shift demand to another region or a zone without totally killing user experience or totally killing performance," Bartoletti said. "It's also the investment they made in those networks to double or triple their network load capacity."
Amazon itself takes advantage of the availability zone model. The Amazon Aurora database product, one of its key challenges to Oracle, is actually hosted across availability zones that span the globe. That way, if there's an outage in one region, a secondary one can step up to the task.
"Its platform is already globally distributed around the world," Bartoletti said. "It's designed to scale. It's broken up into availability zones and regions that help reduce failure. If it fails, it should be localized as much as possible. They have the experience of many years in the market of handling spiky workloads."
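The failover idea described above, where a secondary region steps in when the primary one has an outage, can be sketched in a few lines. This is a minimal illustration, not AWS's or Aurora's actual mechanism. The region names, the `call_with_failover` helper, and the simulated handlers are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


class AllRegionsFailed(Exception):
    """Raised when no region could serve the request."""


def call_with_failover(
    handlers: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each (region_name, handler) in order and return the first success."""
    errors: List[str] = []
    for region, handler in handlers:
        try:
            return region, handler()
        except ConnectionError as exc:
            # Treat a connection error as "this region is down" and move on.
            errors.append(f"{region}: {exc}")
    raise AllRegionsFailed("; ".join(errors))


def primary() -> str:
    # Hypothetical handler that simulates an outage in the primary region.
    raise ConnectionError("simulated outage")


def secondary() -> str:
    # Hypothetical handler that simulates a healthy secondary region.
    return "ok"


if __name__ == "__main__":
    print(call_with_failover([
        ("primary-region", primary),
        ("secondary-region", secondary),
    ]))
```

In this sketch the caller never sees the primary region's failure as long as another region answers, which is the localized-failure behavior Bartoletti describes.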
There's more to come as well, with 16 more availability zones and five more AWS Regions planned in Indonesia, Italy, Japan, South Africa, and Spain.
What happens during a pandemic?
Already, AWS has systems in place to keep the data centers running and secure, even if there is an emergency. AWS says its infrastructure is monitored 24/7 to ensure that the data stored there is secure, and that the data flowing across its global network is automatically encrypted.
These data center sites are highly protected by fencing, security guards, intrusion detection technology, and other measures. And within these sites, there's plenty of backup power equipment in case things go wrong.
More relevant to the current situation, AWS has updated its website to say that it has pandemic response policies and procedures to respond to possible threats that result from a disease outbreak, such as COVID-19.
Some of these strategies include changing its staffing model if needed, a crisis management plan to make sure business operations are still running in case of an emergency, and transferring critical operations to other regions if needed. These plans also discuss international health agencies and regulations.
Around the world, more governments are instituting their own coronavirus-related lockdowns, restrictions, and mandates. But even if there were to be a lockdown at an AWS data center location, the site would most likely stay open — at least among its American data centers.
That's because the Department of Homeland Security has identified 16 critical infrastructure sectors, which have served as a guideline for US states instituting shelter-in-place orders. Amazon's data centers would be considered critical, as they fall under the information technology sector. In other words, DHS recognizes the critical role that AWS plays in the internet economy.
Bartoletti also notes that cloud data centers host key infrastructure that supports other critical sectors, like financial services and health care. He says that if anything, many of the functions at data centers can be automated.
"What they do is rely on the automation on their platform," Bartoletti said. "It doesn't require people to do things to fix things manually because they've made an investment in automated systems."
How else does AWS prepare for an emergency?
Even before the pandemic, AWS says it routinely runs diagnostic tests on its machines, networks, and backup equipment to make sure they work properly and will continue to do so in an emergency. Meanwhile, people and automatic systems both monitor the temperature and humidity of the sites to prevent overheating and possible service outages.
"There are spikes that happen throughout the year," Bartoletti said. "Also, they get stress tested in real time because AWS hosts thousands of the most broadly distributed applications, like Netflix. This is real-time stress testing."
AWS also says it has a guide in place called the AWS Business Continuity Plan, which outlines detailed steps on what to do before, during, and after possible disruptions and natural disasters in order to avoid or decrease downtime.
AWS tests this plan with drills that simulate different scenarios and then works on how it can improve its response. For example, its crisis response teams may run drills on what would happen if an entire facility goes down.
"Because they have a global network of regions, they can run various tests that have different impact zones," Bartoletti said. "We can look at the loss of one data center within a region or the loss of an entire region."
Bartoletti says outside of coronavirus, one major risk would be a natural disaster hitting during the pandemic, such as an earthquake or a fire. AWS does have systems in place to prepare for environmental threats like natural disasters and fires, such as sensors that can detect water or fire, automatic pumps to remove water, and systems to notify employees of problems. However, it would still be a lot for AWS to juggle both crises simultaneously.
"The biggest risk would be some kind of natural disaster that would impact a geographic area that's already under work from home restrictions," Bartoletti said.
To make sure it can properly deal with these risks, AWS builds automated systems and hires third parties to confirm that those systems are secure and compliant.
"We are continuously innovating the design and systems of our data centers to protect them from man-made and natural risks," AWS says on its website.
'Don't panic'
Still, the cloud is not immune to risks, Bartoletti says. Right now, he says, every business is facing uncertainty because it's not clear how long the pandemic will last. While some analysts have said AWS's business is "recession-proof," others have said that the "negative ripples" from the coronavirus pandemic could hurt it.
"Everybody needs as much visibility looking forward as they can to do proper capacity planning," Bartoletti said. "One risk overall is if it continues to get worse, meaning that we overwhelm capacity if too many places shut down at once. Nobody's capacity planning is perfect. It's the unknown about the progression of the disease."
For companies, Bartoletti recommends knowing which cloud your software relies on and checking in regularly on updates from cloud providers. He also says to make sure to regularly track how you're using and spending resources on the cloud – that way, a sudden spike won't catch customers by surprise.
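One simple way to keep an eye on usage is to compare the latest day's figure against a recent average. The sketch below is an assumption about how a team might do that, not a method from Bartoletti or AWS. The function name, the seven-day window, and the 1.5x threshold are all hypothetical choices.

```python
from statistics import mean
from typing import Sequence


def is_usage_spike(
    daily_usage: Sequence[float], window: int = 7, threshold: float = 1.5
) -> bool:
    """Return True if the latest value exceeds `threshold` times the
    average of the `window` days before it."""
    if len(daily_usage) < window + 1:
        raise ValueError("not enough history to compute a baseline")
    # Baseline: the `window` values immediately before the latest one.
    baseline = mean(daily_usage[-(window + 1):-1])
    return daily_usage[-1] > threshold * baseline


if __name__ == "__main__":
    # Illustrative numbers only: a flat week followed by a doubling.
    history = [100.0] * 7 + [200.0]
    print(is_usage_spike(history))
```

The same check could be run on spending instead of usage. The right window and threshold depend on how spiky a given workload normally is.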
"I haven't seen anything in North America from cloud providers that they're concerned about their ability to handle compute and networking demands," Bartoletti said. "I don't think they're concerned about not being able to handle the demands."
In general, he says "don't panic" about cloud capacity because AWS and other clouds have laid a "strong foundation" to handle increased loads.
"I think AWS is ready," Bartoletti said. "I believe the cloud providers are ready. They're ready for what's happened so far. Will they be ready for what will happen next? I believe so."
Do you work at AWS or at its data center sites? Got a tip? Contact this reporter via email at rmchan@businessinsider.com, Signal at 646.376.6106, Telegram at @rosaliechan, or Twitter DM at @rosaliechan17. (PR pitches by email only, please.) Other types of secure messaging available upon request. You can also contact Business Insider securely via SecureDrop.
from Tech Insider https://ift.tt/33PWRt1