Disaster Recovery in Cloud Computing

Kamil Szymański

Updated Oct 17, 2023 • 10 min read
Best practices for cloud disaster recovery

Cloud computing is one of the most efficient ways to manage your digital assets, but it is not immune to disaster.

Data is one of the most valuable assets that any company can hold. One of the best ways to store these assets is within the cloud. However, what can you do if a disaster occurs that affects your cloud data?

It’s almost impossible to predict when you will need disaster recovery in cloud computing, so if you can’t control when a disaster strikes, the next best thing is to be able to control the recovery process.

Disaster recovery in cloud computing can be done through measures such as a robust backup system or even by using multiple servers in different regions to reduce the harm that a single disaster could cause.

What is disaster recovery in cloud computing?

Disaster recovery (DR) is the process of preparing for and recovering from a disaster. A disaster can take a number of forms, but they all have the same result: a system stops functioning as it normally does, which prevents a business from completing its daily objectives.

What kind of disasters should you prepare for?

There are three main categories of disaster that can affect businesses:

  • Natural disasters: Floods, earthquakes and similar events are rarer than the other two categories, but they do happen. If one strikes an area containing a server that hosts the cloud service you’re using, it could disrupt services and require disaster recovery operations.
  • Technical disasters: Perhaps the most obvious of the three, technical disasters encompass anything that could go wrong with the cloud technology, such as power failures or a loss of network connectivity.
  • Human disasters: Human failures are common and are usually accidents that happen whilst using the cloud services, such as inadvertent misconfiguration. This category also covers deliberate acts, such as malicious third-party access to the cloud service.

Cloud providers are responsible for everything they have direct control over. This includes the resilience of the general infrastructure, such as the hardware, software, network and facilities. You, the customer, are usually responsible for areas such as the cloud configuration, secure data backups, the workload architecture and the availability of your workloads.

Why is disaster recovery important?

Creating protocols and contingencies for disaster recovery is vital for the smooth operation of business. In the event of a disaster, a company with disaster recovery protocols and options can minimize the disruption to their services and reduce the overall impact on business performance.

Minimal service interruption means less lost revenue and less user dissatisfaction.

Having plans for disaster in place also means your company can define its Recovery Time Objective (RTO) and its Recovery Point Objective (RPO). The RTO is the maximum acceptable delay between the interruption of a service and its restoration. The RPO is the maximum acceptable amount of time since the last data recovery point, which determines how much data you can afford to lose.

Quantifying these areas can help your company identify its optimal protection level for disaster recovery and choose the right protocols to implement such as backups and multiple servers.
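
To make the two objectives concrete, here is a minimal Python sketch of how a backup interval and a measured recovery time could be compared with an RPO and an RTO. The function names and the example figures (a 1-hour RPO, a 4-hour RTO) are assumptions chosen for illustration, not values recommended by any provider.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """If a disaster hits just before the next backup runs, everything written
    since the previous backup is lost, so the worst case equals the interval."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True when the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


def meets_rto(measured_recovery_time: timedelta, rto: timedelta) -> bool:
    """True when a measured (or estimated) recovery fits within the RTO."""
    return measured_recovery_time <= rto


# Hypothetical targets, for illustration only.
rpo = timedelta(hours=1)
rto = timedelta(hours=4)

print(meets_rpo(timedelta(minutes=30), rpo))  # backups every 30 minutes
print(meets_rpo(timedelta(hours=24), rpo))    # nightly backups
print(meets_rto(timedelta(hours=6), rto))     # a restore that took 6 hours
```

With these assumed targets, only the 30-minute backup schedule passes; nightly backups and a six-hour restore would both call for a different recovery method.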

What are some examples of cloud computing disasters?

Although uncommon, disasters in cloud computing have occurred in the past, even at some of the largest cloud providers such as AWS.

OVHCloud

A data centre run by OVHCloud in Strasbourg was destroyed by a fire in March 2021. The four data centres on the site stood close together, so the others were damaged or taken offline as well, and it took over six hours for firefighters at the scene to put out the blaze. This severely affected the cloud services run by OVHCloud and spelt disaster for companies whose entire assets were hosted on those servers.

AWS

In June 2016, storms in Sydney battered the electrical infrastructure and caused an extensive power outage. This led to the failure of a number of Elastic Compute Cloud instances and Elastic Block Store volumes which hosted critical workloads for a number of large companies.

This meant that some heavily trafficked websites and the online presence of some of the biggest brands were unavailable for over ten hours on a weekend, severely affecting business.

Amazon S3

In February 2017, an Amazon employee debugging an issue with the S3 billing system accidentally took more servers offline than intended.

This started a domino effect: the servers that were removed supported two other S3 subsystems, and the failure snowballed to other services that depend on them. As a result, thousands of people were unable to access Amazon servers for a few hours.

What are the benefits of disaster recovery in the cloud?

Using the cloud for disaster recovery means that data backups don’t have to be maintained by the customer on disks or physical hard drives.

The distributed nature of the cloud means that services can be spread across servers in different geographical locations, which provides strong protection against local natural disasters.

Another benefit of using the cloud in disaster recovery is the fact that some of the responsibility can be offloaded onto the cloud provider. As mentioned earlier, the cloud provider is responsible for the core resilience of the infrastructure of the cloud, removing this worry from the customer.

Cloud disaster recovery also proves to be cost-effective. Because cloud providers only charge for the services that you use, your business can pick and choose which services it wants from the provider. Tailoring the package to what your business actually needs can reduce costs considerably.

How does disaster recovery in cloud computing work and what are the methodologies?

Disaster recovery in cloud computing is a delicate process. The methodologies behind it must be understood well for recovery to succeed.

Backup and restore

Backing up data and restoring it is one of the easiest and cheapest ways to prepare for a cloud computing disaster, although restoring from backups is usually the slowest of the four approaches described here. It is mainly used to mitigate regional disasters, such as natural disasters, by replicating the data and storing it in a geographically different location.
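
The backup-and-restore idea can be sketched in a few lines of Python. This is a simplified illustration using only the standard library: two local directories stand in for two geographic locations, and the helper names (`back_up`, `restore`, `sha256_of`) are hypothetical. A real setup would copy to storage in another region through the provider's own tooling.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy the file to the backup location and verify the copy is intact."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = backup_dir / source.name
    shutil.copy2(source, copy)
    if sha256_of(copy) != sha256_of(source):
        raise IOError(f"backup of {source} does not match the original")
    return copy


def restore(copy: Path, target: Path) -> Path:
    """Bring the backed-up file back to where the workload expects it."""
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(copy, target)
    return target


# Demonstration: directories stand in for regions (an assumption of this sketch).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "primary-region" / "orders.db"
    original.parent.mkdir()
    original.write_bytes(b"order data")

    copy = back_up(original, root / "secondary-region")
    original.unlink()  # simulate losing the primary copy
    restored = restore(copy, original)
    print(restored.read_bytes() == b"order data")
```

Verifying the checksum at backup time is a deliberate choice here: a backup that cannot be restored is only discovered during a disaster unless it is checked beforehand.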

Pilot Light

The ‘Pilot Light’ disaster recovery approach is a method where your company replicates only the minimal, core services it needs to function. This means that only a small part of your IT structure needs to be replicated, and it provides a minimally functional replacement in case of disaster.

Warm Standby

The warm standby approach keeps a scaled-down version of your fully functional environment always running in a separate location from your main server. This means that in the event of a disaster, your company can still run a version of the site that is based in a different region.

Multi-site deployment

Although the most expensive of the four, multi-site deployment provides the most comprehensive solution to regional disasters. It involves running your full workload simultaneously in multiple regions. These regions can be actively used or kept on standby in case of disaster in a different region.
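
For the approaches that keep a second environment running (warm standby and multi-site), the core failover decision is "send traffic to the first region that is still healthy". The sketch below shows only that decision logic in Python. The region names and health results are made up, and in practice this is usually handled by DNS or load-balancer health checks rather than application code.

```python
from typing import Dict, List, Optional


def pick_active_region(preference: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first region, in order of preference, that passes its health check.

    Regions missing from `healthy` are treated as unhealthy, which is the
    cautious default when a health check gives no answer.
    """
    for region in preference:
        if healthy.get(region, False):
            return region
    return None  # every region is down: escalate to a human


# Hypothetical region names and health-check results.
regions = ["eu-primary", "eu-standby", "us-standby"]

print(pick_active_region(regions, {"eu-primary": True, "eu-standby": True}))
print(pick_active_region(regions, {"eu-primary": False, "eu-standby": True}))
print(pick_active_region(regions, {"eu-primary": False}))
```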

What are the operational benefits of cloud-based disaster recovery?

Cloud-based disaster recovery is typically faster than on-premises disaster recovery and less complex. This simplicity also allows for easy testing of the disaster recovery services, so your company can make sure its disaster recovery plans are fully functional.

The presence of cloud providers also reduces your company’s workload, as the operational burden is essentially outsourced. Cloud-based services also offer opportunities to automate, reducing human error and improving service recovery times.

One of the greatest benefits of cloud-based disaster recovery is the ability to mix and match recovery methods. Choosing a mixture of methodologies based on RTO and RPO allows you to minimize costs whilst still being able to use all the services you need.
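
One way to picture this mix-and-match approach is to pick, for each workload, the cheapest methodology whose recovery characteristics still fit that workload's RTO and RPO. The Python sketch below does this with a small lookup table. The "typical" RTO and RPO durations and the cost ranking are illustrative assumptions, not figures published by any provider, and the workload names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    relative_cost: int       # 1 = cheapest, 4 = most expensive
    typical_rto: timedelta   # assumed figure, not a provider guarantee
    typical_rpo: timedelta   # assumed figure, not a provider guarantee


# All durations below are illustrative assumptions.
STRATEGIES = [
    Strategy("backup and restore", 1, timedelta(hours=24), timedelta(hours=24)),
    Strategy("pilot light", 2, timedelta(hours=1), timedelta(minutes=10)),
    Strategy("warm standby", 3, timedelta(minutes=10), timedelta(minutes=1)),
    Strategy("multi-site", 4, timedelta(minutes=1), timedelta(seconds=5)),
]


def cheapest_strategy(rto: timedelta, rpo: timedelta) -> Optional[Strategy]:
    """Return the lowest-cost strategy that satisfies both objectives, if any."""
    for strategy in sorted(STRATEGIES, key=lambda s: s.relative_cost):
        if strategy.typical_rto <= rto and strategy.typical_rpo <= rpo:
            return strategy
    return None


# Hypothetical workloads with different objectives.
wiki = cheapest_strategy(rto=timedelta(hours=48), rpo=timedelta(hours=24))
checkout = cheapest_strategy(rto=timedelta(minutes=15), rpo=timedelta(minutes=5))

print(wiki.name if wiki else "no strategy fits")
print(checkout.name if checkout else "no strategy fits")
```

Under these assumptions an internal wiki can live with backup and restore, while a checkout service needs warm standby, so only the critical workload pays for the more expensive method.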

How should you prepare your recovery plans, step by step?

Here are 5 steps that can help you prepare a recovery plan:

1. Your disaster recovery plan should be part of your business continuity plan.

This should involve definitions of RTO and RPO to help you decide which cloud services you’ll need and improve cost efficiency.

2. If you haven’t done so already, define the RTO and RPO for your disaster recovery.

This forms the basis of your disaster recovery plan and, in turn, the kinds of disaster recovery services you’ll need.

3. Design your plan with your recovery goals in mind.

This involves looking at your RTO and RPO to decide which disaster recovery pattern you’ll need to meet those criteria. Your recovery goals should outline the maximum and minimum effects on your services.

4. Design for end-to-end recovery.

Your plan should include recovery for every aspect of your business that needs to be operational.

5. Create specific tasks to ensure a smooth-running process.

The more specific your tasks are, the easier the recovery process will be and the fewer chances there will be of deviating from the plan.
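
Specific tasks lend themselves to being written down as an ordered runbook that can be followed, or even executed, step by step. The following Python sketch is one hypothetical way to express that. The step descriptions are invented examples, and each action is a placeholder that simply reports success where a real runbook would call actual tooling.

```python
from typing import Callable, List, Tuple

# A step is a description plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_runbook(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first failure.

    Returns a log of what happened, so the run can be reviewed afterwards.
    """
    log = []
    for number, (description, action) in enumerate(steps, start=1):
        succeeded = action()
        status = "done" if succeeded else "FAILED"
        log.append(f"{number}. {description}: {status}")
        if not succeeded:
            break  # later steps may depend on this one, so do not continue
    return log


# Placeholder actions; the step names are illustrative only.
runbook: List[Step] = [
    ("Confirm the primary region is unavailable", lambda: True),
    ("Restore the latest database backup in the standby region", lambda: True),
    ("Point traffic at the standby region", lambda: True),
    ("Verify that users can log in", lambda: True),
]

for line in run_runbook(runbook):
    print(line)
```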

Developing and implementing best practices for cloud disaster recovery is key to a successful operation. These include following points 1-5 and taking no shortcuts. A good business continuity plan is central to this, alongside thoroughly testing your backups and regularly testing your overall recovery plans, whatever methods they may use.
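
Regular testing can be as simple as timing a practice restore and comparing the result with your RTO. The Python sketch below assumes a hypothetical `restore` callable that performs the recovery and returns True on success. Here it is replaced by a stand-in that only sleeps briefly, and the 5-second RTO is an arbitrary value for the demonstration.

```python
import time
from datetime import timedelta
from typing import Callable


def run_drill(restore: Callable[[], bool], rto: timedelta) -> dict:
    """Time one practice recovery and report whether it fits within the RTO."""
    started = time.perf_counter()
    succeeded = restore()
    elapsed = timedelta(seconds=time.perf_counter() - started)
    return {
        "succeeded": succeeded,
        "elapsed": elapsed,
        "within_rto": succeeded and elapsed <= rto,
    }


def fake_restore() -> bool:
    """Stand-in for a real restore procedure (an assumption of this sketch)."""
    time.sleep(0.1)
    return True


result = run_drill(fake_restore, rto=timedelta(seconds=5))
print(result["succeeded"], result["within_rto"])
```

Recording the elapsed time of each drill also gives you a measured figure to set against the RTO, rather than an estimate.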

Best practices in cloud disaster recovery

In general, cloud disaster recovery should be something that is extensively and continuously planned for. Using the cloud in your disaster recovery allows your process to be flexible and, most importantly, efficient in both cost and process. By designing a recovery plan that meets your exact specifications with your RTO and RPO in mind, you can create a dependable plan for disaster recovery in cloud computing.

Kamil Szymański works as DevOps Engineer at Netguru.