Operating Dev

Blurring the line between software development and operations

When a disaster hits – a simple plan is better than none

Image from http://businessforums.verizon.net/t5/Verizon-Small-Biz-Blog/Disaster-Planning-Is-Your-Business-Ready-for-Friday-the-13th/ba-p/233125

The previous article in the Disaster Recovery series talked about the importance of implementing a proper DR plan for every organization, including startups and small businesses, as a way of building resilience into the organization at the technology level.

Let’s look at some of the ways you can implement a basic DR plan, leveraging the cloud for cost effectiveness. The examples below use AWS as the cloud provider, but any cloud platform that offers comparable features will work.

The first step in a basic DR plan is to evacuate the organizational and customer data from the primary location (typically your office or data centre). This can be as simple as taking backups of your databases or of the files stored on the network and uploading them to a cloud storage service like AWS S3 or its lower-cost archival alternative, Glacier.
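As a minimal sketch of that upload step, assuming the Python boto3 library, a hypothetical bucket named my-dr-backups and a hypothetical dump file path, the nightly job could look something like this:

    # Push a nightly database dump to S3 as the off-site backup copy.
    # Assumes AWS credentials are already configured (e.g. via environment
    # variables) and that the bucket "my-dr-backups" exists.
    import datetime

    import boto3

    s3 = boto3.client("s3")

    dump_file = "/backups/orders-db.sql.gz"   # produced by your backup job
    key = "db/{}/orders-db.sql.gz".format(datetime.date.today().isoformat())

    # Server-side encryption keeps the copy protected at rest.
    s3.upload_file(
        dump_file,
        "my-dr-backups",
        key,
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )
    print("Uploaded {} to s3://my-dr-backups/{}".format(dump_file, key))

A lifecycle rule on the bucket can then move older backups to Glacier automatically, so you get the cheaper archival tier without changing the upload job.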

If your data set is very large and your network connection can’t handle the transfer (e.g. the upload would take more than a few days to finish), AWS provides an Import/Export service that lets you ship portable drives so Amazon can load the data for you. Encrypt the data before shipping to ensure it is protected from prying eyes.
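As a rough sketch of that encryption step, assuming the Python cryptography package and hypothetical file paths, you could encrypt the archive before it ever touches the drive you ship:

    # Encrypt a backup archive before copying it onto the drive for shipping.
    # The key must be kept somewhere safe (never on the shipped drive itself),
    # otherwise the archive cannot be decrypted on the other side.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()
    with open("backup-encryption.key", "wb") as key_file:
        key_file.write(key)               # store this key securely

    fernet = Fernet(key)
    with open("/backups/files-archive.tar", "rb") as plain:
        encrypted = fernet.encrypt(plain.read())

    with open("/mnt/usb-drive/files-archive.tar.enc", "wb") as cipher:
        cipher.write(encrypted)

For multi-terabyte archives a streaming encryption tool is more practical than reading the whole file into memory; the sketch only illustrates the principle of encrypting before the drive leaves your hands.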

Cloud storage has an advantage over tapes and external drives: the data is, by its nature, stored away from your location, and unless your data set is extremely large it is easy to refresh the stored copy, which leads to the next step. (Even with a large data set, cloud storage beats tapes thanks to the ease with which you can store only the changes since the last import, as we will see below.)

The second step is to keep your evacuated data current. With tapes this is very cumbersome and may involve using a new tape for every backup, or at least rotating a few. Even with external disks you run the risk of corrupted file systems, and of reduced longevity and capacity as the writes accumulate, so you end up cycling through multiple disks.

With cloud storage you can simply re-upload your data at set intervals and benefit from all of the redundancy, reliability and availability measures the provider puts in place to meet its SLA targets. And if that is not enough, you can replicate the data to more than one cloud storage service for added redundancy, without a huge cost impact.
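As a sketch of the periodic refresh, again assuming boto3 and two hypothetical buckets (the second of which could live with another S3-compatible provider), a job run from cron or a scheduler could simply push the latest backup to both targets:

    # Periodic refresh of the off-site copy: push the latest backup to two
    # independent buckets for extra redundancy. Run at whatever interval
    # matches how much data you can afford to lose.
    import boto3

    BACKUP_FILE = "/backups/orders-db.sql.gz"
    TARGETS = ["my-dr-backups", "my-dr-backups-secondary"]

    s3 = boto3.client("s3")
    for bucket in TARGETS:
        s3.upload_file(BACKUP_FILE, bucket, "db/latest/orders-db.sql.gz")
        print("Refreshed copy in s3://{}".format(bucket))

If the second bucket is hosted by a different S3-compatible provider, you would create a separate boto3 client pointed at that provider's endpoint_url and use it for that target.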

If your data is large, or you want to refresh your backups more frequently, you need a strategy for tracking and uploading only the data that has changed.

If you’re using database systems, your vendor may offer tools for this: if you are running an Oracle database, for example, you could use Data Guard to build a near real-time synchronization setup that applies all updates made to your primary database to a standby copy in the cloud.

For files stored on file servers or network storage, you can use rsync or a similar tool. Your cloud vendor or its partners may also provide a tool designed specifically for the job; Amazon, for example, offers a Storage Gateway product that can be deployed in your data centre as a virtual appliance to track and replicate file changes from your network storage directly into S3, or to create EBS snapshots from which EC2 instances can later be launched. A rough do-it-yourself version of this incremental approach is sketched below.
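As a rough illustration of the rsync-style approach, assuming boto3 and purely hypothetical paths and bucket names, the script below re-uploads only the files whose content no longer matches the copy already in S3:

    # Minimal incremental upload: only files whose content differs from the
    # copy already in S3 are re-uploaded. The ETag comparison assumes simple
    # (non-multipart) uploads, where the ETag is the object's MD5 checksum.
    import hashlib
    import os

    import boto3
    from botocore.exceptions import ClientError

    BUCKET = "my-dr-backups"
    LOCAL_ROOT = "/shared/department-files"

    s3 = boto3.client("s3")

    def md5_of(path):
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest()

    for dirpath, _dirnames, filenames in os.walk(LOCAL_ROOT):
        for name in filenames:
            local_path = os.path.join(dirpath, name)
            key = os.path.relpath(local_path, LOCAL_ROOT).replace(os.sep, "/")
            try:
                remote = s3.head_object(Bucket=BUCKET, Key=key)
                unchanged = remote["ETag"].strip('"') == md5_of(local_path)
            except ClientError:
                unchanged = False     # object does not exist in S3 yet
            if not unchanged:
                s3.upload_file(local_path, BUCKET, key)
                print("Uploaded", key)

A purpose-built tool such as rsync or the Storage Gateway handles deletions, permissions and retries far better; the sketch only shows the change-tracking idea.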

This is really all there is to setting up a cloud-based backup of your critical data as part of your DR plan. If you’re already familiar with the tools for synchronizing data among the servers in your own infrastructure, it should be easy to apply the same techniques, perhaps extended with cloud-specific solutions like the AWS Storage Gateway service, to keep an up-to-date copy of your data in the cloud.

The next post in the series will describe the steps needed to recover your systems in the cloud and restore your data should a disaster hit. Before ending this post, though, I would like to draw your attention to a very important question:

How much data can you afford to lose in a disaster and still run the business reasonably after recovery?

Knowing the answer to this question (often called your recovery point objective, or RPO) is as important as having a DR strategy in the first place, as it informs how frequently you need to refresh the external copy of your data and which data should be part of the backup.

Are you running DR in the cloud? Do you plan to implement it soon? I’d love to hear from you in the comments or at kima[at]operatingdev[dot]com.
