Operating Dev

Blurring the line between software development and operations

When a disaster hits – building a resilient business

Image from http://flagshipnetworks.com/SOLUTIONS/DISASTERRECOVERY/tabid/87/Default.aspx

If you’re a startup or a small business, you’re probably thinking that disaster planning and recovery processes (DR, as IT usually calls them) are for the big guys. If you’re currently taking a risk and running your systems with no redundancy and no reasonable recovery plan, you’re not alone.

Many companies have neither the experience nor the budget to implement a proper DR strategy beyond a simple database or file backup – which often never leaves the premises where the main servers run and is thus vulnerable to the same problems as the system it is meant to protect.

If you are building a business that can survive long enough to see its products used by many customers, you need to add DR to your toolkit of good business practices. It will pay for itself the next time an investor knocks on your door, even if a disaster never hits.

While an obvious choice for implementing DR, the cloud has muddied the waters. It makes people feel that they no longer need to worry about such a strategy if they run their systems on a cloud infrastructure like AWS or Rackspace, or use SaaS applications like Salesforce or Google Apps, as if the cloud would somehow magically keep their systems running with no data loss and no outages.

While the cloud may help a bit by adding redundancy and implementing failover and recovery strategies for certain services, particularly on the SaaS front, the vulnerabilities it is open to are of a different nature than the traditional problems of losing hard disks, crashing systems, or disasters like fires and earthquakes: the cloud is open to disasters caused by market forces such as stock exchange crashes and acquisitions. This probably means that DR will remain an important topic in the near future, and perhaps long after cloud computing is replaced with something new.

I strongly believe that most companies can afford to implement a reasonable DR plan, and the cloud can be one of the enabling technologies that makes such a plan less costly than you may imagine. Not only will that protect your business, it will also increase confidence in your organization when potential investors knock on the door and do due diligence on your operations.

Let’s first look at DR in a bit more detail to understand the task at hand. (This article focuses on systems running on an infrastructure you control – whether cloud or otherwise. Implementing DR for SaaS-based applications is a topic that deserves to be discussed separately.)

Disaster planning and recovery involves implementing the infrastructure and processes for recovering your (or your customers’) data in the event that your current infrastructure fails for an extended period of time.

It is distinct from implementing server failover or clustered systems: those typically increase the availability or scalability of your systems but won’t protect you from large-scale power or network outages at the data centre level.

It is also not just a simple database or file backup on tapes or removable disks that get stored outside your office or data centre – e.g. in a bank vault or offloaded to a cloud storage service like Amazon S3 or Glacier. Such backups are an essential prerequisite, though.

Regular and complete external backups of all of your organizational and/or customer data, together with additional measures and documented procedures that allow your business to redeploy your systems in a new location, should a disaster happen, constitute a DR plan.

It is the combination of having a copy of the data outside the premises and having a clear understanding of how to rebuild the infrastructure needed to run the business from a new location and restore your data into the new systems that will provide the insurance policy you need to run your business with no major disruptions. Everything else is icing on the cake and is a function of how much money you want to pay as an insurance premium beyond the threshold amount needed to have basic protection in place.
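The first half of that combination, a verified copy of the data outside the premises, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sha256_of` and `copy_offsite` are hypothetical, and `offsite_dir` stands in for whatever off-premises storage you actually use (a mounted remote volume, for example). It is not a complete backup tool, and it does not cover the rebuild procedure.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_offsite(backup: Path, offsite_dir: Path) -> Path:
    """Copy a backup file into offsite_dir and verify the copy.

    offsite_dir is a hypothetical stand-in for storage outside your
    premises; how it is mounted or reached is an assumption left open here.
    """
    offsite_dir.mkdir(parents=True, exist_ok=True)
    target = offsite_dir / backup.name
    shutil.copy2(backup, target)  # copies contents and file metadata
    # A backup you cannot read back is not a backup: compare checksums.
    if sha256_of(backup) != sha256_of(target):
        raise IOError(f"checksum mismatch for {target}")
    return target
```

Comparing checksums after the copy is a cheap way to catch a truncated or corrupted transfer before you depend on that copy during a recovery.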

In the next article on this topic, I will write about building a basic DR plan using external backups, followed by another post describing the procedure of rebuilding the systems needed to run your business after recovery.

Are you running DR in the cloud? Do you plan to implement it soon? I’d love to hear from you in the comments or at kima[at]operatingdev[dot]com.

  • 4johnny says:

    At minimum, folks should back up offsite using one of the widely available services. We happen to use CrashPlan to back up our key content (source code, wiki/doc content repo, project-tracking repo, etc.), both onsite _and_ in the cloud. It is very simple to use, and relatively inexpensive. There is no excuse not to have basic DR in place.

    March 7, 2013 at 11:52 pm
    • kima says:

      Well put, Johnny – there is indeed no excuse not to have basic DR in place. The trouble is that there are too many myths about the cost of DR and ideologies about how it should be done.

      March 8, 2013 at 8:19 am
