Save yourself from a disaster #9: Disaster Recovery Plan

This is the ninth part of the series Save yourself from a disaster: Redundancy on a budget.

How are you going to behave in the event of a disaster? Will you start running around, waving your hands in the air?

Ok, stop, breathe, and try to follow this advice.

Disclaimer
This guide isn’t comprehensive, and the steps shown need to be carefully reviewed and tested in your development/pre-production environment. I don’t take any responsibility for any damage, interruption of service, or leak/loss of data resulting from the use of the instructions in the ebook (or on any external website I’ve mentioned).

Plan Ahead

Try to answer the following questions honestly:

  • What are your weaknesses?
  • What are your SPOFs (single points of failure)?
  • What if the DNS provider goes down?
    • How do we switch name servers?
  • What will you do if your HDD fails?
  • What if you get hit by ransomware?
    • How do we make sure we don’t end up having to pay a ransom?
  • What needs to be restored?
  • Do we need to point the DB to a fallback node?
  • How do we restore the backups?
    • Where are the backups stored?
    • Who can access them?
  • How do we serve static content when everything is lost?

These are just a few questions to help you get your head around the Disaster Recovery Plan you’ll outline.

Possible Failures

  • Application
  • Network
  • Data Center
  • Citywide
  • Regional
  • National
  • Multinational

Outline

What are the RTO and RPO for your plan?

  • RTO, Recovery Time Objective: the maximum time you can take to bring the service back online before the disruption becomes unacceptable for your users.
  • RPO, Recovery Point Objective: the maximum window of time for which losing data is acceptable (with a backup every hour you lose at most 1h of data, so hourly backups fit an RPO of 1h).
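
To make the two objectives concrete, here is a minimal Python sketch that checks a single incident against them. The function names, the timestamps, and the 1h targets are all made up for illustration; plug in your own backup schedule and objectives.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, failure_time):
    """Time between the last backup taken before the failure and the failure itself."""
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no backup exists before the failure")
    return failure_time - max(earlier)


def meets_rpo(backup_times, failure_time, rpo):
    """True if the data lost in this incident fits within the RPO."""
    return data_loss_window(backup_times, failure_time) <= rpo


def meets_rto(failure_time, restored_time, rto):
    """True if the service came back online within the RTO."""
    return restored_time - failure_time <= rto


# Hypothetical incident: hourly backups, failure at 02:45, service restored at 04:15.
backups = [datetime(2024, 1, 1, hour) for hour in range(3)]  # 00:00, 01:00, 02:00
failure = datetime(2024, 1, 1, 2, 45)
restored = datetime(2024, 1, 1, 4, 15)

lost = data_loss_window(backups, failure)                      # 45 minutes of data
rpo_ok = meets_rpo(backups, failure, timedelta(hours=1))       # within a 1h RPO
rto_ok = meets_rto(failure, restored, timedelta(hours=1))      # 1h30 outage misses a 1h RTO
```

In this made-up incident the hourly backups keep the data loss within the RPO, but the restore takes too long for the RTO, which is exactly the kind of gap your plan should surface before a real disaster does.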

The next post will be about Playing with Providers. Stay tuned.

Check out the full version of this post in the ebook.