Following on from our last videos, Understanding Hybrid, Private and Public Clouds and How to Virtualise your Business and Migrate to the Cloud, our CEO, Tim Poultney, will now talk through What Is a Disaster Recovery Plan And How To Insure Against the Impact of any Disaster.
My name is Tim Poultney, I’m the CEO of Veber. Veber is a company with extensive experience in hybrid, dedicated, and cloud hosting environments. Today I’m here to talk to you about disaster recovery, or disaster recovery plans primarily, because we all hope that we’ll never have a disaster that requires us to go to a full-blown DR plan.
So what is a disaster? For a business they come in many different forms. Within IT, it’s wars, floods, earthquakes, terrorism, burglary, fire, employee sabotage, power failures, data centres going down; there are so many that we actually couldn’t list them all here. So for us a disaster is anything that causes your platform to go off the air, so that you can’t continue to work as normal.
So for each type of disaster, for each type of environment that you have within your business, you should be looking at having a DR plan. Today I’m primarily going to be talking about DR plans for computer-based services, and basically how we look at recovering from an outage which isn’t just a small data loss.
So to start with, let’s look at how you’re actually insuring against the disaster. The first question is which systems are dependent on each other. This is the primary part, so if you’ve got an order system that requires that you have an accounting system up, these things need to be caught and documented. Which systems have the most value to the business? This doesn’t necessarily mean it’s the order platform or the invoicing platform; it might be the platform that all the staff use to manufacture. For instance, one of our clients makes windows. For them, one of the important systems is actually the ERP system that gives the orders to the shop floor so that they can start making the windows, because otherwise there will be a great loss of time and money with staff sitting there not being able to work.
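As a minimal sketch of how those documented dependencies could be turned into a recovery order, the example below uses Python's standard `graphlib` module to sort systems so that each one comes after the systems it relies on. The system names and the dependency map are hypothetical placeholders, not a description of any real client platform.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists the systems it needs to be up first.
DEPENDS_ON = {
    "accounting": set(),
    "orders": {"accounting"},       # the order system requires accounting
    "erp_shop_floor": {"orders"},   # the ERP system feeds orders to the shop floor
    "email": set(),
    "hr": set(),
}


def recovery_order(depends_on):
    """Return the systems in an order where every dependency is restored first."""
    return list(TopologicalSorter(depends_on).static_order())


order = recovery_order(DEPENDS_ON)
print(order)
```

A map like this only captures technical dependencies. Which systems have the most value to the business still has to be decided separately and used to choose between systems that do not depend on each other.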
Next, what would you actually want to recover with a DR solution? Obviously DR covers a vast variety of systems, and different businesses have different requirements. Are you looking at email being part of DR? Are you looking at the marketing, the HR systems? Are they part of DR? You know, all of these things are part of DR, and that’s the whole thing. You might decide as part of your DR that actually HR doesn’t need to come back online until all the other systems have been restored, but it should still be part of your policy.
The priorities within DR are very, very important. A lot of businesses get confused about what the priorities of the business are, and I would always come back to the recovery time objectives. How long can your business be off the air? How long can the business stand to take to get all that data back, or to actually be working again? All these things will make up part of your DR plan, and they’ll also allow service providers like us to offer you DR solutions that will work.
When we’re looking at DR, we’re talking about an event that is catastrophic. We’re not talking about one file necessarily being deleted, because of course you can get that back with a normal backup. With DR you’re talking about a whole site, or multiple sites, going down simultaneously, and what the business would actually do whilst those sites are offline. It’s very unlikely that a data centre is going to go down and never come back. It’s more likely that a data centre might have a natural issue, such as getting struck by lightning, the generators not starting, or a power failure. All of these things can be documented and tested, so that you can see what you will actually do to recover the data and run with that data in a DR situation.
So the most important part above all else with DR is your actual backups. These are the most vital things that you’ll need.
The next thing that we always end up in a discussion about on the IT side is how the operation of the business will resume, whether that’s a photographer who has all of his data backed up, or whether that’s a large organisation with thousands of employees. Having that plan will allow you to understand the costs, and the things that you actually have to do when you’re going into this. It will also set a firm time, for example saying that the business will want to start a recovery procedure, a DR procedure, within 4 hours, so that you as an IT executive, or the business, can understand when to expect to be back online.
The next question that we always seem to end up with is: OK, so we’ve gone to DR, how do we go back? Because obviously DR is a separate data centre or a separate set of data, and you’ve generated content on that data. You have to roll all that data back in, whether that be databases, email, or just simple files; it all needs to be rolled back into the original system when it comes back. So this plan has to encompass all of those objects.
Test plans: it’s very interesting to see what a test is, and how people test their DR solutions. We’ve seen people fail over whole data centres to test it. We’ve also seen people say it’s too risky to do the tests. But how can it be too risky to test whether your DR plan works or not? Surely a DR plan that doesn’t work is a higher risk than the test that would tell you so. The key steps that are quite often missed within a DR plan are: where will the staff go? How will they get internet access? How will they gain access to the systems? We might have restored a system in Amsterdam or America, but if there is actually no way for the staff to access it, that’s a big problem.
With most DR solutions, we would recommend that you actually do a full trial run every 12 months. This will give you and the management team at your business the ability to understand that you can do DR, that there is a value to DR, and that the staff will know how to do DR when it’s called on.
So, to summarise, what Veber would like you to actually take away from this is to get your recovery time objectives worked out. Whether that’s 4 hours, 4 days or 4 weeks, all of these things can be calculated and worked out within your business. Take those to your service provider, and work with your service provider to build a recovery plan that is achievable and affordable. There’s no point in having a recovery plan that says you need to be back up within 2 minutes if it costs 4 times the original platform.
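To illustrate the trade-off between recovery time and cost, here is a rough sketch that filters a list of DR options down to those that meet a recovery time objective and fit a budget. The option names, recovery times and costs are invented purely for illustration; real figures would come from working through your own business and your service provider's quotes.

```python
from dataclasses import dataclass


@dataclass
class DrOption:
    name: str
    recovery_hours: float  # time until the business is working again
    annual_cost: float     # yearly cost of keeping this option available


def affordable_options(options, rto_hours, max_annual_cost):
    """Return options that meet the RTO and the budget, cheapest first."""
    fits = [
        o for o in options
        if o.recovery_hours <= rto_hours and o.annual_cost <= max_annual_cost
    ]
    return sorted(fits, key=lambda o: o.annual_cost)


# Illustrative numbers only; the original platform is assumed to cost 10,000 a year.
OPTIONS = [
    DrOption("hot standby", recovery_hours=2 / 60, annual_cost=40_000),  # 4x the platform
    DrOption("warm standby", recovery_hours=4, annual_cost=6_000),
    DrOption("restore from backup", recovery_hours=96, annual_cost=1_500),
]

choices = affordable_options(OPTIONS, rto_hours=4, max_annual_cost=10_000)
print([o.name for o in choices])
```

With these made-up figures, a 4 hour objective rules out the slow restore and the budget rules out the 2 minute option, leaving the middle one.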
The next area that we would recommend looking at is how long it would be before your business suffers a permanent impact from an outage. When is the time that your business would say, we are now in grave danger of actually going out of business?
The very last point that I would come to is the SLAs. Quite often companies come to us and want SLAs for their servers. The SLAs that your primary contractor supplies you with will be based around 99.5% or 100% uptime. That doesn’t mean they will actually offer you a DR solution when there is a problem. It just means they’re going to try and keep the platform that they’ve given you alive. We always recommend that you get a second supplier, and we’d always recommend that you check that supplier is fully diverse from the primary supplier. Otherwise, you could find that if the primary supplier has an issue, the secondary will have an issue as well.
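To put an uptime percentage into perspective, the short calculation below converts it into the downtime it still permits over a year. This is simple arithmetic, assuming the SLA is measured over a 365 day year; actual SLAs may be measured monthly or exclude planned maintenance, so check the contract wording.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime that still satisfy the given uptime percentage."""
    return period_hours * (1 - uptime_percent / 100)


print(round(allowed_downtime_hours(99.5), 1))  # about 43.8 hours per year
print(allowed_downtime_hours(100))             # 0.0, no downtime allowed
```

So a 99.5% uptime figure can still be met with nearly two days of outage in a year, which is why the SLA on its own is not a DR solution.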