This snapshot, taken on 26/07/2008, shows web content selected for preservation by The National Archives. External links, forms and search boxes may not work in archived websites.
 

Recovery

The key to recovery from a systems failure is preparation. If you have already taken steps towards prevention, or (ideally) looked at business continuity management, you will be in a much stronger position to handle the unexpected.

If you can determine which of your business processes are critical, and which are not, you will be able to provide some priority for recovery.

It is unusual for even the most prepared companies to restore all services immediately after a traumatic event. It can even take time to recover from minor ones.

Criticality is normally determined by how long you can survive without a service. Some services need immediate recovery, such as a bank's online payment system. Others can wait; you can manage without many services for a short time.
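As a rough illustration of ranking services by how long you can survive without them, the sketch below sorts a list so that the least tolerable outage comes first. The service names and hour figures are made up for the example, and `Service`, `max_downtime_hours` and `recovery_order` are hypothetical names, not part of any standard.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    # Assumed measure of criticality: how many hours the business
    # can survive without this service (illustrative figures only).
    max_downtime_hours: float


def recovery_order(services):
    """Return services sorted so the shortest tolerable outage comes first."""
    return sorted(services, key=lambda s: s.max_downtime_hours)


# Hypothetical example data.
services = [
    Service("staff intranet", 72),
    Service("online payments", 0),
    Service("email", 8),
]

for s in recovery_order(services):
    print(f"{s.name}: restore within {s.max_downtime_hours} hours")
```

In practice the figures would come from your own assessment of each business process rather than from a fixed table like this one.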

Recovery from a systems failure, or indeed any form of information security breach, is based on the following principles of incident management or, in the worst case, crisis management:

Qualification

Determine the size of the incident and what effects are likely. The best way to do this is to assemble a response team (often referred to as an Incident Response Team or IRT). Members should include:

  • Operational managers within the areas affected
  • HR
  • Facilities (if buildings and utilities are affected)
  • PR or corporate communications (if you have these)

Team members should be able to make decisions for their own area. There's little point in using someone who needs to seek permission or authority before proceeding; time is often of the essence in recovery situations.

Containment

Make sure there is no further damage. Consider isolating affected systems and premises if the incident is ongoing. Report the incident to relevant internal and/or external bodies.

Assessment

  • Establish the extent of damage and obtain whatever records (electronic or otherwise) of the event that you can, including eyewitness statements.
  • If you think there may be some malicious intent behind the event, consider forensics procedures.
  • If the event is highly technical, contact the most articulate, technically capable person you know to help analyse and describe the important issues. Remember that the best technical person is not always the most senior person.

Countermeasures

  • If possible, a collective decision should be made on countermeasures, although this is not always possible, due to time constraints or disagreements. Sometimes you just have to bite the bullet.
  • Apply countermeasures in risk priority order. Keep monitoring to make sure the countermeasures don't make anything worse.
  • In parallel, communicate information on the event to whoever needs to know. This can range from members of your family and staff to multinational news corporations.
  • Those who need to know should be told in a timely manner, through the appropriate channels. Larger organisations should consider using their PR facilities.
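
The advice above on applying countermeasures in risk priority order, and monitoring as you go, could be sketched as follows. This is only an illustration: scoring risk as likelihood multiplied by impact on a 1 to 5 scale is a common convention assumed here, not something the guidance prescribes, and `Countermeasure`, `apply_in_risk_order` and `made_worse` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Countermeasure:
    name: str
    likelihood: int  # assumed scale: 1 (unlikely) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)
    apply: Callable[[], None]  # the action to carry out

    @property
    def risk(self) -> int:
        # Assumed scoring: likelihood x impact of the risk being addressed.
        return self.likelihood * self.impact


def apply_in_risk_order(
    measures: List[Countermeasure],
    made_worse: Callable[[], bool],
) -> List[str]:
    """Apply measures highest risk first; stop if monitoring reports harm."""
    applied = []
    for m in sorted(measures, key=lambda m: m.risk, reverse=True):
        m.apply()
        applied.append(m.name)
        if made_worse():  # monitoring check after every countermeasure
            break
    return applied


# Hypothetical example: the actions are placeholders that do nothing.
measures = [
    Countermeasure("restore from backup", 3, 4, lambda: None),
    Countermeasure("isolate affected server", 5, 5, lambda: None),
    Countermeasure("reset staff passwords", 2, 3, lambda: None),
]
order = apply_in_risk_order(measures, made_worse=lambda: False)
print(order)
```

Stopping at the first sign of harm is one possible design choice; a response team might instead roll back the last countermeasure or escalate the decision.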