The Five Pillars of Operational Resilience
The Basel Committee on Banking Supervision defines operational resilience as “the ability […] to deliver critical operations through disruption”. This could not be more succinct. Like any ability, it takes time and effort to develop. Here is a brief view of the key elements that collectively deliver resilience for an organisation, and of where attention needs to be focused to ensure the necessary capabilities are developed.
The five key ‘pillars’ of resilience are:
- Risk Management
- Information Security (including Cyber Security)
- Incident Management (including Crisis Management)
- Business Continuity
- Disaster Recovery.
Let us take a closer look at each of these.
Risk management begins with knowing what threatens the continued delivery of your services. What defences (controls) do you have in place today? Are they effective? Can you do more to prevent undesirable “risk events”? A useful starting point is to consider a disruption and what could cause it. For example:
- a utility failure
- a power outage
- a loss of the use of your building because of fire, flood, storm, etc.
- a significant reduction in the number of available personnel (through pandemic, strike action, or some other event)
- a cyber-attack that corrupts your data.
Have you done all that you can to prevent an undesirable outcome? And then, what if it does happen anyway? Do you have a plan to mitigate the consequences?
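The two questions above (is there a defence, and is there a plan if the defence fails?) can be tracked in a simple register. The following is a minimal sketch only: the `Disruption` class, the `find_gaps` function and the sample entries are hypothetical names and data chosen for illustration, not part of any particular tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Disruption:
    """One entry in a hypothetical risk register."""
    name: str
    # Preventive defences (controls) in place today.
    controls: list = field(default_factory=list)
    # What you will do if the event happens anyway.
    mitigation_plan: Optional[str] = None


def find_gaps(register):
    """Map each disruption to the questions that remain unanswered for it."""
    gaps = {}
    for entry in register:
        missing = []
        if not entry.controls:
            missing.append("no preventive controls")
        if not entry.mitigation_plan:
            missing.append("no mitigation plan")
        if missing:
            gaps[entry.name] = missing
    return gaps


# Illustrative entries only; a real register would come from your own assessment.
register = [
    Disruption("power outage", ["UPS", "generator"], "fail over to second site"),
    Disruption("loss of building", ["fire suppression"]),
    Disruption("reduced personnel"),
    Disruption("cyber-attack corrupting data", ["offline backups"], "restore from backup"),
]

for name, missing in find_gaps(register).items():
    print(f"{name}: {', '.join(missing)}")
```

A register like this says nothing about whether a control is effective; that still has to be established through testing and review.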
Information Security has its own pillar because almost all functions or activities depend to some extent on technology. Information Security is about protecting the Confidentiality, Integrity, and Availability (CIA) of information, and doing this requires skill and knowledge. The NIST Cybersecurity Framework is a useful tool to guide activity in this area. Its five core functions are Identify, Protect, Detect, Respond and Recover (version 2.0 of the framework adds a sixth, Govern). ‘Identify’ is important because an information asset you have not identified may go unprotected. ‘Respond’ is the other function I would highlight here because, without a considered response plan, downtime is likely to be longer than it would have been had one been in place.
Incident management is about how you respond when something does go wrong. When you do have an incident (be it minor or significant), you will recover faster if you have already considered that event, or a similar one, in a scenario exercise. The nature of the incident will determine the response required, the personnel required to address the disruption, and the external parties that are critical to the resolution of the incident. For example, if the incident is significant and generates public interest, you will need to be ready to ‘go on camera’ and communicate your concern, commitment, and control of the incident.
When a disruption happens, two teams are usually formed. The first manages the incident and prevents it from becoming a disaster. The second is the business continuity team, which works to ensure that key systems are recovered within the agreed timeframes and that the organisation can continue to deliver services and/or products at acceptable, predefined levels. The risk assessment and business impact analysis will already have identified the critical activities and their dependencies; scenario testing will have validated the plans. Roles and responsibilities will have been established, as will the succession of authority in the event that certain individuals are not available.
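Checking recovery against the agreed timeframes is simple arithmetic, and it can help to make it explicit during an exercise. The sketch below assumes each item has an agreed maximum recovery time; the function name `breached_timeframes`, the item names and all of the figures are hypothetical and purely illustrative.

```python
from datetime import datetime, timedelta


def breached_timeframes(disruption_start, agreed, recovered, now):
    """Return the items that have exceeded their agreed recovery timeframe.

    agreed:    item -> timedelta, the agreed maximum time to recover
    recovered: item -> datetime of recovery (absent if the item is still down)
    """
    late = []
    for item, limit in agreed.items():
        # An item that is still down is measured up to the current time.
        finished = recovered.get(item, now)
        if finished - disruption_start > limit:
            late.append(item)
    return late


# Illustrative figures only.
start = datetime(2024, 1, 1, 9, 0)
agreed = {
    "payments": timedelta(hours=4),
    "customer support": timedelta(hours=24),
    "payroll": timedelta(hours=72),
}
recovered = {
    "payments": start + timedelta(hours=6),
    "customer support": start + timedelta(hours=20),
}
now = start + timedelta(hours=30)

print(breached_timeframes(start, agreed, recovered, now))
```

In this example only "payments" is reported: it took six hours against an agreed four, while "payroll" is still down but remains inside its 72-hour window.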
The fire is out, the flood has receded, the cyber-attack has been repelled. Now begins the ‘Recovery’ phase. The ‘Continuity’ phase may be supported by temporary resources (people, equipment and facilities), so it needs to be kept as short as possible. Getting back to normal may take days, weeks or even months, and the eventual outcome will benefit from prior planning. Consider the following in your recovery plans: building repair; server, PC and laptop repair or replacement; dealing with backlogs; preparing insurance claims; and planning the phased return of employees.
When you do have an incident, large or small, take time to review what you learned during it and how that new knowledge can be used to improve your plans. Better still, share what you learned with your peers and ask them to do the same. Clever people learn from their experiences; really clever people learn from other people’s experiences.
Contact us to learn more about how CalQRisk can assist with your operational resilience efforts.