CPS 230 requires regulated entities to consider service disruption from a different perspective. Working backwards through a scenario, entities must identify the harm that a disruption may cause to their customers or the broader financial system, then take active measures to prevent it (operational risk) and recover from it (operational resilience).
Welcome to the third in our series of CPS 230 technical guides.
In the discussion paper that accompanied the draft CPS 230, APRA noted that one of its key objectives is to focus the Board on the importance of operational resilience by requiring it to set tolerance levels for disruptions to critical operations. Although the approach to setting these tolerances should leverage the approach used for risk appetite tolerances, the fundamental difference is that operational resilience assumes the risk has already crystallised: the disruption has occurred.
In this guide we set out an approach to assessing operational resilience. The approach leverages the methodology developed by Grant Thornton in the UK where similar requirements have been in place for some time.
Why is operational resilience important?
A robust and resilient financial services sector is essential to preventing financial harm. CPS 230, with its focus on operational resilience, is consistent with prudential requirements in the UK and Europe. It forms part of a suite of APRA requirements related to limiting financial harm due to disruption, including identifying domestic systemically important banks (D-SIBs), CPS 232 Business Continuity Management, CPS 190 Recovery and Exit Planning and multiple capital adequacy and liquidity requirements.
Operational resilience refers to the collective steps an entity takes to minimise the impact and disruption of operational risk incidents. Business continuity and business resilience aim to keep the entity as a whole operating. Operational resilience is related, but its focus is not on the entity as a whole. Instead, it focuses on the key financial services the entity delivers.
Although APRA accepts that some degree of service disruption and outages will occur, it is important that regulated entities:
Have the resilience to get critical operations back up and running without causing financial harm;
Work within a pre-defined tolerance level that aligns with their broader risk appetite; and
Conduct robust scenario testing, using severe but plausible scenarios, to assess whether it is possible to remain within the tolerances set.
The Board is expected to oversee and approve all aspects of operational resilience. As such, risk reporting and Board Risk Committee Charters may need to be updated to include information necessary to facilitate this. Operational resilience will also need to be reflected in risk management declarations.
Identifying critical operations
CPS 230 defines critical operations as processes that:
“If disrupted beyond tolerance levels would have a material adverse impact on its depositors, policyholders, beneficiaries or other customers or its role in the financial system.”
CPS 230 sets out the processes that it expects at a minimum to be identified as critical operations.
At its core, CPS 230 requires regulated entities to prioritise critical services over their own operational objectives to prevent financial harm to consumers. This means, for example, that in the event of a major disruption, APRA expects that priority will be given to restoring core banking operations over other revenue-generating non-regulated businesses.
Resilience planning
The following diagram sets out the steps necessary for effective resilience planning and key considerations:
For each critical process determine how much disruption could be tolerated and under what circumstances. This will require contingency and continuity planning, including identifying back-up or substitute systems, processes and service providers.
Map
Document the systems and workflows that support each critical process including activities undertaken by related and non-related service providers. Interdependencies between systems and processes must be identified so that the total impact of any disruption can be assessed.
Assess
Determine how the failure of a system, workflow or service provider would impact a critical process. Concentration of critical service providers may increase the impact. Contingency plans must address the disruption and identify potential substitutions.
Test
Use severe but plausible scenarios and past experience (for example, COVID-19) to test whether each critical process would remain within tolerance should a disruption occur. Generating scenarios will require involvement from IT, the business, risk and third-party service providers. Testing plans should consider the type and frequency of testing.
Invest
Where resilience is below tolerance, the capacity to respond to and recover from disruptions must be enhanced. The focus of enhancements should be to reduce the overall recovery time.
Communicate
Identify all internal and external stakeholders, what needs to be communicated, to whom and when. The overall objective of the communications is to enable customers to make informed decisions in the event of an outage.
The steps will need to be undertaken on a continuous basis to take account of emerging risks, the results of testing and any disruptions that may occur.
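The Map and Assess steps above lend themselves to a simple dependency graph. The following is a minimal Python sketch only; it is not taken from CPS 230 or from the Grant Thornton methodology, and every process, system and provider name in it is hypothetical. It records what each critical process depends on, then walks the map to find which critical processes a single failure would reach and which dependencies support more than one critical process (a possible concentration risk).

```python
from collections import defaultdict

# Hypothetical dependency map: each item lists what it directly depends on.
DEPENDS_ON = {
    "payments": ["core_banking", "payments_gateway"],
    "deposit_taking": ["core_banking"],
    "core_banking": ["cloud_provider_a"],
    "payments_gateway": ["cloud_provider_a", "telco_b"],
}
CRITICAL_PROCESSES = ["payments", "deposit_taking"]


def all_dependencies(item, depends_on):
    """Return every direct and indirect dependency of item."""
    seen = set()
    stack = list(depends_on.get(item, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:  # the seen set also guards against circular dependencies
            seen.add(dep)
            stack.extend(depends_on.get(dep, []))
    return seen


def impacted_processes(failed, critical, depends_on):
    """Critical processes that rely, directly or indirectly, on the failed item."""
    return sorted(p for p in critical if failed in all_dependencies(p, depends_on))


def concentration(critical, depends_on):
    """Map each dependency to the critical processes it supports,
    keeping only dependencies shared by more than one process."""
    supports = defaultdict(set)
    for process in critical:
        for dep in all_dependencies(process, depends_on):
            supports[dep].add(process)
    return {dep: sorted(ps) for dep, ps in supports.items() if len(ps) > 1}


print(impacted_processes("telco_b", CRITICAL_PROCESSES, DEPENDS_ON))
print(concentration(CRITICAL_PROCESSES, DEPENDS_ON))
```

In practice the map would be populated from the entity's own process catalogue and service provider register rather than hard-coded, and the output would feed contingency planning for the dependencies with the widest reach.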
CPS 230 sets out the minimum tolerances that the Board must establish for each critical operation:
Maximum tolerable duration or volume of disruption;
Maximum extent of data loss; and
Minimum service levels the entity would maintain while operating under alternative arrangements during a disruption.
Additional metrics that could be established to measure disruption include:
Number of outages within a set timeframe; and
Number of customers affected by an outage.
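To illustrate how the three minimum tolerances listed above might be recorded and compared against the outcome of a scenario test, here is a minimal Python sketch. The class names, field names and figures are all hypothetical, and expressing data loss as a time window is an assumption made for the example rather than something CPS 230 prescribes.

```python
from dataclasses import dataclass


@dataclass
class Tolerance:
    """Board-approved tolerance levels for one critical operation (illustrative)."""
    max_outage_minutes: float     # maximum tolerable duration of disruption
    max_data_loss_minutes: float  # maximum extent of data loss, as a time window (assumption)
    min_service_level: float      # minimum share of normal service, from 0.0 to 1.0


@dataclass
class ScenarioResult:
    """Outcome of a test scenario or an actual disruption."""
    outage_minutes: float
    data_loss_minutes: float
    service_level: float


def breaches(tolerance, result):
    """Return the names of the tolerances that the result falls outside."""
    found = []
    if result.outage_minutes > tolerance.max_outage_minutes:
        found.append("duration")
    if result.data_loss_minutes > tolerance.max_data_loss_minutes:
        found.append("data_loss")
    if result.service_level < tolerance.min_service_level:
        found.append("service_level")
    return found


# Hypothetical figures for a payments operation.
payments_tolerance = Tolerance(
    max_outage_minutes=240, max_data_loss_minutes=15, min_service_level=0.6
)
scenario = ScenarioResult(outage_minutes=300, data_loss_minutes=5, service_level=0.7)
print(breaches(payments_tolerance, scenario))
```

An empty list means the scenario stayed within all three tolerances; any names returned point to where investment in recovery capability may be needed.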
The Financial Conduct Authority (FCA) in the UK published a list of root causes of disruptions to help inform operational resilience planning. These are useful when considering possible scenarios:
Failure of a change initiative – contributing factors identified were:
Ill-defined benefits and programme requirements
Poorly articulated delivery approach
Lack of acceptance due to poor engagement
Unrealistic delivery timetables
Focus on excessive functionality
Poorly managed changes in project scope
Ineffective risk management – lack of awareness of dependencies and constraints leading to issues, delays, overspend or non-delivery
Multiple initiatives competing with each other and business as usual for the same resources
Third party failure
Change software/application issue
Cyber attack
Hardware issue
Human error
Process/control failure
Capacity management
Recovery from disruption will typically include the following steps:
Emergency response: rapid reaction to a disruption as soon as the issue is discovered
Crisis management: response and immediate steps taken to remediate the situation
Business continuity: implementation of existing plans to restore critical operations to an acceptable level within pre-agreed timeframes
Disaster recovery: restoration of the IT capability and/or physical premises that support operations
Restoring business as usual: must include lessons learnt to improve resilience in the long term
Holistic, joined up approach across all business units and functions
Understanding of key dependencies across all business units and functions
Clarity regarding the cost/benefit/risk trade-offs involved in designing critical processes, consistent with risk appetite
Comprehensive, centralised and detailed catalogue of critical processes
Detailed mapping of processes, services, resources and their criticality