Critical operations: APRA’s CPS 230 key areas of focus

By: Isabella Quant

CPS 230 requires regulated entities to consider service disruption from a different perspective. Working backwards through a scenario, entities must identify the harm that a disruption may cause to their customers or the broader financial system, then take active measures to prevent it (operational risk) and recover from it (operational resilience).

Welcome to the third in our series of CPS 230 technical guides.

In the discussion paper that accompanied the draft CPS 230, APRA noted that one of its key objectives is to focus the Board on the importance of operational resilience by requiring it to set tolerance levels for disruptions to critical operations. The approach to setting these tolerances should leverage the approach used for setting risk appetite tolerances. The fundamental difference is that, when considering operational resilience, the risk has crystallised: the disruption is assumed to have occurred.

In this guide, we set out an approach to assessing operational resilience. The approach leverages the methodology developed by Grant Thornton in the UK, where similar requirements have been in place for some time.

Why is operational resilience important?

A robust and resilient financial services sector is essential to preventing financial harm. CPS 230, with its focus on operational resilience, is consistent with prudential requirements in the UK and Europe. It forms part of a suite of APRA requirements related to limiting financial harm due to disruption, including identifying domestic systemically important banks (D-SIBs), CPS 232 Business Continuity Management, CPS 190 Recovery and Exit Planning and multiple capital adequacy and liquidity requirements.

Operational resilience refers to the collective steps an entity takes to minimise the impact and disruption of operational risk incidents. Business continuity and business resilience aim to keep the entity as a whole operating. Operational resilience is related but differs in that the focus is not on the entity as a whole, but the key financial services it delivers.

Although APRA accepts that some degree of service disruption and outages will occur, it is important that regulated entities:

  • Have the resilience to get critical operations back up and running without causing financial harm;
  • Work within a pre-defined tolerance level that aligns with their broader risk appetite; and
  • Conduct robust scenario testing, using extreme but plausible scenarios, to assess whether it is possible to remain within the tolerances set.

The Board is expected to oversee and approve all aspects of operational resilience. As such, risk reporting and Board Risk Committee Charters may need to be updated to include information necessary to facilitate this. Operational resilience will also need to be reflected in risk management declarations.

Identifying critical operations

CPS 230 defines critical operations as processes that:

“If disrupted beyond tolerance levels would have a material adverse impact on its depositors, policyholders, beneficiaries or other customers or its role in the financial system.”

CPS 230 also sets out the processes that, at a minimum, it expects entities to identify as critical operations.

At its core, CPS 230 requires regulated entities to prioritise critical services over their own operational objectives to prevent financial harm to consumers. This means, for example, that in the event of a major disruption, APRA expects that priority will be given to restoring core banking operations over other revenue-generating non-regulated businesses.

Resilience planning

The following diagram sets out the steps necessary for effective resilience planning and key considerations:

The necessary steps can be summarised as:

| Activity | Detail |
| --- | --- |
| Identify | For each critical process determine how much disruption could be tolerated and under what circumstances. This will require contingency and continuity planning, including identifying back-up or substitute systems, processes and service providers. |
| Map | Document the systems and workflows that support each critical process including activities undertaken by related and non-related service providers. Interdependencies between systems and processes must be identified so that the total impact of any disruption can be assessed. |
| Assess | Determine how the failure of a system, workflow or service provider would impact a critical process. Concentration of critical service providers may increase the impact. Contingency plans must address the disruption and identify potential substitutions. |
| Test | Use severe but plausible scenarios and past experience (for example, COVID) to test that the resilience of each critical process is within tolerance should a disruption occur. Generating scenarios will require involvement from IT, the business, risk and third-party service providers. Testing plans should consider the type and frequency of testing. |
| Invest | Where the resilience is below tolerance, the capacity to respond and recover from disruptions must be enhanced. The focus of enhancements should be to reduce the overall recovery time. |
| Communicate | Identify all internal and external stakeholders, what needs to be communicated, to whom and when. The overall objective of the communications is to enable customers to make informed decisions in the event of an outage. |

The steps will need to be undertaken on a continuous basis to take account of emerging risks, the results of testing and any disruptions that may occur.
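The Map and Assess activities can be illustrated with a small dependency map. The following is a minimal sketch only: the operation, system and provider names are hypothetical, and a real mapping exercise would draw on the entity's own process and service provider inventories. It shows how indirect dependencies can be traced so that the total impact of a single failure, and any concentration on one provider, becomes visible.

```python
from collections import Counter

# Hypothetical dependency map: each key relies on the items in its list.
DEPENDS_ON = {
    "payments": ["core_banking", "payment_gateway"],
    "deposit_taking": ["core_banking"],
    "core_banking": ["cloud_provider_a"],
    "payment_gateway": ["cloud_provider_a"],
    "claims_processing": ["claims_system"],
    "claims_system": ["cloud_provider_b"],
}

# Hypothetical list of critical operations.
CRITICAL_OPERATIONS = ["payments", "deposit_taking", "claims_processing"]


def all_dependencies(item, depends_on):
    """Return every direct and indirect dependency of an item."""
    seen = set()
    stack = list(depends_on.get(item, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on.get(dep, []))
    return seen


def impacted_operations(failed, operations, depends_on):
    """List the critical operations that rely on a failed component."""
    return [op for op in operations if failed in all_dependencies(op, depends_on)]


def concentration(operations, depends_on):
    """Count how many critical operations rely on each component."""
    counts = Counter()
    for op in operations:
        counts.update(all_dependencies(op, depends_on))
    return counts


if __name__ == "__main__":
    print(impacted_operations("cloud_provider_a", CRITICAL_OPERATIONS, DEPENDS_ON))
    print(concentration(CRITICAL_OPERATIONS, DEPENDS_ON).most_common(3))
```

In this illustrative data, two of the three critical operations depend indirectly on the same cloud provider, which is the kind of concentration the Assess step is intended to surface.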

Setting tolerances

CPS 230 sets out the minimum tolerances that the Board must establish for each critical operation:

  • Maximum tolerable duration or volume of disruption;
  • Maximum extent of data loss; and
  • Minimum service levels the entity would maintain while operating under alternative arrangements during a disruption.

Additional metrics that could be established to measure disruption include:

  • Number of outages within a set timeframe; and
  • Number of customers affected by an outage. 
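One way to make the three minimum tolerances concrete is to record them per critical operation and compare them with the outcome of a real or simulated disruption. The sketch below is illustrative only: the field names, units and figures are assumptions rather than anything prescribed by CPS 230, and each entity would choose measures that suit its own operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Board-approved tolerances for one critical operation (illustrative fields)."""
    operation: str
    max_outage_hours: float        # maximum tolerable duration of disruption
    max_data_loss_minutes: float   # maximum extent of data loss
    min_service_level_pct: float   # minimum service level under alternative arrangements


@dataclass(frozen=True)
class Disruption:
    """Observed or simulated outcome of a disruption (illustrative fields)."""
    operation: str
    outage_hours: float
    data_loss_minutes: float
    service_level_pct: float


def breaches(tolerance, disruption):
    """Return the names of the tolerances that the disruption exceeded."""
    if tolerance.operation != disruption.operation:
        raise ValueError("tolerance and disruption refer to different operations")
    result = []
    if disruption.outage_hours > tolerance.max_outage_hours:
        result.append("duration")
    if disruption.data_loss_minutes > tolerance.max_data_loss_minutes:
        result.append("data loss")
    if disruption.service_level_pct < tolerance.min_service_level_pct:
        result.append("service level")
    return result


if __name__ == "__main__":
    # Hypothetical figures for a payments operation.
    payments = Tolerance("payments", max_outage_hours=4,
                         max_data_loss_minutes=15, min_service_level_pct=80)
    outage = Disruption("payments", outage_hours=6,
                        data_loss_minutes=10, service_level_pct=70)
    print(breaches(payments, outage))
```

An empty list means the disruption stayed within all three tolerances. Any entry indicates where the capacity to respond and recover may need investment.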

Severe but plausible scenarios

The Financial Conduct Authority (FCA) in the UK published a list of root causes of disruptions to help inform operational resilience planning. These are useful when considering possible scenarios:

  • Failure of a change initiative – contributing factors identified were:
    • Ill-defined benefits and programme requirements
    • Poorly articulated delivery approach
    • Lack of acceptance due to poor engagement
    • Unrealistic delivery timetables
    • Focus on excessive functionality
    • Poorly managed changes in project scope
    • Ineffective risk management – lack of awareness of dependencies and constraints leading to issues, delays, overspend or non-delivery
    • Multiple initiatives competing with each other and business as usual for the same resources
  • Third party failure
  • Change software/application issue
  • Cyber attack
  • Hardware issue
  • Human error
  • Process/control failure
  • Capacity management
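The root causes above can serve as a checklist for a scenario testing plan. The following is a minimal sketch under stated assumptions: the scenarios, estimated outage durations and tolerances are entirely hypothetical, and the plan structure is an illustration rather than a prescribed format. It reports which root causes have no scenario in the plan and which scenarios are estimated to exceed tolerance.

```python
# Top-level root causes taken from the list above.
ROOT_CAUSES = [
    "Failure of a change initiative",
    "Third party failure",
    "Change software/application issue",
    "Cyber attack",
    "Hardware issue",
    "Human error",
    "Process/control failure",
    "Capacity management",
]

# Hypothetical testing plan: every scenario and figure here is illustrative.
TEST_PLAN = [
    {"scenario": "Ransomware encrypts core banking data",
     "root_cause": "Cyber attack",
     "estimated_outage_hours": 30, "tolerance_hours": 24},
    {"scenario": "Payments provider outage",
     "root_cause": "Third party failure",
     "estimated_outage_hours": 6, "tolerance_hours": 8},
    {"scenario": "Failed release of mobile banking app",
     "root_cause": "Change software/application issue",
     "estimated_outage_hours": 3, "tolerance_hours": 4},
]


def untested_root_causes(root_causes, plan):
    """Return root causes that no scenario in the plan covers."""
    covered = {item["root_cause"] for item in plan}
    return [cause for cause in root_causes if cause not in covered]


def scenarios_outside_tolerance(plan):
    """Return scenarios whose estimated outage exceeds the tolerance."""
    return [item["scenario"] for item in plan
            if item["estimated_outage_hours"] > item["tolerance_hours"]]


if __name__ == "__main__":
    print("Not yet covered:", untested_root_causes(ROOT_CAUSES, TEST_PLAN))
    print("Outside tolerance:", scenarios_outside_tolerance(TEST_PLAN))
```

A gap in coverage does not necessarily mean a scenario is required for every root cause, but it gives those designing the testing plan a prompt to justify the omission.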

Recovery plans

Recovery from disruption will typically include the following steps:

  1. Emergency response: rapid reaction to a disruption as soon as the issue is discovered
  2. Crisis management: response and immediate steps taken to remediate the situation
  3. Business continuity: implementation of existing plans to restore critical operations to an acceptable level within pre-agreed timeframes
  4. Disaster recovery: restoration of IT capability and/or physical premises that support operations
  5. Restoring business as usual: must include lessons learnt to improve resilience in the long term

Characteristics of an effective approach to operational resilience

  • Holistic, joined-up approach across all business units and functions
  • Understanding of key dependencies across all business units and functions
  • Clarity regarding the cost/benefit/risk trade-offs of designing critical processes in a way that is consistent with risk appetite
  • Comprehensive, centralised and detailed catalogue of critical processes
  • Detailed mapping of processes, services, resources and their criticality