On Thursday, May 5, 2022, EnergyCentral held a podcast, “Cyber Resiliency in the Power Industry,” with representatives from EPRI, Dragos (Phil Tonkin, previously of National Grid), PNM Resources (Spencer Wilcox), and Portland General Electric (PGE; Toley Clague). My concerns from the podcast were the inappropriate use of Failure Modes and Effects Analyses (FMEAs) for identifying cyber threat scenarios and the “defining away” of actual control system cyber incidents.
Inappropriate use of FMEAs for identifying cyber threat scenarios
Spencer stated that FMEAs are used in the nuclear industry and could be used to determine possible grid cyber threats. Toley mentioned they are also using the FMEA approach. While managing the EPRI Nuclear and Instrumentation and Diagnostic Program, I performed detailed FMEAs to understand the unexpected failure modes in nuclear safety pressure sensors. FMEAs (or Hazard and Operability studies in the petrochemical industry) are valuable when all potential system interactions and impacts are addressed. These system-wide studies are very complex and time-consuming, require experts from various fields, and can take many months. For interconnected utilities, these types of analyses can be extremely challenging and may require multiple utilities or utility organizations to collaborate. As an example, an Interconnection-wide oscillation caused by a single failed sensor in a steam turbine in Florida directly resulted in load swings in New England. How would a single utility’s FMEA address this case? Moreover, when devices such as process sensors and actuators are out of scope for the NERC CIPs and generally not included in the FMEAs, the FMEA approach cannot provide a comprehensive analysis.
Identifying all cyber incidents
Is it only a cyber incident if it’s intentional, or can accidents be cyber incidents as well? Or, to put it another way, it’s sometimes difficult to determine whether an incident has been deliberately induced, and even more difficult to determine whether the intention behind a deliberately induced event was malicious. As a result, I was disappointed to hear Phil Tonkin state that an incident is cyber only if you can establish malicious intent. An incident is based on actual consequential impact, whether from an attack or an unintentional scenario. A sophisticated attacker can make a cyberattack look like an equipment malfunction, and there are minimal cyber forensics to identify the cyber aspects. Addressing only what appear to be attacks can lead to a significant undercounting of actual incidents and a resultant loss of safety and reliability. Consequently, both NIST and the Government Accountability Office (GAO) have cyber incident definitions that do not require malicious intent. Moreover, ignoring impacts when intent cannot be identified contradicts the basis of consequence-based engineering.
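The effect of the two definitions on incident counts can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record fields, the two rule functions, and the four sample incidents are hypothetical and are not taken from NIST, GAO, NERC, or any real incident database. The sketch only shows how requiring established malicious intent shrinks the count compared with a consequence-based rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    name: str
    electronic_involvement: bool      # control system electronics/communications involved
    had_impact: bool                  # actual consequential impact occurred
    malicious_intent: Optional[bool]  # None means intent could not be determined


def is_cyber_intent_based(incident: Incident) -> bool:
    # Counts an incident only when malicious intent has been established.
    return incident.electronic_involvement and incident.malicious_intent is True


def is_cyber_consequence_based(incident: Incident) -> bool:
    # Counts an incident whenever electronics were involved and there was
    # an actual impact, regardless of intent.
    return incident.electronic_involvement and incident.had_impact


# Made-up sample data, not real events.
incidents = [
    Incident("remote energization with protection removed", True, True, False),
    Incident("apparent equipment malfunction, cause unknown", True, True, None),
    Incident("confirmed intrusion", True, True, True),
    Incident("purely mechanical failure", False, True, None),
]

intent_count = sum(is_cyber_intent_based(i) for i in incidents)
consequence_count = sum(is_cyber_consequence_based(i) for i in incidents)
print(f"intent-based: {intent_count}, consequence-based: {consequence_count}")
```

In this made-up sample, the intent-based rule misses both the unintentional case and the case where intent could not be determined, which is the undercounting concern described above.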
The 2008 Florida outage exemplifies what is wrong with the NERC CIPs and Phil’s definition. The details are in my book, “Protecting Industrial Control Systems from Electronic Threats”. To summarize, an engineer was dispatched to Florida Power and Light’s Flagler substation to diagnose a potential equipment failure. The engineer, who was authorized to make equipment changes and request SCADA support, removed all relay protection throughout the entire substation (problem 1). The engineer then asked the SCADA operator to remotely energize the suspect equipment without telling the operator that all protection had been removed (problem 2). The SCADA system did not identify that relay protection was not available (problem 3). Capacitor bank switches are considered to be distribution equipment and so are not in scope for NERC CIP. The operator used a serial communication link to energize the switch (NERC considers serial communications to be out of scope for NERC CIP). The capacitor bank switch had an electrical problem and, when energized, faulted to ground. With no relay protection, the failure propagated throughout the substation and from there to surrounding substations (problem 4). The cascading failures caused an outage affecting portions of the lower two-thirds of the state. Specifically, the event led to the loss of 22 transmission lines, 4,300 MW of generation (including nuclear units), and 3,650 MW of customer load. The only difference between this incident being malicious and being unintentional was the motivation (intent) of the substation engineer. The SCADA operator had no malicious intent because the operator was unaware there was no relay protection. Yet it is evident that a remotely energized equipment failure is a cyber incident. It is also evident from this event that distribution equipment and serial connections can lead to region-wide power outages while being outside the scope of the NERC CIPs.
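Problem 3, the SCADA system not identifying that relay protection was unavailable, can be pictured as a missing pre-energization check. The following is a purely illustrative sketch, not a description of any real SCADA product or of the system involved in the 2008 event. The class, exception, and function names are hypothetical, and it assumes the protection status is visible to the control system, which was evidently not the case here.

```python
from dataclasses import dataclass


@dataclass
class SubstationState:
    """Hypothetical view of a substation as seen by the control system."""
    name: str
    relay_protection_in_service: bool


class ProtectionUnavailableError(Exception):
    """Raised when a remote energize is requested without relay protection."""


def request_remote_energize(state: SubstationState, device: str) -> str:
    # Refuse the remote command when protection is known to be out of service,
    # so the operator learns the protection status before acting.
    if not state.relay_protection_in_service:
        raise ProtectionUnavailableError(
            f"relay protection is not in service at {state.name}; "
            f"refusing to energize {device}"
        )
    return f"energize command issued to {device} at {state.name}"


# Illustrative usage with made-up state.
unprotected = SubstationState("example substation", relay_protection_in_service=False)
try:
    request_remote_energize(unprotected, "capacitor bank switch")
except ProtectionUnavailableError as err:
    print(f"blocked: {err}")
```

Such a check is only as good as the status information behind it, which is one reason monitoring of field devices matters.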
Lack of documented cyber incidents
Spencer and Toley stated there have been no documented control system cyber incidents in the electric industry. This contrasts with DOE’s public documentation of 39 cyber events, including events in New Mexico and Oregon (https://www.controlglobal.com/blogs/unfettered/control-system-cyber-incidents-in-electric-and-other-sectors-are-frequent-often-impactful-but-not-reported). DOE also identified 150 cases of “Complete loss of monitoring or control capability at its staffed Bulk Electric System control center for 30 continuous minutes or more”. These are control system cyber incidents. The figure of 150 is conservative because incidents that last less than 30 minutes, or that do not occur at a staffed Bulk Electric System control center, would not meet the disclosure threshold. Eleven of these DOE-identified cases affected at least 80 MW of load, and one affected more than 130,000 customers. There have been confirmed cases where China and Russia have cyberattacked US electric control systems. In another case, not addressed by DOE because power was not lost, a utility had its SCADA system targeted by hackers, resulting in a SCADA system shutdown for two weeks. In support of the Idaho National Laboratory, I worked with the affected utility to quantify the economic impact of the cyberattack (details in my book). Overall, I have documented more than 500 control system cyber incidents in the electric industry.
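Why the 150 figure is conservative can be sketched by applying the quoted disclosure threshold as a filter. The event records below are invented for illustration and are not DOE data; the class and function names are likewise hypothetical. Only the two threshold conditions (a staffed Bulk Electric System control center, and 30 continuous minutes or more) come from the quoted DOE language.

```python
from dataclasses import dataclass

# From the quoted DOE criterion: "30 continuous minutes or more".
REPORTING_THRESHOLD_MINUTES = 30


@dataclass
class LossOfControlEvent:
    """Hypothetical record of a loss of monitoring or control capability."""
    label: str
    duration_minutes: int
    at_staffed_bes_control_center: bool


def meets_disclosure_threshold(event: LossOfControlEvent) -> bool:
    # Both conditions must hold for the event to be counted.
    return (
        event.at_staffed_bes_control_center
        and event.duration_minutes >= REPORTING_THRESHOLD_MINUTES
    )


# Made-up events, not DOE data.
events = [
    LossOfControlEvent("A", 45, True),
    LossOfControlEvent("B", 20, True),    # too short to count
    LossOfControlEvent("C", 120, False),  # not at a staffed BES control center
    LossOfControlEvent("D", 30, True),
]

reported = [e.label for e in events if meets_disclosure_threshold(e)]
unreported = [e.label for e in events if not meets_disclosure_threshold(e)]
print("reported:", reported)
print("unreported:", unreported)
```

Events B and C in this made-up sample are real losses of monitoring or control, yet neither would appear in the reported total.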
Recommendations
- All control system cyber incidents, whether “known” malicious or unintentional, should be identified.
- Control system cyber security training that addresses field devices and system interactions should be provided. General training should be given to IT and OT network personnel, with more detailed training for control system engineers. Unfortunately, this type of training is not readily available. Training should address the different types of control system cyber incidents that have occurred.
- The security program should include monitoring of control system field devices.
Summary
Control system cyber incidents are real and impactful (more than 500 incidents in the electric industry). To date, most of these incidents have not been identified as “cyber” because of a lack of identified intent. When reporting and remediating a control system cyber incident, the intent isn’t as important as the impact of the incident, which is the basis of consequence-based engineering. Using techniques such as FMEAs can be valuable if all control system devices, networks, and scenarios are considered. However, the interconnectedness of utilities can require that FMEAs consider the impacts one utility can have on another. The discussions highlight the need for control system cyber security training that addresses field devices and system interactions. Unfortunately, this type of training is not readily available. Moreover, the security program should include monitoring of control system field devices, which are currently outside the scope of the NERC CIPs.
Joe Weiss