Government and industry organizations’ gaps in understanding control system cybersecurity
Incident response methods and practice can skew focus toward malicious intent, directed attacks, and threat actor attribution, often at the expense of categorizing or recognizing an event that is cyber-related but not malicious. But this is not always the case, nor should it be.
The Government Accountability Office (GAO) 21-477 defines a cyber incident as “an event that jeopardizes the cybersecurity of an information system or the information the system processes, stores, or transmits; or an event that violates security policies, procedures, or acceptable use policies, whether resulting from malicious activity or not” [emphasis added]. However, the National Transportation Safety Board (NTSB) has a history of not identifying control system incidents as being cyber-related.
Five catastrophic events demonstrate the deficiency in NTSB’s incident recognition and control system cyber expertise: the 1999 Olympic Pipeline gasoline pipeline rupture, the 2009 DC Metro train crash, the 2010 PG&E San Bruno natural gas pipeline rupture, the 2022 Union Pacific (UP) train crash, and the 2024 Dali containership crash. The Food and Drug Administration (FDA) has also been inconsistent in recognizing medical device incidents as cyber incidents, even when these have injured people and caused emergency recalls.
National Transportation Safety Board (NTSB)
Olympic Pipeline gasoline pipeline rupture: As background, I held the first control system cybersecurity conference in July 2002. NTSB attended the conference as they were finalizing the report on the Olympic Pipeline rupture (88 pages). In the 2007 timeframe, under contract to NIST, Marshall Abrams from MITRE and I did a detailed analysis of the Olympic Pipeline incident from information provided by NTSB. The report is on the MITRE and NIST websites and in my book, Protecting Industrial Control Systems from Electronic Threats.
The NTSB concluded that if the SCADA system computers had remained responsive to the commands of the Olympic controllers, the controller operating the pipeline probably would have been able to initiate actions that would have prevented the pressure increase that ruptured the pipeline. In other words, the SCADA system’s unresponsiveness was determined to be the proximate cause of the rupture. The term “cyber” was not used in the 88-page document.
We found the Olympic Pipeline incident originated from a broadcast storm: too much data on the network, which resulted in the SCADA system shutting down. The SCADA scan rate went from 3-7 seconds to 30 seconds, to 400 seconds, to totally unresponsive before the pipe rupture. NTSB stated that the slowing and stopping of the sensor scan pattern was the only abnormality found that indicated that there was a problem with the SCADA computer control system. None of the host VAX computer logs or any of the error logs associated with the SCADA system captured any data that indicated that the system was having a problem completing its assigned tasks. However, we found the SCADA system had a history of problems. Moreover, the back-up system in Houston was not a clone, and so it was not possible to replicate the incident.
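To illustrate why a lengthening scan interval is itself a detectable signal, even when no error log records a fault, here is a minimal Python sketch. It is not a reconstruction of the Olympic Pipeline SCADA system. The function name, the alert labels, and the thresholds are hypothetical; only the 3-7 second normal range and the 30 and 400 second readings come from the account above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds, chosen for illustration only.
NORMAL_MAX_SCAN_S = 7.0   # upper end of the 3-7 second normal range cited above
DEGRADED_FACTOR = 10.0    # assumed: "degraded" means more than 10x the normal maximum


@dataclass
class ScanAlert:
    index: int                   # position of the scan in the observed sequence
    interval_s: Optional[float]  # measured scan interval; None if no scan completed
    severity: str                # "slow", "degraded", or "unresponsive"


def classify_scan_intervals(
    intervals_s: List[Optional[float]],
    normal_max_s: float = NORMAL_MAX_SCAN_S,
    degraded_factor: float = DEGRADED_FACTOR,
) -> List[ScanAlert]:
    """Return an alert for every scan interval outside the normal range."""
    alerts = []
    for i, interval in enumerate(intervals_s):
        if interval is None:
            # No scan completed at all: the system has stopped responding.
            alerts.append(ScanAlert(i, None, "unresponsive"))
        elif interval > normal_max_s * degraded_factor:
            alerts.append(ScanAlert(i, interval, "degraded"))
        elif interval > normal_max_s:
            alerts.append(ScanAlert(i, interval, "slow"))
    return alerts


# Progression described above: normal, 30 seconds, 400 seconds, then unresponsive.
observed = [5.0, 30.0, 400.0, None]
for alert in classify_scan_intervals(observed):
    print(alert)
```

The point of the sketch is that the classification depends only on timing, not on intent: the same alerts would fire whether the slowdown came from a broadcast storm, a misconfiguration, or an attack.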
As a result of this incident, three people died, three people went to jail, and the Olympic Pipeline Company declared bankruptcy. This was a cyber incident, though not identified as such.
San Bruno natural gas pipeline rupture: The Pacific Gas & Electric (PG&E) San Bruno natural gas pipeline rupture was due to over-pressurizing a weak pipe (a spool piece installed where PG&E ran out of seamless pipe during the initial installation). The pipe rupture event occurred when PG&E scheduled maintenance to replace the uninterruptible power supply (UPS) at the PG&E Milpitas, CA SCADA terminal. PG&E’s logic was that if the SCADA system were to be disabled, the control valves would open. This logic assumed the three pipelines served by the Milpitas terminal, including the line to San Bruno, were 400 psi stainless steel pipe. PG&E as a corporation was a convicted felon for not doing the maintenance to identify the weak spool piece in San Bruno. The event cost PG&E billions of dollars and their management team.
According to the NTSB final report, PG&E took 95 minutes to stop the flow of gas and to isolate the rupture site—a response time that was excessively long and contributed to the extent and severity of property damage and increased the life-threatening risks to the residents and emergency responders. PG&E’s SCADA system limitations caused delays in pinpointing the location of the break. Like the Olympic Pipeline rupture, the PG&E gas SCADA system had a history of problems.
I did a line-by-line comparison of the Olympic Pipeline rupture and the San Bruno pipeline rupture, and provided the information to NTSB as there was no mention of the term “cyber” in the 153-page NTSB report. Additionally, I had the lead NTSB investigator on San Bruno speak to the ICS Cybersecurity Conference on the San Bruno incident, but he did not mention the term “cyber” in his presentation.
Dali ship crash: In questioning before the House Transportation Committee, Rep. Brandon Williams of New York asked NTSB Chair Jennifer Homendy how deeply her team planned to delve into the breaker system and hinted at possible cybersecurity risks. Maritime cyber experts, the U.S. Coast Guard, and the FBI (none being control system cyber experts) assessed that the odds of a cyberattack aboard the Dali were low, and Homendy said that the investigation has seen no signs to suggest that there was any cyber intrusion.
In a pointed exchange, Williams questioned whether NTSB has the capability to do a thorough component-by-component analysis of the entire switchboard control system, down to the most basic building blocks of its code. He raised the possibility that a sophisticated threat actor could target small, miniature computer assemblies - embedded systems - that control simple functions within the electrical system.
"There's a lot of concern about embedded systems, embedded into what is called a real-time operating system or inside the control logic or the control elements," he said. "That would require an enormous amount of forensics to evaluate . . . is that kind of investigation underway?"
Homendy said that her investigative team has 400 years of collective experience and will follow the evidence wherever it leads, adding that a cybersecurity threat would be in the jurisdiction of the Department of Justice. She noted that NTSB is required to notify law enforcement if it uncovers a possible crime or a cyberattack. "We will follow the evidence and anything security-wise, if we find anything, we will turn it over [to the FBI] immediately," she said. This assumes NTSB was only looking for issues that appeared to be a cyberattack (which are not obvious) and ignoring unintentional cyber incidents such as the Olympic Pipeline and San Bruno pipeline ruptures and the DC Metro and Union Pacific train crashes. Moreover, the FBI does not have control system cyber expertise. As a result, the NTSB May 14, 2024, preliminary report on the Dali did not have participation from cybersecurity organizations.
Maritime control system cyber incidents were not limited to the Dali. In June 2024, a bridge in Charleston, South Carolina, was temporarily closed after the crew lost control of a large container ship after it left port. Shortly before the ship lost propulsion, it was at nearly full throttle which should not have happened. Again, there was no mention of the word “cyber.”
Union Pacific (UP) train crash: On September 8, 2022, a conductor and engineer of a UP train were killed when the train collided with railcars stored in a siding in Imperial County, California. The train reversed direction into Bertram siding, a signal-controlled siding monitored from the Omaha, Nebraska, control center. Upon entering the siding, with helper locomotives in the lead, the train traveled about 802 feet before colliding with a string of 74 empty intermodal railcars that had been stored in the siding since December 2021. The two lead locomotives and one intermodal railcar derailed, along with two of the empty stored intermodal railcars. UP estimated damage to track and equipment at about $1.2 million.
Throughout the Bertram siding track, the NTSB observed surface rust along the top of the rail head, web, and base of the rail. Surface rust, or corrosion, can cause oxidation of a metal surface, which leads to a decrease in electrical conductivity. When NTSB conducted a shunt sensitivity test on the Bertram siding track, the rust buildup prevented a 0.06-ohm shunt from connecting and prevented the track circuit from appearing occupied. The data logs further showed that since late August 2022, the track circuit had intermittently indicated it was unoccupied. This is known as an intermittent track occupancy indication. At the time of the accident, the CAD display indicated the track was unoccupied. The NTSB reviewed screenshots of the CAD display from the 24 hours before the accident. NTSB identified the intermittent track occupancy indication by comparing screenshots from the UP control center in Omaha: the line color for Bertram siding was sometimes blue (unoccupied but blocked) and sometimes magenta (occupied and blocked). This condition occurred 56 times between August 29 and September 7, 2022, and 22 times in the 24 hours that preceded the accident.
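The intermittent track occupancy indication described above is the kind of condition that can be counted automatically from logged states rather than found afterward by comparing screenshots. The following Python sketch shows one way to do that, under stated assumptions: the sample format, the function names, and the example timestamps are hypothetical and do not represent UP's computer-aided dispatching system or its data logs.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumed log format: (timestamp, occupied?) samples in time order.
Sample = Tuple[datetime, bool]


def count_occupancy_dropouts(samples: List[Sample]) -> int:
    """Count transitions from 'occupied' to 'unoccupied' in time-ordered samples."""
    dropouts = 0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if previous and not current:
            dropouts += 1
    return dropouts


def is_intermittent(samples: List[Sample], known_occupied: bool,
                    max_dropouts: int = 0) -> bool:
    """Flag a track known to hold stored cars that still reports dropouts.

    known_occupied is an assumed input, for example from a record that
    railcars were placed in storage on the siding.
    """
    return known_occupied and count_occupancy_dropouts(samples) > max_dropouts


# Hypothetical samples: a siding with stored cars that flickers to 'unoccupied'.
start = datetime(2022, 9, 7, 0, 0)
states = [True, False, True, True, False, True]
samples = [(start + timedelta(minutes=10 * i), s) for i, s in enumerate(states)]

print(count_occupancy_dropouts(samples))
print(is_intermittent(samples, known_occupied=True))
```

A check like this does not need to know why the indication dropped out. Rust on the rail, a failed sensor, and a manipulated signal would all produce the same pattern, which is why the condition belongs in the cyber-related category regardless of cause.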
As a result, NTSB determined that the probable cause of the collision was the routing of a UP train into a siding that was occupied by 74 empty intermodal railcars, and that this routing was made possible by the inappropriate removal of a computer-aided dispatching system block on the siding at the dispatch center. Contributing to the cause of the accident were the surface rust on the rails and wheels of the stored railcars that degraded the performance of the track circuit in Bertram siding and caused the computer-aided dispatching system to inaccurately indicate the siding was unoccupied.
The term “cyber” was not used in the 16-page report.
DC Metro train crash: The loss of view of a train by a train control system is not unique. The 2009 DC Metro train control system lost view of the Red Line train that crashed into a parked train at the Fort Totten station, killing nine. This event was not identified by NTSB as being cyber-related either (I identified the event as being cyber-related, resulting in the formation of the Transportation Research Board panel of which I was a participant). The NTSB report of the DC Metro train crash did not mention the term “cyber.”
Food and Drug Administration (FDA)
According to the FDA, as medical care increasingly takes advantage of integrated automated systems, the benefits come with risks of a kind more often associated with industrial processes. The FDA has established medical device cybersecurity requirements in “Cybersecurity in Medical Devices: Quality System Considerations and Content of Premarket Submissions Guidance for Industry and Food and Drug Administration Staff” issued on September 27, 2023. Section 524B(c) of the FD&C Act defines "cyber device" as a device that (1) includes software validated, installed, or authorized by the sponsor as a device or in a device, (2) has the ability to connect to the internet, and (3) contains any such technological characteristics validated, installed, or authorized by the sponsor that could be vulnerable to cybersecurity threats. The requirements include Interoperability Considerations stating that: “Cybersecurity Controls should be used as a means to allow for the safe and effective exchange and use of information.” The unsafe conditions identified in more than ten medical device recalls, which injured hundreds, were due to control system cyber incidents but were not identified as cyber incidents.
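The three-part "cyber device" definition quoted above reads as a conjunction, and a short Python sketch makes that structure explicit. This is an illustration of the definition's logic as summarized here, not an FDA tool or a legal test. The class, field names, and example devices are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical summary of a device against the three quoted criteria."""
    name: str
    has_sponsor_software: bool            # criterion (1)
    can_connect_to_internet: bool         # criterion (2)
    has_vulnerable_characteristics: bool  # criterion (3)


def is_cyber_device(device: DeviceProfile) -> bool:
    """All three criteria must hold for the definition to apply."""
    return (
        device.has_sponsor_software
        and device.can_connect_to_internet
        and device.has_vulnerable_characteristics
    )


# Hypothetical devices for illustration only.
networked_pump = DeviceProfile("networked infusion pump", True, True, True)
standalone_monitor = DeviceProfile("standalone monitor", True, False, True)

print(is_cyber_device(networked_pump))      # meets all three criteria
print(is_cyber_device(standalone_monitor))  # fails the internet criterion
```

Read this way, a device with software but no internet connectivity falls outside the definition, even though its software or control logic could still be involved in an unintentional incident.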
Department of Energy (DOE)
DOE’s failure to recognize and disclose cyber-related control system incidents, documented in this blog, needs no elaboration here.
Industry
“Curricular Guidance: Industrial Cybersecurity Knowledge” is the culmination of a years-long collaborative effort among the Idaho National Laboratory, DOE’s Office of Cybersecurity, Energy Security, and Emergency Response (DOE CESER), Idaho State University, and the International Society of Automation Global Cybersecurity Alliance (ISAGCA) under the supervision of Dr. Sean McBride. The book includes a section on Industrial Cybersecurity Events and Incidents. In February 2021, the Oldsmar, FL water facility experienced a “lye concentration” incident. The guidance document stated it was “an unknown adversary who accessed the city’s poorly protected water provisioning system over the TeamViewer application and increased the set point for lye (used to sanitize water for public consumption) to dangerous levels.” However, this wasn’t a cyberattack but user error. Unfortunately, CISA, EPA, this document, and others have not corrected this error.
The guidance document stated the DC Metro train crash was due to an incorrectly installed trackside sensor. This was incorrect, as the incident was caused by parasitic oscillations that blinded the train control system. As documented in my book, “Protecting Industrial Control Systems from Electronic Threats,” published in 2010, parasitic oscillation issues occurred at the Boston MTA and again at DC Metro after the 2009 Red Line train crash. As noted, the 2022 UP train crash was similar to the DC Metro Red Line train crash: in both, physics issues “blinded” the train control system, leading to the deadly crashes.
Summary
Cybersecurity programs assume organizations can recognize control system incidents as being cyber-related. Yet government organizations including NTSB, FDA, FBI, TSA, EPA, CISA, and DOE have not identified control system incidents as being cyber-related. The five cases discussed were fatal catastrophes. In all cases, NTSB identified control systems as the proximate cause of the incidents. Yet in none of the cases was the term “cyber” used. Marshall Abrams and I were told by NTSB that the Olympic Pipeline case was the most complex case they had worked on because of the control system issues. Apparently, almost twenty years later, that hasn’t changed. These cases were unintentional. Consider the impact of not identifying a malicious control system cyberattack that kills people and damages equipment as being cyber-related. It’s a question of awareness: it’s difficult to deal with a risk if you’re not equipped to recognize it.