Don't let trust issues derail your operational resilience plan
With cyber-attacks against the control systems responsible for our critical infrastructure accelerating, the need for process manufacturers and utilities to understand the specific risks to their ongoing operations—and to take mitigating action—has never been more urgent.
Adequate cybersecurity protections are, of course, part of the answer. But true operational resilience requires more than just perimeter defenses, according to Pete Diffley, longtime automation specialist responsible for the uptime of production assets for products ranging from clean water to contact lenses. Today, he’s leader of global partnerships for VTScada by Trihedral, developer of human machine interface (HMI) and supervisory control and data acquisition (SCADA) software. Control caught up with Diffley to better understand why access control is necessary but not sufficient, and the sort of mindset one must possess to holistically address operational risks.
Q: How have the cybersecurity threat vectors changed in the recent past?
A: In the past couple of years, we’ve seen a dramatic increase in the number of ransomware attacks on critical infrastructure systems, with research indicating that today as many as one in five of all ransomware attacks are against industrial organizations. Ransomware represents an escalation of the cyber-threat landscape in that it’s not just access to sensitive information that’s at risk. If an operational technology (OT) system is successfully hijacked, revenue-generating production or continuity of clean water or electricity can grind to a halt. Paying the ransom, of course, represents a further, potentially significant financial impact. Today it’s not just mischievous hackers at work, but large and very profitable criminal enterprises. And with clear governmental advisories now in place, there’s the very real possibility that individuals who fail to prioritize the necessary steps to head off a ransomware attack could face prison time.
Q: In response to these threats, what sorts of actions are government agencies recommending?
A: Just this summer, the Cybersecurity and Infrastructure Security Agency (CISA) and National Security Agency (NSA) issued a joint advisory that acknowledged the growing ransomware threat and reinforced many of the best practices already established. First among these is to immediately create an accurate, “as operated” OT network map. Second is to take prudent steps to harden those networks. Third is to establish a resilience plan for OT systems—to understand and evaluate the risks to OT assets so that if things go sideways, everyone knows what steps to take, all with an eye to mitigating those risks in order of priority and consequence. Fourth is to exercise that incident response plan. And fifth is to implement a continuous event-monitoring system.
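As an illustration only (the device names, rates and thresholds below are invented, not taken from the advisory), the fifth recommendation on continuous event monitoring could start with something as simple as comparing observed device traffic against a known-good baseline built from that “as operated” network map:

```python
# Hypothetical sketch: flag OT devices whose traffic deviates from an
# expected baseline, and flag devices that aren't on the network map at all.
# All device names and message-rate bands here are illustrative.

BASELINE = {
    "plc-01": {"min_msgs_per_min": 50, "max_msgs_per_min": 200},
    "rtu-07": {"min_msgs_per_min": 5, "max_msgs_per_min": 30},
}

def check_traffic(observed: dict) -> list:
    """Return alert strings for devices whose observed message rate falls
    outside the expected band, plus any devices not in the baseline."""
    alerts = []
    for device, rate in observed.items():
        band = BASELINE.get(device)
        if band is None:
            alerts.append(f"{device}: unknown device on OT network")
        elif not band["min_msgs_per_min"] <= rate <= band["max_msgs_per_min"]:
            alerts.append(f"{device}: rate {rate}/min outside expected band")
    return alerts
```

A real deployment would feed this from network taps or switch telemetry and route alerts into the incident-response plan; the point of the sketch is only that monitoring is meaningless without an accurate baseline to compare against.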
Q: External cyber threats don’t present the only risk to the continuity of industrial operations. What other sorts of activities should be part of a resilience plan?
A: The National Institute of Standards and Technology (NIST) defines operational resilience as “the ability of systems to resist, absorb and recover from or adapt to an adverse occurrence during operations that may cause harm, destruction or loss of ability to perform mission-related functions.” It’s a broad-ranging definition that might include anything from the failure of a piece of non-redundant critical equipment to more insidious causes. A thorough resilience plan should strip bare the notion that the complex systems of systems that constitute industrial processing units—and the human operators that supervise them—can always be trusted to behave in a predictable, reliable manner. You’re planning how to react to the possible, not the probable.
Q: The term “trust” is used in the context of cybersecurity, but often only about whether a given device or individual is given access to a network. Should it go further than that?
A: I definitely believe it should. I happened to be talking with my teenage son the other day about how likely it was that a given physical symptom indicated a more severe, underlying condition. “Twenty percent,” he quickly responded, quoting a presumably trustworthy .org website he found via a quick Google search. I responded that not everything posted on the Internet (even on non-profit sites) is factual, and even claims that the individuals posting them believe to be true can be influenced by underlying motivations such as business interests, money or simple expedience.
Take, for example, the Log4j vulnerabilities. Because Log4j is a Java-based library of open-source logging functionality that developers routinely embed within a larger piece of application software, its vulnerability brought new scrutiny to the chronic need to better manage software development supply chains. A piece of industrial application software may include several hundred third-party components—any one of which could go unlicensed or unsupported at any time. And, if you read the licensing agreement before checking the “I agree” box, you may be surprised to learn that the solution provider to whom you paid that not insignificant licensing fee assumes little or no responsibility—even if one of those third-party software components goes belly up to a malware attack.
The expedience that motivates using dozens or hundreds of pieces of code from other sources is one example of a mindset influenced, not by the desire to build the most resilient software possible, but rather the quickest, cheapest solution that will function effectively—at least for now.
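To make the supply-chain point concrete, here is a minimal, hypothetical sketch of a software bill-of-materials (SBOM) style audit. The component names, versions and vulnerability list are illustrative placeholders, not a real vulnerability database:

```python
# Hypothetical SBOM audit sketch: flag third-party components that appear
# on a known-vulnerable or end-of-life list. The lists below are invented
# stand-ins; a real audit would query maintained vulnerability feeds.

VULNERABLE = {("log4j-core", "2.14.1")}   # (name, version) pairs
END_OF_LIFE = {"legacy-xml-parser"}        # components no longer supported

def audit(components):
    """components: iterable of (name, version) pairs from an SBOM.
    Returns a list of human-readable findings."""
    findings = []
    for name, version in components:
        if (name, version) in VULNERABLE:
            findings.append(f"{name} {version}: known vulnerability")
        if name in END_OF_LIFE:
            findings.append(f"{name} {version}: unsupported component")
    return findings
```

Even a toy audit like this makes the underlying question visible: do you actually know what the several hundred components inside your application software are, and who stands behind each one?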
In contrast, Trihedral has long resisted using third-party code in our VTScada software. We also use development processes that are designed to ensure the quality and security of our software releases. That means extensive testing to catch any issues before they go out to our users, including design reviews before coding begins, and code reviews by someone other than the developer who wrote the code. These processes were also key to the organization being quickly certified to the International Electrotechnical Commission’s 62443-4-1 cybersecurity standard for industrial automation and control systems earlier this year.
Q: How well are today’s OT systems prepared to defend themselves against threats to resilience that sometimes appear as wolves in trustworthy sheep’s clothing?
A: When it comes to ensuring resilience, it’s important to be able to examine the behavioral side of the OT environment, but not in an overly burdensome way—and that means devices and systems as well as human operators. We need guard rails that prevent otherwise trusted systems from getting out of line. Artificial intelligence can be used to monitor and alert if devices or systems are suddenly behaving or communicating strangely. From an operator perspective, just as we use different people for code checks than those who wrote the code, setpoint changes that are outside designated guard rails might require the input of multiple operators. It’s a sanity check to make sure that the human operators, like our systems, are at the top of their game when making decisions as well.
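A minimal sketch of the two-operator guard-rail idea, assuming a single hypothetical tag and band (the tag name, limits and function are invented for illustration):

```python
# Illustrative guard-rail sketch: a setpoint change inside the designated
# band is applied directly, while a change outside the band requires
# approval from a second, different operator.

GUARD_RAILS = {"reactor_temp_C": (20.0, 80.0)}  # hypothetical tag and band

def request_setpoint(tag, value, operator, approver=None):
    """Apply a setpoint change, enforcing a two-operator check for
    values outside the designated guard rails."""
    low, high = GUARD_RAILS[tag]
    if low <= value <= high:
        return f"{tag} set to {value} by {operator}"
    if approver is None or approver == operator:
        raise PermissionError(
            f"{tag}={value} outside guard rails; second operator required")
    return f"{tag} set to {value} by {operator}, approved by {approver}"
```

The design choice mirrors the code-review analogy in the interview: the second operator plays the same role for a risky setpoint change that an independent reviewer plays for new code.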
Meanwhile, one of the most important documents to read is the end-user license agreement (EULA) of the industrial application software you're using to control your process. The devil is in the details. How dependent is the software on third-party components to function? Is it even warranted for use in a critical process in the first place, or has that liability also been “waived”?
Some software providers present a cool wall of logos of companies that use their software, in an effort to show that their software is used by these big brands—therefore, “of course it will work for your application.” Is that the case, though, or is it being used in just a non-critical area that can tolerate downtime?
A few years ago, as an end user, I took part in a major evaluation project comparing a number of well-known industrial software packages. I had the opportunity to ask different end users whether they would use their chosen software in a critical environment, where their process depended on it. Almost all said no—all, that is, except the users of VTScada, who all responded with “yes, absolutely.”