Astronomy, as a pure science, is a distinctive endeavor relative to the utilitarian world of manufacturing. One might argue that satellites and near-Earth projects, such as those that keep geopolitical adversaries at bay or find dangerous rocks before they collide with our planet, are strategic. However, staring across the great beyond (the mostly empty universe) and seeking insights about its origin and destiny is a pursuit that, as throughout history, only prosperity affords.
With a few billion dollars to spend, those curious astronomers employ thousands of engineers across numerous disciplines. After all, their search for answers needs rockets, robots, sensors, communications and automation to put their instruments into space and operate them there. In the realm of instrumentation, NASA has invented or refined several technologies that have found their way into “practical” applications.
We’ve also benefited from key concepts behind their designs. So when NASA’s head of science programs, Thomas Zurbuchen, a.k.a. “Dr. Z,” cited the number of “single points of failure” between the countdown and the successful operation of the James Webb Space Telescope, currently operating about a million miles from Earth, I took note. There are 344. Clearly, every team was tasked with identifying and evaluating each component and activity whose failure would result in a multibillion-dollar piece of space junk, if not a flaming descent back to Earth.
There was a time when controls professionals also scrutinized single points of failure, prompted by the widespread adoption of microprocessor-based control systems. The NASA folks had extraordinary motivation to address such risks; they brought concepts of fault tolerance and triple-modular redundancy (TMR) to bear on microprocessor reliability a half-century ago. Do our operations clients have the insight and fortitude to challenge us, the instrumentation and controls professionals, to study and bring forward every single point of failure in our control system architecture? Shouldn’t we be frank about it?
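The core idea behind TMR can be sketched as a two-out-of-three vote: three independent channels compute the same value, and a voter masks any single faulty channel. The Python below is a minimal illustration under that assumption, not NASA's or any control vendor's implementation; the function names are hypothetical, and the mid-value (median) select for analog signals is one common choice among several.

```python
from typing import Sequence


def vote_discrete(a: bool, b: bool, c: bool) -> bool:
    """2-out-of-3 majority vote for discrete (on/off) channels."""
    # The output is True only if at least two of the three channels read True,
    # so a single channel failing either way cannot flip the result.
    return (a and b) or (a and c) or (b and c)


def vote_analog(readings: Sequence[float]) -> float:
    """Mid-value select for three analog channels (hypothetical helper)."""
    if len(readings) != 3:
        raise ValueError("TMR voting expects exactly three channels")
    # The median ignores one channel that has failed high or low.
    return sorted(readings)[1]


if __name__ == "__main__":
    # One transmitter has failed high; the voted value stays sensible.
    print(vote_analog([49.8, 50.1, 999.0]))
    # One discrete channel has failed to False; the majority still reads True.
    print(vote_discrete(True, True, False))
```

Note that the voter itself becomes a component to scrutinize: a single voter feeding a single output is, in turn, a candidate single point of failure.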
In a continuous process plant, every control valve is a potential single point of failure for its associated control loop, whether the failure stems from the actuator, the positioner or the valve itself. Depending on the service, such a failure could mean an unplanned shutdown of the entire plant. Other potential failure points include the communications infrastructure (Ethernet, fieldbus or 4-20 mA), instrument air or other servo-power, the electronics’ power supply, and whatever interface (I/O card) connects the valve to the controller. That brings us to seven potential single points of failure for a single control valve. Your facility might have 1,000 valves, many of whose failures could have a serious impact.
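The tally above can be sketched as a simple per-valve inventory. The Python below is a hypothetical illustration: the element names mirror the seven items in the paragraph, and the `redundant` flag is an assumed stand-in for whatever mitigation a plant has actually installed. It only shows how each added redundancy shrinks the list of remaining single points of failure.

```python
from dataclasses import dataclass, field


@dataclass
class LoopElement:
    name: str
    redundant: bool = False  # hypothetical flag: True if a backup path exists


@dataclass
class ValveLoop:
    tag: str  # hypothetical loop tag, e.g. "FV-101"
    elements: list = field(default_factory=lambda: [
        LoopElement("actuator"),
        LoopElement("positioner"),
        LoopElement("valve"),
        LoopElement("communications (Ethernet, fieldbus or 4-20 mA)"),
        LoopElement("instrument air / servo-power"),
        LoopElement("electronics power supply"),
        LoopElement("I/O card"),
    ])

    def mark_redundant(self, names: set) -> None:
        """Record that the named elements now have a backup."""
        for element in self.elements:
            if element.name in names:
                element.redundant = True

    def single_points_of_failure(self) -> list:
        """Names of elements that still have no redundancy."""
        return [e.name for e in self.elements if not e.redundant]


if __name__ == "__main__":
    loop = ValveLoop("FV-101")
    print(len(loop.single_points_of_failure()))  # all seven, before mitigation

    # Hypothetical mitigation: redundant power, instrument air and I/O card.
    loop.mark_redundant({"electronics power supply",
                         "instrument air / servo-power",
                         "I/O card"})
    print(loop.single_points_of_failure())
```

Multiplying the per-valve count by the number of valves would overstate the plant-wide picture, because shared infrastructure such as an air header or an I/O card serves many loops at once; a real study would model those shared elements separately.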
If you’re motivated, it’s common to address electronics’ power failures with redundant power supplies and an uninterruptible power supply (UPS). This keeps your two-wire valve positioners and instruments alive as long as the copper (twisted-pair) wiring remains intact. It’s also possible to use wire that forms a non-conductive char in a fire, further extending one’s visibility and control during a fiery calamity. Instrument air and I/O cards can likewise be made redundant, knocking off vulnerabilities that affect many loops at once. Such investments are not unusual.
Users had to address new single points of failure when we began using Modbus, fieldbus, Ethernet and fiber optics. Seemingly robust, redundant controllers that shared variables over Ethernet didn’t initialize cleanly after a switchover, even though the Ethernet network was also redundant. While other “layers” of communication had been made more fault-tolerant, the “application layer” had not. Fiber can span great distances, but the link fails when a fiber-to-copper converter fails or loses power. When operations find such single-point failures by trial and error, we’re fortunate if they cause only confusion or a hiccup rather than a total process shutdown.
The demands of missions that send billions of dollars of instruments and equipment millions of miles into the solar system sharpen their engineers’ awareness of single-point failures. Painstakingly seeking them out is part of the culture of successful missions. Astrophysicists may be pursuing pure science with a tenuous return on investment, but the challenging projects they’ve inspired have fostered innovation and inspiration for our discipline. While our earthbound endeavors are more forgiving, we’re an asset to the enterprise when we bring our own single points of failure to light and let our clients and stakeholders weigh whether the risk justifies the cost of mitigating them.