Closing the loop on layer of protection analysis alarms
The refinery’s fluidized catalytic cracker was engaged in a startup and the schedule was tight. While it was a net steam producer, as waste heat from reacting and regenerating catalyst was recovered in downstream units, it became a consumer during startup to keep turbines online until the process was underway. But steam availability was going to be impacted in a few hours when a neighboring unit undertook a “drum swing” to steam the last volatile hydrocarbons from a full coke drum. The heat was on, so to speak, for the novice operator to get the reactor up to the desired temperature for the introduction of feedstock. An uninterrupted heat-up with the seldom used startup heater was essential.
What wasn’t obvious to the less experienced operator was the precise nature of the heater’s burner management system (BMS) and the measurements that participated in a trip. As in most BMSs, fuel gas pressure at the burner provided a safeguard against flame instability, and a high-pressure trip was configured. In this case, though, the tripping measurement was a differential pressure, since the heater connected directly to the air flowing into the fluidized catalyst bed. Feeling the pressure to get the process underway before becoming starved for steam, the board man was pushing the heater hard, until it suddenly tripped. The question was, shouldn’t there have been a pre-alarm?
A brief investigation revealed that a pre-alarm was configured, but for some reason it had been suppressed for months. The normal suppression timeout of two shifts wasn’t in force for this measurement. The measurement wasn’t meaningful during months or years of normal production, and the automation that would normally have ensured the alarm was back in force wasn’t applied to its suppression parameter. In this case, a startup was delayed, which may have meant many thousands of dollars in lost opportunity. What about alarms that are even more impactful?
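A periodic audit could catch a suppression that has outlived its timeout. The following is a minimal sketch, not a description of any particular control system: the `AlarmRecord` structure, the tag names and the 12-hour shift length are all assumptions made for illustration, and a real audit would read suppression status from the alarm system itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumption: 12-hour shifts, so "two shifts" is 24 hours.
SHIFT_LENGTH = timedelta(hours=12)
SUPPRESSION_LIMIT = 2 * SHIFT_LENGTH


@dataclass
class AlarmRecord:
    """Hypothetical snapshot of one alarm's suppression status."""
    tag: str
    suppressed_since: Optional[datetime]  # None means the alarm is in force


def overdue_suppressions(alarms: List[AlarmRecord], now: datetime) -> List[str]:
    """Return tags suppressed longer than the limit, longest first."""
    overdue = [
        (now - a.suppressed_since, a.tag)
        for a in alarms
        if a.suppressed_since is not None
        and now - a.suppressed_since > SUPPRESSION_LIMIT
    ]
    return [tag for _, tag in sorted(overdue, reverse=True)]


if __name__ == "__main__":
    now = datetime(2024, 6, 1, 12, 0)
    alarms = [
        AlarmRecord("PDAH-101", datetime(2024, 1, 15, 8, 0)),  # suppressed for months
        AlarmRecord("TAH-202", datetime(2024, 6, 1, 6, 0)),    # suppressed this shift
        AlarmRecord("FAL-303", None),                          # in force
    ]
    print(overdue_suppressions(alarms, now))
```

The check deliberately ignores whether the alarm is meaningful in the current operating state. That is the gap in the story: an alarm that matters only during startup can sit suppressed through years of normal production unless something flags it.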
That an instrument accurately responds to a changing process condition (e.g., differential pressure) and invokes an alarm at a meaningful setpoint requires more than a properly and routinely calibrated device. How often do we coast through routine calibrations without specifically noting that any and all configured alarms are invoked? When an alarm is claimed as a safeguard in a layer of protection analysis (LOPA), the risk mitigation depends on a minimum of three (potentially unreliable) aspects. First, that the measurement is good (accurate, stable, timely, etc.). Second, that the alarm is enabled and annunciates at the proper setpoint with the assigned priority (which also confirms that bus and/or network subsystems are healthy). Third, that the operator can respond to the alarm (i.e., isn’t distracted or detained, and doesn’t dismiss it because it rings in all the time) and undertakes actions that will mitigate the risk. Like a three-legged stool, if any leg is wobbly or broken, you’ve got nowhere safe to park your assets.
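To put rough numbers on the three-legged stool, here is a minimal sketch. It assumes the three legs fail independently and that a reliability can be assigned to each; the function name and the probabilities are hypothetical illustrations, not values from any actual LOPA.

```python
def alarm_layer_pfd(p_measurement_ok: float,
                    p_alarm_annunciates: float,
                    p_operator_responds: float) -> float:
    """Probability of failure on demand for an alarm-based protection layer.

    Assumes the three legs are independent, so the layer works only when
    all three do: P(success) is the product of the three probabilities.
    """
    legs = (p_measurement_ok, p_alarm_annunciates, p_operator_responds)
    for p in legs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
    p_success = p_measurement_ok * p_alarm_annunciates * p_operator_responds
    return 1.0 - p_success


if __name__ == "__main__":
    # Made-up reliabilities for a healthy alarm.
    print(alarm_layer_pfd(0.98, 0.98, 0.95))
    # Same alarm, but suppressed: the second leg is broken.
    print(alarm_layer_pfd(0.98, 0.0, 0.95))
```

With these made-up inputs, the alarm layer fails on demand roughly 9% of the time. Suppress the alarm, as with the startup heater's pre-alarm, and the credit vanishes entirely no matter how good the other two legs are.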
“LOPA listed” alarms have emerged in greater numbers as risk tolerance has decreased along with historic credits for innate mitigation (e.g., a safety relief valve or rupture disk). A once-in-10-year consequence that was tolerable a decade ago—say, a financial hit of $100,000—is now something requiring action, and the old mitigations aren’t enough to cover it. Controls, alarms and interlocks that previously warranted no special management have been called upon to bridge some of the gaps. So now, the operate & maintain forces face an onslaught of new testing and validation, while corporate leaders will tolerate little that might negatively impact the bottom line (i.e., adding people). The business is showered with the promises of AI, big data, digitalization, etc., but no one mentions the tedious slog required to prove the automation system will keep the scary process within its bounds.
The magic of AI or a well-tuned digital twin might have been clever enough to forewarn the novice operator that he was careening too close to the edge, where the process has never been. But data for startups and shutdowns is meager for a well-functioning continuous plant, and often the troubles of one startup aren’t repeated (or remembered) in the next. Like alarm rationalization, the validation and enforcement of LOPA-listed alarm limits for various operating states has become yet another laborious but necessary undertaking.
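Part of that slog could be automated with a state-aware comparison of the live alarm configuration against a master list. The sketch below is illustrative only: the `REQUIRED` table, the tag `PDAH-101`, the setpoint and the operating-state names are hypothetical, and a real implementation would pull both sides from the plant's alarm database and control system.

```python
from typing import Dict, List, NamedTuple, Tuple


class AlarmConfig(NamedTuple):
    enabled: bool
    setpoint: float


# Hypothetical master list: required configuration per (tag, operating state).
REQUIRED: Dict[Tuple[str, str], AlarmConfig] = {
    ("PDAH-101", "startup"): AlarmConfig(enabled=True, setpoint=4.0),
    ("PDAH-101", "normal"): AlarmConfig(enabled=False, setpoint=4.0),
}


def validate(state: str,
             live: Dict[str, AlarmConfig],
             required: Dict[Tuple[str, str], AlarmConfig] = REQUIRED,
             tolerance: float = 0.01) -> List[str]:
    """List alarms required in this state that are missing, off or mis-set."""
    findings = []
    for (tag, req_state), req in required.items():
        if req_state != state or not req.enabled:
            continue  # not required to be in force in this state
        actual = live.get(tag)
        if actual is None:
            findings.append(f"{tag}: not configured")
        elif not actual.enabled:
            findings.append(f"{tag}: suppressed or disabled")
        elif abs(actual.setpoint - req.setpoint) > tolerance:
            findings.append(
                f"{tag}: setpoint {actual.setpoint} differs from required {req.setpoint}"
            )
    return findings


if __name__ == "__main__":
    live = {"PDAH-101": AlarmConfig(enabled=False, setpoint=4.0)}
    print(validate("startup", live))  # flags the suppressed pre-alarm
    print(validate("normal", live))   # nothing required in this state
```

Run before a change of operating state, a check like this would flag a suppressed pre-alarm before the heater is lit rather than after it trips.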