Monday, May 30, 2011

A Failure of Imagination?

One cause of the Fukushima disaster was a failure of imagination - - something any designer, engineer, or regulator is vulnerable to.  The earthquake itself fell within the plant's safety margins - - the plant was designed for a magnitude 8.2 event and rode out the magnitude 9.0 that occurred.  But whereas the plant was built to survive tsunami waves of 18.7 feet, the waves that hit were 46 feet tall.  Waves of that height are not without precedent; an earthquake and tsunami of comparable size struck the area in A.D. 869.  When engineers make such "design-basis" errors, all bets are off.
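
To get a feel for the size of that gap, recall that the moment magnitude scale is logarithmic: each whole step represents roughly 32 times more released energy.  A quick back-of-the-envelope sketch in Python (the function name is mine, purely for illustration):

    def energy_ratio(m_design, m_actual):
        # Standard moment-magnitude rule of thumb: released energy scales
        # as 10 to the power of 1.5 times the magnitude difference.
        return 10 ** (1.5 * (m_actual - m_design))

    print(round(energy_ratio(8.2, 9.0), 1))  # 15.8: roughly sixteen times the design-basis energy

That the plant rode out roughly sixteen times its design-basis energy says something for the seismic margins; the tsunami margins clearly enjoyed no such conservatism.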

In the coming century, "black swans" will seem to occur with surprising frequency.  There are several reasons for this.  We have chosen to engineer the planet.  We have built vast networks of technology.  We have created systems that, in general, work very well but are still vulnerable to catastrophic failure.  It is harder for any one person, institution, or agency to perceive all the interconnected elements of the technological society.  Failures can cascade.  There are unseen weak points in the network.  Small failures can have broad consequences.
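
A toy model - - entirely my own illustration, not anything drawn from the Fukushima record - - makes the cascade dynamic concrete.  Give each node in a small network a load close to its capacity, fail one node, and let the failed node's load spill onto its surviving neighbors:

    from collections import deque

    def cascade(neighbors, load, capacity, first_failure):
        """Return the set of nodes that fail after a single initial failure."""
        failed = {first_failure}
        queue = deque([first_failure])
        while queue:
            node = queue.popleft()
            survivors = [n for n in neighbors[node] if n not in failed]
            if not survivors:
                continue
            share = load[node] / len(survivors)  # spilled load, split evenly
            for n in survivors:
                load[n] += share
                if load[n] > capacity[n]:        # overload: the failure spreads
                    failed.add(n)
                    queue.append(n)
        return failed

    # Four nodes in a ring, each running at 90 percent of capacity.
    neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    load = {n: 9.0 for n in neighbors}
    capacity = {n: 10.0 for n in neighbors}
    print(cascade(neighbors, load, capacity, first_failure=0))  # {0, 1, 2, 3}

One failed node takes down the whole ring, precisely because every element was running close to its limit - - the unseen weak point is the system-wide lack of slack, not any single component.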

Still, engineers can prepare only for events they can foresee.  Our design basis has rested on improbable possibilities that have at least been imagined.  Engineers are not so good at designing for the once-in-a-blue-moon event that hasn't happened yet.  Such uncertainties make it impossible to know whether a margin of error of even twice the design basis is sufficient.  The central question for engineering then becomes - - what are you willing to design for, and does society understand and accept that factor of safety?
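
The tsunami numbers above show how unforgiving that question is.  Running the post's own figures (the variable names are mine):

    design_basis_ft = 18.7  # tsunami wave height the plant was built to survive
    actual_wave_ft = 46.0   # wave height that actually struck

    for factor in (1.0, 1.5, 2.0, 2.5):
        covered = design_basis_ft * factor
        verdict = "survives" if covered >= actual_wave_ft else "overtopped"
        print(f"factor of safety {factor}: covers {covered:.1f} ft -> {verdict}")

Even doubling the design basis leaves the plant overtopped; only a factor of roughly 2.5 would have covered the wave that arrived, and no one could have justified that number in advance.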

A new paradigm at the organizational level is needed.  One model is the High Reliability Organization (HRO) (see Managing the Unexpected: Resilient Performance in an Age of Uncertainty by Karl Weick and Kathleen Sutcliffe).  HROs practice a form of organizing that reduces the brutality of natural and man-made disaster audits (engineering is an audit profession - - Mother Nature gives you periodic audits) and speeds up the process of recovery.  Five principles are the hallmark of HROs - -

HRO Principle 1 - - Preoccupation with failure.  HROs treat any lapse as a symptom that something may be wrong with the system - - something that could have severe consequences if several small errors happened to coincide (a back-of-the-envelope sketch of that arithmetic follows the five principles).  They encourage the reporting of errors, they elaborate experiences of a near miss for what can be learned, and they are wary of the potential liabilities of success, including complacency, the temptation to reduce margins of safety, and the drift into automatic processing.

HRO Principle 2 - - Reluctance to simplify.  HROs are reluctant to accept simplifications.  They understand that less simplification allows them to see more - - a more complete and nuanced picture of what they face and who they are as they face it.  They want to see as much as possible.  They welcome diverse experience, skepticism toward received wisdom, and negotiating tactics that reconcile differences of opinion without destroying the nuances that diverse people detect.

HRO Principle 3 - - Sensitivity to operations.  HROs are attentive to the front line, where the real work gets done.  The "big picture" in HROs is less strategic and more situational than in most other organizations.  When people have well-developed situational awareness, they can make the continuous adjustments that prevent errors from accumulating and enlarging.  People in HROs know that you can't develop a big picture of operations if the symptoms of those operations are withheld.

HRO Principle 4 - - Commitment to resilience.  No system is perfect.  HROs know this as well as anyone.  This is why they complement their anticipatory activities of learning from failure, complicating their perceptions, and remaining sensitive to operations with a commitment to resilience.  HROs develop capabilities to detect, contain, and bounce back from the inevitable errors that are part of an indeterminate world.  The hallmark of an HRO is not that it is error-free, but that errors don't disable it.  Resilience is a combination of keeping errors small and of improvising workarounds that allow the system to keep functioning.

HRO Principle 5 - - Deference to expertise.  Rigid hierarchies have their own special vulnerability to error.  Errors at higher levels tend to pick up and combine with errors at lower levels, thereby making the resulting problem bigger, harder to comprehend, and more prone to escalation.  To prevent this deadly scenario, HROs push decision making down and around.  Experience by itself is no guarantee of expertise, since all too often people have the same experience over and over and do little to elaborate those repetitions.
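
As promised under Principle 1, here is the back-of-the-envelope arithmetic on coinciding errors.  The probabilities are invented purely for illustration, but they show why a coincidence that looks negligible on any given day stops being negligible over a facility's lifetime:

    p_lapse = 0.01                    # assumed daily chance of any one small lapse
    p_coincide = p_lapse ** 3         # three independent lapses on the same day
    days = 365 * 40                   # roughly a plant's operating lifetime
    p_lifetime = 1 - (1 - p_coincide) ** days
    print(f"{p_lifetime:.1%}")        # about 1.4 percent over the lifetime

And real lapses are rarely independent - - common causes push them together - - which only makes coincidence more likely than this sketch suggests.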

Unexpected events can get you into trouble unless you create a mindful infrastructure that continually tracks small failures, resists oversimplification, is sensitive to operations, maintains capabilities for resilience, and monitors shifting locations of expertise.  In some form or fashion, the BP Deepwater Horizon disaster is a great example of an organization that managed to violate all five principles in a very short amount of time.
