A year and a half ago, I wrote about disasters, and about how hard it can be to see them coming. I made the point that when we analyze a disaster retrospectively, we are likely to be led astray because we already know how it all turned out. Because of hindsight bias in particular, we think it should have been obvious to everyone that a disaster was imminent, when in reality it might not have been clear at all. It is important to remember this bias when we try to understand a disaster, so that we can see events through the eyes of those who lived them and draw working lessons for the future.
But not all disasters are like this. Sometimes the risk of a disaster really is obvious to the people involved at the time, based on data they already have: they see the data, they understand the risk, and then somebody decides to go ahead and do it anyway.
The Challenger disaster
"On January 28, 1986, Space Shuttle Challenger broke apart 73 seconds into its flight, killing all seven crew members aboard. The spacecraft disintegrated 46,000 feet (14 km) above the Atlantic Ocean, off the coast of Cape Canaveral, Florida, at 16:39:13 UTC (11:39:13 a.m. EST, local time at the launch site). It was the first fatal accident involving an American spacecraft while in flight."*
What happened? It was a cold day, and the rubber O-rings that sealed a joint in the right Solid Rocket Booster were stiff, so they did not seal the joint adequately. Shortly after liftoff, hot gas from inside the booster leaked out and began to burn through the surrounding structure. Keep in mind that this structure was metal; the booster casings themselves were steel. But the boosters burned at 5600℉, and at that temperature steel boils!** (Not melts—boils.) So naturally the whole assembly burst apart.
But how much of this could we have predicted ahead of time? It turns out the answer is: all of it. Recently I ran across a lecture on YouTube that breaks it down.*** (The lecture was posted in four parts, of which the first two discuss the Challenger disaster from a Quality perspective. The other two parts give valuable advice for managing your career and your life, but I won't focus on them here. You can scroll to the bottom of this post to find links to the lecture itself.)
In summary, the speaker (Mike Mullane) walks through the sequence of events:
- During the initial design reviews, the O-rings were classified as "Criticality 1," meaning that a failure could destroy the vehicle and kill the crew. "Criticality 1" also meant that any damage to the O-rings was, by itself, sufficient cause to halt flights and redesign the joint.
- Sure enough, after the shuttle's second flight (years before Challenger), the team recovered the spent boosters and found damage on the O-rings.
- But for one reason and another, the team decided to go ahead with the third launch, and the third flight was fine.
- On later flights, sometimes the O-rings were damaged and sometimes they weren't.
- After-action reports regularly called out the risk posed by damage to the O-rings. Multiple memos, over a period of two years or more, described the O-ring issue as "urgent."
- But each flight was successful. So the program came to see the O-ring problem as not that big a deal, and every time the issue was raised, it was covered by a standing waiver.
- Until, of course, one day it was a big deal after all ....
What is the normalization of deviance?
Mullane explains that the "normalization of deviance" stems from nothing more than the natural human tendency to take shortcuts under pressure. We know what the "right" way to do a job is, and when we are relaxed we are happy to follow it. But then time runs short, or money runs short, or something else happens—it could be anything, really—and we come under pressure. So we take a shortcut to make the job easier.
And most of the time, after we take that shortcut ... nothing happens! The job gets finished with no problem. So the next time we are under pressure, we remember that shortcut and do it again. And then again. Pretty soon, the "shortcut" has become the normal way of working. The "deviance" (a deviation from the defined and approved method) has become "normalized."
We've seen this before. Last year, when I was writing about Boeing, I explained how their cost-cutting drive led them to gut what used to be a robust safety management system. One of the factors at work was exactly this dynamic. I wrote:
They [Boeing management] found, empirically, that they could eliminate one Quality inspection, save a few dollars, and no planes fell out of the sky. OK, good. How about eliminating two inspections? Three? Four? Where do we stop? You can see how, in the absence of visible negative feedback (like an increased accident rate), this could get out of hand quickly.
That's what happened with Challenger. Word for word.
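Part of what makes this dynamic so seductive is simple arithmetic: a hazard with a real but modest per-flight (or per-release) risk will usually produce a long run of successes before it bites, so the absence of failures feels like proof of safety when it proves very little. Here is a minimal sketch of that arithmetic in Python; the risk figures are illustrative assumptions of mine, not estimates from Mullane's lecture or the accident investigations.

```python
# Probability of "getting away with it": the chance of seeing zero failures
# across n flights when each flight carries an independent failure risk p.
# The numbers below are illustrative assumptions, not historical estimates.

def prob_all_succeed(p_failure_per_flight: float, n_flights: int) -> float:
    """Probability that every one of n independent flights succeeds."""
    return (1.0 - p_failure_per_flight) ** n_flights

if __name__ == "__main__":
    for p in (0.01, 0.02, 0.05):      # assumed per-flight failure risk
        for n in (5, 10, 24):         # number of flights flown so far
            streak = prob_all_succeed(p, n)
            print(f"risk {p:.0%} per flight, {n:2d} flights: "
                  f"{streak:.0%} chance of an unbroken success streak")
    # Even a 5% per-flight risk leaves roughly a 29% chance of surviving
    # 24 straight flights, so a clean record by itself proves very little.
```

The specific numbers don't matter; the point is that "we've never had a failure" is weak evidence when the underlying risk is modest and the number of trials is small.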
How do you protect against it?
Fine, how do we avoid this?
The short answer is almost too simple: Don't do that! But that sounds obvious, and yet this dynamic continues to afflict people every single day. So really, what do we do?
Mullane lists four points that he thinks are critical:
- Recognize your vulnerability: Everybody thinks, It won't happen to me. I know all about this problem, so that makes me immune. I watched a video on YouTube. I read a blog post in Pragmatic Quality. I know better than to fall into this trap. Nice try. But the other people, the ones who did fall into this trap? They were plenty smart too. All of them "knew better." But when they felt pressured, their brains reacted automatically. It can happen to you too, exactly the same way. So watch for it.
- Execute to meet standards: This is the core of it. Plan the work, and then work the plan. Mullane explains the Air Force has a saying, "The flight manual is written in blood." In other words, every instruction in the flight manual was put there because one day somebody did something different and it turned out badly. Don't let the next one be you. If the manual says, "Abort the mission when the red light flashes," and then the red light flashes, ... abort the mission. Simple as that.
- Trust your instincts: Mullane makes a big point of saying that we often know more than we understand consciously, and that our instincts are there to keep us alive. So if something just feels ... off, somehow ... wrong, but you can't quite put your finger on why ... trust that feeling. Probably the thing really is wrong, and at some level you even know why. It just hasn't percolated up into your consciousness yet, but it will.
- Archive and review near-misses and disasters: Learn from other people's experience, so you don't have to go through the same thing. Look at the disasters—or the near-misses, where things came out fine but almost didn't—that your own team has experienced. But then try to find out about other teams as well. Look for the big disasters (or near-misses) in your industry, the ones that make the news. Read everything you can, and then flow down to your team what you have learned. (For one way to structure such an archive, see the sketch after this list.)
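On that last point, even a very small, structured archive beats scattered memories. The sketch below shows one way a team might keep such a log in Python; the record fields, the example entries, and the review rule are my own illustration, not a format prescribed by Mullane.

```python
# A minimal near-miss / incident archive.
# The fields and the review rule are illustrative assumptions,
# not a prescribed format from Mullane's lecture.

from dataclasses import dataclass, field
from datetime import date

@dataclass
class NearMiss:
    when: date
    summary: str                 # what almost went wrong
    deviation: str               # which standard or procedure was bypassed
    outcome: str                 # what actually happened
    waived: bool = False         # was the deviation formally waived?
    lessons: list[str] = field(default_factory=list)

def recurring_deviations(log: list[NearMiss], threshold: int = 2) -> dict[str, int]:
    """Flag deviations that keep showing up; repetition is the warning sign."""
    counts: dict[str, int] = {}
    for item in log:
        counts[item.deviation] = counts.get(item.deviation, 0) + 1
    return {dev: n for dev, n in counts.items() if n >= threshold}

# Example review: two near-misses sharing the same deviation should trigger
# a discussion, even though "nothing happened" either time.
log = [
    NearMiss(date(2024, 3, 1), "Release shipped without regression suite",
             deviation="skipped regression tests", outcome="no defects reported"),
    NearMiss(date(2024, 6, 9), "Hotfix pushed straight to production",
             deviation="skipped regression tests", outcome="minor outage, rolled back"),
]
print(recurring_deviations(log))   # {'skipped regression tests': 2}
```

The design choice that matters is the review step: a deviation that keeps recurring without consequences is exactly the kind that is quietly becoming "normal."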
And then, if we do those four things, are we home free?
I'm pretty sure nobody can promise that. But if you do these things you'll be miles ahead. And you will have reduced the odds of normalizing deviance as far as you can.
If you want more detail, Mullane's full lecture is worth watching.
Mike Mullane's lecture, part 1/4: What is normalization of deviance?
Mike Mullane's lecture, part 2/4: How do you protect against normalization of deviance?
Mike Mullane's lecture, part 3/4: Responsibility: https://www.youtube.com/watch?v=Wuk_DoX-rz8
Mike Mullane's lecture, part 4/4: Courageous self-leadership: https://www.youtube.com/watch?v=DABsxJtNcYg
__________
* Quoted from Wikipedia, "Space Shuttle Challenger disaster." I have used this article for basic information about the disaster.
** For specifics see this flyer from Northrop Grumman on the Five-Segment Booster, especially the "Booster facts" on page 1.
*** The lecture was posted to YouTube about ten years ago, but I don't know when it was given. The speaker is Mike Mullane (website, Wikipedia), an engineer, weapon systems officer, retired USAF officer, and former astronaut. He was talking to the International Association of Firefighters (IAFF) about the "Normalization of deviance."