Thursday, December 30, 2021

Finding root causes, Part 1: 5-Whys

Last week I talked about what a "real root cause" actually is, but I didn't say much about how to find one. Maybe a couple more words would be helpful.

There are several tools you can use to dig out a root cause from under a big pile of symptoms. The simplest one is called a "5-Why analysis," and you can think of it as "problem-solving by a bright, persistent six-year-old."

It all starts when something goes wrong. Somebody asks "Why?" and you give an answer.

"Yes, but why did that happen?"

Another answer.

"Yes, but why did that happen?"

A third answer.

"But Daddy, why did that happen??"

And so on. Just remember — six-year-olds never, ever get tired of this game.

The system is called "5-Why," but there is no law that you have to ask "Why?" exactly five times. Sometimes you can do it with fewer repetitions; sometimes it takes a lot more. Either way, you keep at it until you reach a cause that is fundamental and actionable.

Here's an example.

  • Problem: My car won't start.
  • Why won't it start? The battery is dead.
  • Why is the battery dead? The alternator isn't working.
  • Why isn't the alternator working? The alternator belt is broken.
  • Why is the alternator belt broken? It wore out and was never replaced.
  • Why was it never replaced? I didn't maintain the car according to the schedule in the manual.
  • So the root cause why my car won't start is that I didn't maintain it properly.

Notice a few things about this example. 

FIRST: The most basic point is that the root cause really is a cause. It is a cause in the narrow sense that you can toggle it like a light switch and see the problem disappear or reappear. If I maintain my car regularly, this kind of problem will never happen. If I don't, it's bound to.

SECOND: Each "Why?" is based exactly, word-for-word, on the answer to the previous question. This keeps you from jumping around, and makes sure that the analysis has no logical breaks in it.

THIRD: A related point is that you have to be able to read the answers backwards, linking them with "therefore." If you can't, you've made a mistake somewhere in your analysis. In this example, it works:

  • I didn't maintain the car according to the schedule in the manual.
  • Therefore the alternator belt wasn't replaced when it wore out.
  • Therefore the alternator belt broke.
  • Therefore the alternator didn't work.
  • Therefore the battery died.
  • Therefore my car wouldn't start.
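
If you keep your 5-Why chains in a file or spreadsheet, the backwards reading is easy to generate. Here is a minimal sketch in Python, assuming the chain is stored as a plain list of answers with the problem first and the proposed root cause last; the function names are hypothetical, not part of any standard tool. Note that the code only reshuffles your sentences. It cannot tell you whether each "therefore" actually holds; that judgment is still yours.

```python
def lower_first(sentence: str) -> str:
    """Lower-case the first letter so the sentence reads naturally after 'Therefore'.

    Leaves the pronoun "I" (and contractions like "I'm") capitalized.
    Proper nouns at the start of a sentence would need the same care.
    """
    first_word = sentence.split(" ", 1)[0]
    if first_word == "I" or first_word.startswith("I'"):
        return sentence
    return sentence[0].lower() + sentence[1:]


def read_backwards(chain: list[str]) -> list[str]:
    """Return a 5-Why chain in reverse order, linking each step with 'Therefore'.

    chain[0] is the problem; chain[-1] is the proposed root cause.
    """
    reversed_chain = list(reversed(chain))
    lines = [reversed_chain[0]]  # the root cause is stated plainly
    for statement in reversed_chain[1:]:
        lines.append("Therefore " + lower_first(statement))
    return lines


car_chain = [
    "My car won't start.",
    "The battery is dead.",
    "The alternator isn't working.",
    "The alternator belt is broken.",
    "It wore out and was never replaced.",
    "I didn't maintain the car according to the schedule in the manual.",
]

for line in read_backwards(car_chain):
    print(line)
```

Reading the printed lines aloud is the real test: if any "Therefore" sounds like a non sequitur, that is where the chain has a gap.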

Does that make logical sense? Yes, it does. But now consider this example:

  • Problem: I was late to work.
  • Why? There was a lot of traffic.
  • Why? I took a different route than usual.
  • Why? It was raining.

If you are not used to the 5-Why method, it can be easy to start down an analytical path like this one because this is how explanations burble up when you ask people what went wrong. And superficially it doesn't sound crazy. But let's rewrite it backwards:

  • It was raining.
  • Therefore I took a different route than usual.
  • Therefore there was a lot of traffic.
  • Therefore I was late to work.

Does that make logical sense? Maybe it makes a kind of sense, but right away you can see some gaps.

  • "It was raining, therefore I took a different route" is missing some explanation of what was wrong with my normal route. Was it closed? Flooded out? 
  • "I took a different route, therefore there was a lot of traffic" is weak too. Did I cause the extra traffic by taking a different route? No, of course not. Maybe I'm trying to say that I didn't know how much traffic to expect on that route because I don't usually take it, but that's not what I actually say. And would there normally have been so much extra traffic on that alternate route, or was the traffic jam caused by something else — like the rain — which makes my choice of a different route irrelevant? 

Of course these are little quibbles, and in this example they probably don't matter. But in a real-life investigation, it matters a lot which causes are relevant, because those are the ones you will spend time on.

So no, as it stands this line of investigation has some gaps in it. And notice that I just asked "Why? ... Why?" instead of repeating the previous answer each time. If I had repeated it, I probably would have seen the gaps earlier.

FOURTH: Sometimes there is more than one answer to a single question. In my example about the car not starting, the fourth "Why?" has two answers: (1) the alternator belt wore out, and (2) the alternator belt was never replaced. But in the next step, I explore only one of them. Why not the other?

In this case it wasn't worth exploring answer (1) in its own right: the answer to "Why did the alternator belt wear out?" is that everything wears out sooner or later. We all know that and it doesn't help us. It's not actionable, because we can't do anything to prevent it. 

So the analysis focused on answer (2), that the alternator belt hadn't been replaced. But sometimes it won't be so obvious which branch is important. In that case, list all the causes as different branches and follow each branch individually. Some of them will trickle out into truisms like "Everything wears out," and then you learn that those branches aren't useful. But sometimes you are surprised by which branches turn out to be relevant.
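
Once the analysis branches, the chain stops being a simple list and becomes a tree. Here is a minimal sketch in Python, assuming each answer is stored together with the answers to its own "Why?" (the `Cause` class and `branches` function are hypothetical names, not part of any standard tool). It lists every branch from the problem down to its last answer, so you can follow and read back each one individually. The tree below mirrors the car example, with the fourth answer split into its two parts.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One answer in a 5-Why analysis, plus the answers found by asking 'Why?' about it."""
    statement: str
    because: list["Cause"] = field(default_factory=list)


def branches(node: Cause, path: tuple[str, ...] = ()) -> list[list[str]]:
    """Return every chain from the problem down to an answer with no further 'Why?' answers."""
    path = path + (node.statement,)
    if not node.because:
        return [list(path)]  # end of a branch
    result = []
    for child in node.because:
        result.extend(branches(child, path))
    return result


problem = Cause("My car won't start.", [
    Cause("The battery is dead.", [
        Cause("The alternator isn't working.", [
            Cause("The alternator belt is broken.", [
                Cause("The alternator belt wore out.", [
                    Cause("Everything wears out sooner or later."),
                ]),
                Cause("The alternator belt was never replaced.", [
                    Cause("I didn't maintain the car according to the schedule in the manual."),
                ]),
            ]),
        ]),
    ]),
])

for chain in branches(problem):
    print(" -> ".join(chain))
```

Deciding which branch ends in a truism and which ends in something actionable is still a judgment call; the code just makes sure no branch gets quietly forgotten.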

FIFTH: As I've said a couple of times now, a root cause has to be actionable. It has to be something you can correct. I can do something about maintaining my car on schedule, but I can't do anything about the overall tendency of things to wear out with time. For another example, see my discussion of wildfires last week.


So that's how you find a root cause, at the most basic level. Next week I'll talk about two ways you can expand the investigation, to make it broader and more comprehensive.
