Pragmatic Quality Blog: Problem-solving: Anatomy of an 8D

Over the last three weeks, I've talked about ways to improve your root cause analyses. But a root cause analysis is just one step — albeit the most important step — in the whole process of problem-solving. Now that we've discussed it in some detail, let's back up to look at the whole cycle from beginning to end.

There are several different tools or methods to formalize solving a problem. The one I'm going to describe is called an 8D. The name is an abbreviation of "eight disciplines," although "eight steps" might have been just as good a name. It does, in fact, unfold in eight steps, and it is a systematic way to cover all the bases when you want to make sure that a problem is completely solved.

When do you use an 8D?

Because an 8D is thorough and systematic, it can also be time-consuming. So you probably won't want to use it for every single problem that comes along. But pick and choose the problems where you really need a thorough solution — where it is really important that you guarantee it will never happen again.

Any problem with legal implications has to get a thorough solution, because you don't want to make a habit of running legal risks.
Any problem where you fundamentally aren't meeting your stated commitments — where you are failing to do what your organization is there to do — is another one that needs a thorough solution: for example, if you sell a product that plain doesn't work.

Those two categories should pretty much require an 8D (or the equivalent). After those, think about where it will be useful.

One good choice is repeated or systemic problems, things that always go wrong at the same time and in the same way. Even if one of these problems isn't bad enough to fit into the first two categories, if you can fix it once and for all you will easily save enough time over the long haul to make it worth the time you invested up front.
On the other hand, an 8D is not a useful tool to untangle problems in a complex business process. 8Ds work best when there is already a clear definition of the target state — how things ought to be — so you can easily define the ways that reality doesn't match.

There might be other useful criteria too. Think about what is going to help you.
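
If it helps to see the criteria side by side, here is a minimal sketch of them as a triage function. The field names and the outcome labels ("required", "recommended", and so on) are assumptions made up for this illustration; they are not part of any 8D standard, and your organization may weigh things differently.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    # All field names are hypothetical, chosen only for this sketch.
    has_legal_implications: bool = False
    breaks_stated_commitments: bool = False
    is_repeated_or_systemic: bool = False
    has_clear_target_state: bool = True


def triage_8d(problem: Problem) -> str:
    """Suggest whether an 8D is worth the effort for this problem."""
    # The two categories that should pretty much require an 8D.
    if problem.has_legal_implications or problem.breaks_stated_commitments:
        return "required"
    # Without a clear target state, an 8D is not a useful tool.
    if not problem.has_clear_target_state:
        return "not suited"
    # Repeated problems repay the up-front investment over the long haul.
    if problem.is_repeated_or_systemic:
        return "recommended"
    return "optional"
```

The order of the checks is itself a judgment call: this version lets the two mandatory categories win over everything else.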

The steps in an 8D

Here are the eight steps. The first three can be done in any order, depending on the situation. In an emergency it can be more important to get D3 in place before you even think about D1 and D2.

D1: Name a team

Who's going to work on the problem? You need more than one person. Nobody knows everything, and everyone has blind spots. Besides, if one person knew enough to solve the problem by himself, he would have done it already and it would never have happened.

Name someone as the Team Lead. That's who will call the meetings, organize the investigation, and keep everyone on track.

Depending on your organization, you might also need to name a Sponsor. This happens especially when everybody has a lot of other things to work on, so team members risk being pulled away to do something else instead. A Sponsor is someone who can insist, "No, this problem has to be fixed and I want a report on my desk by Friday that tells me how far you've gotten." Ideally the Sponsor is high enough in the organization (or at least has enough authority) to make the 8D a real priority, but is also close enough to the working level that he feels the pain caused by the problem, and therefore really cares about getting it fixed.

D2: State the problem

OK, this sounds obvious, but write it down. As you study the problem, it is easy to get distracted by symptoms or side topics. Writing down what problem you are really trying to solve helps keep you on track. This is also the time to collect all the information you can find about the problem: what went wrong, where and when and how did it happen, who was involved, and so forth. Be as exact as you can.

Now, it's not unusual that you learn more about the problem as you get deeper into it. You might find that your original statement was too superficial, or that it only captured a symptom. In that case you can absolutely go back to update your problem statement based on your new and deeper understanding. Just remember that, when you are done, the problem you have identified in that statement is the problem you are going to have to fix so it never happens again. And all along the way, there should be a logical connection between the stated problem and the investigation you are doing.

D3: Contain the problem

Solving your problem will probably take a while, and you don't want it to get worse in the meantime. So before you get deep into the solution, take some kind of steps to contain it. If your widgets are coming off the line crooked, maybe you need to shut down the line until you find the problem; if only half of them are coming out crooked, maybe you put someone there to do a manual inspection of every single widget to filter out the bad ones. If there is a catastrophic bug in software that you distribute over the Internet, you probably want to pull it off your download server. Whatever it is, do something to block the problem so it won't get worse while you are figuring it out. Notice that your containment action doesn't have to be efficient or sustainable, because it's only temporary. But it has to be effective.

D4: Find the root cause

We've already discussed this topic at length in the posts from the last three weeks.

D5: Brainstorm possible corrective actions

Your root cause analysis may have uncovered several different root causes. (If you did a 2 x 2 x 5-Why analysis, for example, you should have at least four, and maybe more.) Now try to think of at least one permanent corrective action for each one of them: something you can do which will guarantee that that cause never recurs. I say "at least one," but it's fine to come up with more. Go ahead and list them all. Try to write each one so that it is obvious how it is relevant — how will this action prevent that cause from ever coming back again?

D6: Implement corrective actions

This step has several parts to it.

1. Now you have a list of possible corrective actions. Depending on what you've got, it might not be practical to do them all. So evaluate them, one by one. Maybe this one is too expensive to be practical, or that one causes more problems than it solves. This is where you figure that out. Then, when you have evaluated all your possible actions, pick at least one corrective action to implement. I'd like to say "at least one corrective action for each root cause," but sometimes that's not practical. Figure out what you can pragmatically achieve. But also check the logic trees you worked out in D4 to make sure that the actions you've chosen really will be enough to prevent the problem from ever coming back.
2. Go implement the action (or actions) that you chose.
3. Follow up by checking your results. After you implemented your corrective actions, did the problem really go away?

   - If yes, that's great. Skip down to part 4.
   - If no, then there's a mistake somewhere in the analysis: either one of your root causes was wrong, or you missed a possible cause, or one of your corrective actions didn't completely eliminate the cause that it was supposed to resolve. In that case, go back through your analysis until you find the error, and re-enter the process at that point: D4 for a new root cause, D5 for a new batch of possible corrective actions, D6 to pick one (or more). Implement the new corrective action(s), and once again check to see if the problem disappears. And so on. Do this until you finally make the problem disappear.

4. Once the problem has been permanently eliminated, there's one more part to D6. At this point you can finally afford to remove the temporary containment measure you put in place back in D3. Since there is no longer any possible chance of the problem recurring, there's no longer anything to contain.

At this point, your corrective actions are finally done.
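
Because D6 is really a loop with an exit condition, it can help to see it written as one. The following is only a sketch: the four callables are hypothetical hooks standing in for real work done by the team, and `max_rounds` is an assumed safety limit that the 8D method itself does not prescribe.

```python
from typing import Callable, Iterable, List


def run_d6(
    choose_actions: Callable[[int], Iterable[str]],
    implement: Callable[[str], None],
    problem_is_gone: Callable[[], bool],
    remove_containment: Callable[[], None],
    max_rounds: int = 10,
) -> List[str]:
    """Repeat choose -> implement -> verify until the problem disappears.

    Every callable here is a hypothetical placeholder for team activity.
    """
    implemented: List[str] = []
    for round_number in range(1, max_rounds + 1):
        # Choose the practical corrective actions, then implement them.
        for action in choose_actions(round_number):
            implement(action)
            implemented.append(action)
        # Check the results in the real world.
        if problem_is_gone():
            # Only now is it safe to lift the containment from D3.
            remove_containment()
            return implemented
        # Otherwise the analysis was wrong somewhere: the team revisits
        # D4 or D5, and choose_actions supplies a new batch next round.
    raise RuntimeError("Problem persists; revisit the root cause analysis")
```

Note that the containment is removed in exactly one place, after verification succeeds, which mirrors the rule that D3 stays in force until the problem is truly gone.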

D7: Assess risks and learn lessons

The corrective actions are done, but you're not.

As a result of going through this detailed analysis, you've learned something you didn't know before. You have learned that it is possible for such-and-such a problem to occur any time that you have this or that initial condition. You didn't know that before, and now you do.

So now ask yourself, "Where else do we have the same conditions?" In other words, "Where else are we at risk of the same problem happening, even though so far we've been lucky and it hasn't happened there yet?" This could mean almost anything, depending on what problem you just solved.

If the problem was that your widgets were coming off the line crooked, and if one of the root causes was that the widget-making machine didn't get the preventive maintenance it needed, you might ask yourself "What other machines do we use, and are all of them already getting preventive maintenance? Or do we have another under-maintained machine somewhere in the plant that might go bad tomorrow?"
If the problem was that you ran a plating bath at the wrong temperature despite clear instructions in the Control Plan, and if one of the root causes was that the line operator knew better than the author of the Control Plan, you might ask yourself "Is that the only bad Control Plan in the plant? Or are there others that were done just as sloppily? How many of them do we have to fix?"
Maybe you sell your widgets internationally, and you need to get them certified before they can be legally imported into Grand Fenwick. Also these certifications expire every two years and have to be renewed. Maybe the problem which triggered your 8D was that someone sold a big shipment of widgets to Grand Fenwick a week after the certification expired, and maybe one of the root causes was that nobody ever told the Order Desk that there was any kind of legal restriction on the orders they are allowed to take. During D6 you will certainly get all the problems related to Grand Fenwick certification sorted out, but during D7 you'll want to ask, "Are there any other legal restrictions on where we can sell any of our other products?" And it would be good to know the answer.

And so on.
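
When the condition you discovered can be checked against records, the "where else?" question can sometimes be answered with a simple scan. As a sketch of the preventive-maintenance example, assuming you keep a list of machines with their last maintenance date and an interval for each (the `Machine` fields and the sample data in the usage note are invented for illustration):

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Machine:
    # Hypothetical record layout, assumed for this sketch.
    name: str
    last_maintenance: date
    interval_days: int  # how often this machine should be maintained


def overdue_machines(machines: List[Machine], today: date) -> List[str]:
    """Return the names of machines whose maintenance is overdue."""
    return [
        m.name
        for m in machines
        if today - m.last_maintenance > timedelta(days=m.interval_days)
    ]
```

For example, given a press last maintained on `date(2021, 1, 1)` with a 180-day interval and a plating bath last maintained on `date(2021, 12, 1)` with a 90-day interval, calling `overdue_machines` with `today=date(2022, 1, 13)` would flag only the press.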

Then, once you have answered the question where else this problem might be expected to show up, do something about it. Take steps to prevent it in those other spots before it has a chance to happen. Of course you should keep your level of effort proportional to the importance of the problem. But preventing a problem before it happens usually saves everyone a lot of time and money.

Finally, if you keep risk lists for any of your activities, look at them to see if any of them are affected by the new information you have learned. If they are, update them as needed with the new risks you have learned about.

D8: Close the 8D and thank the team

Once all these steps are done, the team should meet one last time to run through the results and check that everyone agrees they are complete. Close and sign off the documentation in whatever way your organization does these things. And thank the team, genuinely recognizing them for the improvements they have made in your system. Order pizza.

Strictly speaking that last bit isn't mandatory. But it is hard to go wrong by ordering pizza.

Another way to think about it

[Figure: the scientific method shown as a cycle. By Efbrazil - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=102392470]

There is no question that an 8D can be a lot of work. But it is a powerful tool. One way to think about it is to see the 8D process as a way to apply the scientific method to solving your organization's problems. At a high level, the steps are the same:

Observation / question: This corresponds to the statement of the problem in D2.
Research topic area: This corresponds to the data collection in D2 and the logical analysis of the data in D4.
Hypothesis: This corresponds to the list of possible corrective actions in D5.
Test with experiment: This corresponds to D6, where you implement one or more corrective actions and then check to see if they succeeded in eliminating the problem.
- Remember that the whole point of the scientific method is that we don't know how an experiment will turn out until we do it; that's why running an experiment teaches us something new.
- But it's the same with an 8D: we don't know whether our proposed corrective action will really correct anything until we try it. That's why there are so many sub-parts to D6: we have to allow for branching paths, depending on how the results turn out in the real world.
- And either way, regardless of whether the first action we try (or the second, or the third) succeeds or fails, we learn something new, something we didn't know before.
Analyze data: Part of this analysis takes place in D6, where we evaluate whether we really did eliminate the problem (and, if not, why not). The rest of it takes place in D7, where we evaluate the broader implications of the new information we just learned: Where else do we risk seeing the exact same problem?
Report conclusions: This is where we wrap up the paperwork at the end of D8.

This particular cyclical representation of the scientific method fails to include a step for ordering pizza. But it's still a good thing to do.

Thursday, January 13, 2022