Thursday, June 20, 2024

Think about your product before you build it

Earlier this year a friend of mine was traveling, and her suitcase broke. The handle snapped in two places. She was already past the deadline for returning it, but she left a detailed review on Amazon.


This suitcase is a great size and weight and I like the zippered top compartment that prevents contents from falling out when the suitcase is opened. But unfortunately, the handle failed during the second trip I used it on. The design looks flimsy—the handle is held on by a single small screw on each side of the mount, and the mount is a flimsy plastic piece. It appears unrepairable; and even though it has been less than 5 months, I am outside the return window and Amazon will not refund my money. I will not purchase this suitcase again and I will look for a suitcase that has a more durable construction and a decent warranty.

Of course I sympathized with her bad luck. But at the same time—naturally enough—I started to wonder, How did this happen? A failure like this can't be waved away as a random accident or an Act of God. Surely the company that designed and manufactured these suitcases could have seen this coming, as soon as they selected "flimsy plastic" for the handle and chose to anchor it with "a single small screw on each side of the mount." Even so, I suppose it could have been fine if all you ever carried in it were marshmallows. But it was sold as a suitcase, not a marshmallow-carrier. And many people pack their suitcases full. (I know my friend does!)

What should this company have done differently?

That's simple: as part of their design process, before they went into production, they should have carried out a Failure Mode and Effects Analysis, universally abbreviated FMEA.

The point of an FMEA is to avoid exactly this problem. 

  • Look at your product design, and think through—in advance!—all the ways it can possibly fail. 
  • Once you've collected a list of all foreseeable failures, go back and update the design as necessary to eliminate them. 
  • Then, with the updated design in hand, redo your FMEA to see whether you've introduced any new failures. 
  • Rinse and repeat.

Of course you can't do this forever. At some point you have to exit the design cycle and move into production. Also, you might identify some failure modes which, yes, are theoretically possible, but highly unlikely. Maybe you've designed an umbrella which will protect you just fine from rain, but in case of an alien invasion from Mars it won't protect against ray guns. On the other hand, the odds of an invasion from Mars are pretty slim. One way or another, then, you need a criterion for when to let it go.

The answer is to assign every possible failure a Risk Priority Number, or RPN. This number is most commonly calculated based on three other numbers that you assign first. (The method here is an extension of the method for risk prioritization we discussed in the post "Basic risk management.")

  • Evaluate the probability (P) of the failure happening, typically on a scale from 1-5. 
    • 1 means that the failure is extremely unlikely, or virtually impossible. 
    • 5 means that the failure is frequent or almost inevitable.
  • Evaluate the severity (S) of the damage caused in case the failure does happen, again on a scale from 1-5. 
    • Naturally, the scale depends on knowing, "What's the worst that could happen?" If the worst outcome is that the product stops working, that's a 5. But if there's also a chance that somebody could get hurt, obviously that's even worse than the product merely shutting down.
    • 1 means that even if the failure happens, there is no effect on reliability or safety.
    • 5 means that if the failure happens, the results are catastrophic. The product stops working, and—if there is any possibility for people to get hurt—people get badly hurt.  
  • Evaluate the detectability (D) of the failure, on a scale from 1-5. 
    • The idea is that if it's obvious something has gone wrong, the user will put the product down before anything bad happens. That's why car manufacturers design your brakes to make a terrible noise when the brake pads are getting thin—so you'll know it's time to replace them. But hidden problems can catch you unawares.
    • 1 means that you are certain to detect the problem in time.
    • 5 means that the problem will be invisible to users or even regular maintenance personnel.
  • Then your RPN = P x S x D.
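The scoring scheme above can be sketched in a few lines of Python. (The `FailureMode` class and its field names are mine, purely for illustration; nothing here comes from any particular FMEA tool.)

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    description: str
    p: int  # probability: 1 (virtually impossible) to 5 (almost inevitable)
    s: int  # severity: 1 (no effect) to 5 (catastrophic)
    d: int  # detectability: 1 (certain to be noticed) to 5 (invisible)

    def rpn(self) -> int:
        """Risk Priority Number: RPN = P x S x D, so 1 <= RPN <= 125."""
        for score in (self.p, self.s, self.d):
            if not 1 <= score <= 5:
                raise ValueError("each score must be between 1 and 5")
        return self.p * self.s * self.d

# The Martian ray-gun scenario: maximum severity and undetectable,
# but a rock-bottom probability keeps the overall risk number low.
mars = FailureMode("Ray guns from Mars", p=1, s=5, d=5)
print(mars.rpn())  # 1 x 5 x 5 = 25
```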

Now every possible failure in your list has an RPN between 1 and 125 (= 5 x 5 x 5). The next step is that you have to assign a threshold value, call it N. Then the rule is that you have to correct every possible failure whose RPN is greater than N. When the RPN is less than N, you leave it alone.

Do you see how this solves the two problems I identified above: getting stuck in the design loop forever, and chasing wildly improbable failures?

  • It's true that after you carry out your FMEA, you go back into design to correct all the failure modes that scored worse than your threshold N. And it's true that after you've redesigned the product, you should redo the FMEA to see if you introduced any new errors (and also to check that you really did prevent the ones you tried to prevent). But you don't stay in this loop forever. As soon as all the failure modes on your list have an RPN less than N, you are free to move on to the next step.
  • Also, it's unlikely that you will end up trying to protect against Martian ray-guns. The odds of an invasion from Mars pretty clearly deserve a probability rating of P = 1. So even if S = D = 5, your final RPN will be only 25. And it's likely that your threshold is higher than that.
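The exit condition in the first bullet can be sketched as a loop. (Here `analyze` and `redesign` are hypothetical stand-ins for the real engineering work, and a failure mode is just a `(description, P, S, D)` tuple.)

```python
def design_until_acceptable(design, analyze, redesign, N):
    """Repeat FMEA and redesign until no failure mode's RPN exceeds N.

    analyze(design) must return a list of (description, P, S, D) tuples;
    redesign(design, flagged) must return an updated design. Both are
    placeholders for actual engineering judgment.
    """
    while True:
        failures = analyze(design)  # the FMEA step
        flagged = [f for f in failures if f[1] * f[2] * f[3] > N]
        if not flagged:
            return design  # every RPN is at or below N: move to production
        design = redesign(design, flagged)
```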

How do you decide where to set your threshold N? This is a judgement call, and there is always a risk that the decision might be corrupted by someone pushing to set it in the wrong place. For example, someone might urge the team to set the value too high, so that they don't have to spend time preventing foreseeable problems. The best advice I can give is to get suggestions from stakeholders across the organization—for example, from Customer Service and Manufacturing, as well as Design—and to use honest common sense. Most of the time, when you list all your possible failures in order of RPN (from worst down to best), it will be obvious that the risks at the top of the list are terrible, and the ones at the bottom of the list are inconsequential. And often it will be equally obvious where to draw the line between them. There might be a small handful that you have to discuss because they are close to the line on one side or the other, but usually there aren't many. 
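Listing failures in RPN order, as suggested above, might look like this. (The failure modes other than the handle, and the threshold value N = 40, are invented for illustration.)

```python
# Each failure mode: (description, P, S, D), all scored 1-5.
failures = [
    ("Handle could snap", 4, 4, 5),
    ("Zipper pull falls off", 3, 2, 2),
    ("Ray guns from Mars", 1, 5, 5),
]

N = 40  # threshold: correct anything scoring above this (a judgment call)

ranked = sorted(failures, key=lambda f: f[1] * f[2] * f[3], reverse=True)
for desc, p, s, d in ranked:
    rpn = p * s * d
    verdict = "FIX" if rpn > N else "accept"
    print(f"{rpn:3d}  {verdict:6s}  {desc}")
```

With these invented numbers, only "Handle could snap" (RPN 80) lands above the line; the other two fall comfortably below it.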

If The Suitcase Company had carried out an FMEA on their suitcase design, would that have helped my friend? I think so. 

  • Assuming that their engineers knew their job, they should have calculated that the probability of a failure in the handle was pretty likely once the suitcase was full.
    So let's say P = 4.
  • Since suitcases typically don't present a big safety risk to users, the relevant measure for severity would be whether the suitcase was still usable after the handle broke; and the answer is "Mostly no."
    So let's say S = 4.
  • And the suitcase gave no warning signs before the handle suddenly snapped, which argues for the worst score for detectability.
    So let's say D = 5.
  • Then the RPN for the failure "Handle could snap" would be 4 x 4 x 5 = 80.

Ah, but where was their threshold? Of course I don't know. But it seems to me it should have been lower than 80. 
