Thursday, July 27, 2023

"What's the root cause of that risk?"

If you'll indulge me, here's one more post based on Greenlight Guru's "Risk Management True Quality Summit Series." Don't worry, I'm not going to summarize all the talks! But this one neatly pulled together two topics that we have discussed here at some length: risk and problem-solving. So I thought it deserved a mention.

Besides, the presenter—Peter Sebelius, of Medical Device HQ—was a lot of fun. He was voluble, energetic, enthusiastic ... and full of closely reasoned arguments that some of the big authorities in his field are doing it all wrong. Delightful to listen to.*

We all know that the reason to identify risks is so that we can control and manage them: eliminate them if possible, or at least mitigate them if not. Sebelius's point is that you can't control a risk effectively until you understand it, and that means, among other things, understanding where it comes from.

What's unusual is the idea of doing a root-cause analysis on something that hasn't happened yet. We've talked fairly extensively about root causes before now (see the thread that started here and continued for a month or more), but always as a way to determine what caused a particular failure. In the case of an identified risk, the failure hasn't happened yet.

But you can still ask, "How does it happen that we are facing this risk in the first place?" If you can find a cause and correct it, there's a chance the risk will disappear. And that means we have eliminated it, which is what we want.

Sebelius gave a concrete example, which makes all this theory a lot clearer.

Suppose we are setting up the manufacturing line for a new product, and we find that part of the assembly involves two screws. These screws are different lengths, and if you mix them up you create a situation that could cause customer harm. So we need to eliminate (or at least reduce) the risk of getting them mixed up. Common approaches in manufacturing would include asking the operators to double-check the screws before inserting them (even though we know that "double-check your work" is one of the least effective ways to improve quality).

But Sebelius asks, "Why are we trying to solve this problem in the first place?" He argues that the root cause of the risk is that the product design was stupid to begin with. If the risk of harm from mixing up the screws really was that serious, any competent designer would make it impossible to mix them up. This might mean, for example:

  • Redesign the product so the screws can be the same length, and there is nothing to mix up.
  • Give the screws different diameters as well as different lengths, so it is physically impossible to put either one in the wrong hole.
  • Eliminate one screw, and hold the parts together in another way.
  • Eliminate both screws.

This is what he means by introducing root-cause analysis into risk management: figure out how it comes to be that you have to address this risk in the first place. 

Does this mean that all risks should be addressed in design? Sebelius argues that a lot of them can be, and that if you can address a risk in design, that's the best place. But he also points out that the real world generally doesn't fall into the same neat categories we use in our documentation; so in practice the same risk might have to be listed (and addressed) in several phases during the product lifecycle.

There were a lot of other valuable points in the talk as well—including an explanation of why chairs have four legs and not three**—but the concept of applying root cause analysis to risk management was absolutely worth getting up early for.***

__________

* He reminded me of someone I know, but never mind that. 😀  

** Mathematically speaking, three points are enough to fix a structure's position in space, because three points determine a plane. So why don't chairs have three legs? The answer is that some do, but most have four. The fourth leg is a risk-control measure, to protect you in case one of the other legs breaks unexpectedly, or you set the chair down on an uneven surface.

*** Each morning the sessions started at 9:00am Eastern Time, and this talk was the first one of the day. But I'm on the West Coast, so for me it started at 6:00am. It was still 100% worth it!


Thursday, July 20, 2023

Risk management in audits

This week, Greenlight Guru is hosting a series of webinars called the "Risk Management True Quality Summit Series." The talks cover a whole range of topics related to risk management. As this post goes live, the series has a couple of hours yet to run, so you might still be able to catch the last talks live. Recordings will be made available through the Greenlight Guru Academy, though I'm not quite sure on what terms. Check it out.

I'm not really trying to post an advertisement for the series 😀 but the talks I have attended so far have been consistently good.

The last webinar on Tuesday was by John Thompson of Emergo by UL, on the application of risk management in audits. I've done quite a few audits over the years, and we've talked about them at some length in this blog before, so I was interested to see what he had to say. And while his examples were naturally drawn from the medical device industry, much of what he said applies to anyone.

He started by reminding us that ISO 19011:2018, clause 4(g), encourages the use of a "risk-based approach": "The risk-based approach should substantively influence the planning, conducting and reporting of audits in order to ensure that audits are focused on matters that are significant for the audit client, and for achieving the audit programme objectives." And then he talked through what this means in practice. If the standard asks us to apply a risk-based approach to the planning, conducting, and reporting of audits, this entails several things.

Scheduling audits

  • Are there areas of the organization with an unusually high number of complaints or corrective actions?
  • Are there areas where the process KPIs are bad?
  • Are there areas where external bodies (such as your registrar, or some public authority like the FDA) have found nonconformities?
  • Are there areas which have implemented new processes?
  • Have there been recent corporate acquisitions?

All of these areas should get special attention when you schedule your next audit. (At the same time, don't get so carried away that you forget to carry out routine audits in the other areas often enough to meet your basic requirements.)
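
Just to make the idea concrete, here is a minimal sketch of how you might boil those questions down into a rough audit-priority score for each area. To be clear, the factors and weights here are my own invention for illustration, not anything Thompson prescribed:

```python
# Illustrative sketch: scoring areas of the organization to prioritize
# audit scheduling. The factor names and weights are invented for the
# example; calibrate them to your own audit programme.

from dataclasses import dataclass

@dataclass
class Area:
    name: str
    open_complaints: int      # complaints attributed to this area
    open_capas: int           # corrective actions in progress
    kpis_out_of_target: int   # process KPIs currently missing target
    external_findings: int    # nonconformities from registrar, FDA, etc.
    new_process: bool         # new or significantly changed process?
    recent_acquisition: bool  # area affected by a recent acquisition?

def audit_priority(a: Area) -> int:
    """Higher score = schedule this area sooner."""
    score = (2 * a.open_complaints
             + 2 * a.open_capas
             + 3 * a.kpis_out_of_target
             + 5 * a.external_findings)
    if a.new_process:
        score += 4
    if a.recent_acquisition:
        score += 4
    return score

areas = [
    Area("Purchasing", 1, 0, 2, 1, False, False),
    Area("Production", 6, 3, 1, 0, True, False),
]
for a in sorted(areas, key=audit_priority, reverse=True):
    print(f"{a.name}: priority {audit_priority(a)}")
```

However you weight the factors, the point (and the caveat from the paragraph above) stands: the score tells you where to look first, not where you can skip looking altogether.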

Selecting auditors

  • Do all of your auditors know how to audit? 
  • Can they show you documented evidence?
  • Do they all understand sampling techniques?
  • Do you have enough auditors?

Also, have external bodies found nonconformities that your internal audits missed? This is a red flag 🚩🚩 that something is wrong. If this happens, investigate whether your regular internal auditors are going too easy on the organization. Maybe you can invite someone from the other plant in the next state to come audit you, while you go audit them. Sometimes a fresh pair of eyes can see things that you no longer see because they are too familiar.

Conducting audits

When you start asking questions, check whether the people you audit are aware of risk as well.

  • Is the way this task is controlled based on its risk?
  • Do you inspect incoming goods from our high-risk suppliers any differently from those from our low-risk suppliers?
  • What risks are there in the process you are carrying out right now? 
  • How do you control them? 
  • Have you ever discovered a risk that no one knew about before? What did you do about it?

Reporting audits

When it's time to write the report, remember that you aren't the last person who is going to care about risks in this organization. Present your results in such a way that the next person can see what risks you discovered this time around.

One way is to highlight trending data across functions. An Opportunity for Improvement may look trivial by itself; but if there is a stream of nonconformities on exactly the same topic across half a dozen other functions, it may be part of a bigger picture.

Or you can collect the results by function, to highlight which functions need the most ongoing attention. Thompson suggested a table like this to aggregate the overall risk in each area:

[Table from the talk: audit findings per function, with counts of Major and Minor nonconformities and OFIs rolled up into an overall risk rating for each area.]
Notice that in this example, Purchasing actually has the fewest findings of any function listed. But since three of them are Major nonconformities, it still shows the highest overall risk rating.
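
To show the mechanics of that kind of roll-up, here is a minimal sketch. The severity weights and the finding counts are invented to mirror the example; Thompson's actual table may have worked differently:

```python
# Illustrative sketch: rolling up audit findings by function into an
# overall risk score, so that a few Major nonconformities outweigh a
# larger number of minor findings. Weights and counts are invented.

WEIGHTS = {"Major NC": 10, "Minor NC": 3, "OFI": 1}

findings = {
    "Purchasing":  {"Major NC": 3, "Minor NC": 0, "OFI": 1},
    "Production":  {"Major NC": 0, "Minor NC": 4, "OFI": 5},
    "Engineering": {"Major NC": 0, "Minor NC": 2, "OFI": 6},
}

def risk_score(counts: dict) -> int:
    return sum(WEIGHTS[kind] * n for kind, n in counts.items())

for function, counts in sorted(findings.items(),
                               key=lambda kv: risk_score(kv[1]),
                               reverse=True):
    total = sum(counts.values())
    print(f"{function}: {total} findings, risk score {risk_score(counts)}")
```

Run it and Purchasing comes out on top with only four findings, because three of them carry the heavy Major weight. That's exactly the effect the table is meant to surface.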

It was a good talk. If you missed it, check out the Greenlight Guru Academy to see if the recording is still available.


Thursday, July 13, 2023

Integrating risk management with Quality

How does risk management relate to Quality? There's obviously some connection. If you don't think about risks when designing a product, there's a chance it might hurt someone when it goes out to the field. And if a product hurts someone, then it's a bad product—not a quality one. But when you set up a Quality Management System and a product development process, it's not always obvious how to integrate the consideration of risk. Is there some magic spot on the flowchart where you can drop a box called "Risk" and just have done?

Last week I wrote about risk evaluation, taking my lead from a LinkedIn post by Etienne Nichols that focused on the medical device industry (where he works). I was delighted to get a detailed reply from Edwin Bills which clarified some technical points and generously offered pointers to further reading. I wasn't familiar with Bills's work at the time, but I looked around and found that he has had a long and productive technical career in Quality and Risk Management, especially (but not only) with respect to medical devices. He has also published multiple articles, and one recent article addresses exactly the question I've asked here.

This article dates from May 2023, not quite two months ago, and is titled "The Intersection Of ISO 13485 And ISO 14971 Under The Proposed FDA QMSR." (You can find it here.) It focuses on the medical device industry, which is not exactly my wheelhouse, so I won't try to comment on the technical details. (Besides, if you are curious about those details you should read the article itself.) But Bills makes a number of broader points that are very valuable.

The first of these points sounds obvious at first, but has deep ramifications: there is a difference between product risks and business risks. In exactly the same way, there is a difference between a standard designed for regulatory compliance and one designed for process management.

  • With respect to the first distinction, product risks are in general a lot more detailed than business risks, and require a much more granular response. Risk analysis at the business level might highlight the need for a new policy or a change in strategy; risk analysis at the product level could identify dozens or hundreds of potential risks, of which let's say twenty or fifty are actionable. Then each of the actionable risks has to be analyzed in detail to determine exactly how to prevent it (or at least mitigate its effects).
  • With respect to the second distinction, Bills observes simply that the first edition of ISO 13485 (Quality management systems — Requirements for regulatory purposes) was explicitly based on ISO 9001:1994, because the 20-element structure of the latter was very useful in supporting regulatory compliance. By the time ISO 9001:2015 came out, ISO 13485 no longer copied its structure—because ISO 9001's process approach (together with the frequent use of the phrase "the organization shall determine") made the document altogether too vague to be much help in an FDA audit.

A second general point that Bills makes is that—to answer my question in the first paragraph—there is no magic spot on the flowchart where you can drop a box called "Risk" and just have done. Bills takes considerable pains in his article to trace out the interconnections between ISO 13485 and ISO 14971 (Application of risk management to medical devices). As always, you can find the details in his article (and he provides a map at this link). Here let me say simply that the connections are many and subtle; and while he is clear that you do not have to comply with ISO 14971 in order to comply with ISO 13485, it sure helps. Bills also walks his readers through the entire product life-cycle—from design through development and into production—to show exactly what role risk management has to play at each step of the way.

There is a third point that Bills brings out that is absolutely critical for any risk management methodology. Risk management has to be a living system! It is no good to do your risk analyses and then (as Bills says) simply put the documents in the file and forget about them. In the real world, your experience with the product will continue to grow: every hiccup in production and every customer complaint tells you something you didn't know the day before. And this means you have to pull your risk analyses out of the file, update them with the new information, and check to see how that changes your results.

For example, maybe some customer finds a way to misuse the product that you never thought of (customers are good at this) and as a result it breaks. (Or, far worse, hurts someone.) Now you know that the product won't withstand that kind of misuse. That's new data, so you have to add it to your risk analysis as an input and then see what changes.
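
If it helps to picture it, here is a minimal sketch of what a "living" risk entry might look like in code: new field data updates the occurrence estimate, and the risk gets recomputed instead of sitting in a drawer. The structure and the numbers are my own illustration, not from Bills's article:

```python
# Illustrative sketch: a "living" risk file entry. When new field data
# arrives (a complaint, a production hiccup), the probability estimate
# is updated and the risk is re-evaluated.

from dataclasses import dataclass, field

@dataclass
class RiskEntry:
    hazard: str
    severity: int                 # 1 (negligible) .. 5 (catastrophic)
    probability: int              # 1 (improbable) .. 5 (frequent)
    history: list = field(default_factory=list)

    @property
    def risk(self) -> int:
        return self.severity * self.probability

    def record_field_event(self, description: str, new_probability: int):
        """Fold new field experience back into the analysis."""
        self.history.append(description)
        self.probability = max(self.probability, new_probability)

entry = RiskEntry("Enclosure cracks if device is dropped",
                  severity=4, probability=1)
print("Initial risk:", entry.risk)   # 4

# A customer finds a misuse mode we never anticipated:
entry.record_field_event("Complaint 1234: device used as a step stool",
                         new_probability=3)
print("Updated risk:", entry.risk)   # 12 -- time to revisit the controls
```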

There's a lot more, of course, and if you work in the medical device industry I strongly recommend the article. But these points should apply across all industries, so I am grateful that Bills spelled them out.


Thursday, July 6, 2023

What makes a risk "acceptable"?

A few weeks ago, Etienne Nichols published an extended post on LinkedIn about risk. Nichols works in the medical device industry, so risk is a critical topic for him. But of course everything we do has some kind of risk—even getting out of bed in the morning! At what point do we decide that the risk is "acceptable" so that we can move ahead?

Unsurprisingly there is a standard that discusses this, and Nichols references it. ISO 14971 specifically covers the "Application of risk management to medical devices," and Nichols quotes it in his long post. First he points out that, with respect to any kind of harm, Risk = Probability x Severity. But what is harm? In the context of medical devices, you'd expect that to mean harm to the patient, and of course it includes that among other things. But it turns out that the actual definition is a lot broader. According to ISO 14971, harm means damage or injury to human health, or damage to property, or damage to the environment. All of these have to be considered in a complete risk analysis. 
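
As a minimal sketch of how that multiplication plays out, here is Risk = Probability x Severity on simple ordinal scales, with one line per kind of harm. Note that ISO 14971 does not prescribe these particular scales or numbers; they are invented for illustration:

```python
# Illustrative sketch of Risk = Probability x Severity on simple
# ordinal scales. These scales and values are invented for the example.

SEVERITY = {"negligible": 1, "minor": 2, "serious": 3,
            "critical": 4, "catastrophic": 5}
PROBABILITY = {"improbable": 1, "remote": 2, "occasional": 3,
               "probable": 4, "frequent": 5}

def risk(probability: str, severity: str) -> int:
    return PROBABILITY[probability] * SEVERITY[severity]

# Harm is broader than patient injury: property damage and
# environmental damage each get their own line in the analysis.
print(risk("remote", "critical"))      # 8 - harm to the patient
print(risk("occasional", "minor"))     # 6 - property damage
print(risk("improbable", "serious"))   # 3 - environmental damage
```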

Then what makes a risk acceptable? That has to be the benefit enabled by taking the risk, and this too is defined broadly. Nichols quotes the following clarification: "Benefits can include positive impact on clinical outcome, the patient's quality of life, outcomes related to diagnosis, positive impact from diagnostic devices on clinical outcomes, or positive impact on public health." This is a broad list, but it has to be. At the very least it has to be as broad as the range covered by the word harm.

What is more, we all know that sometimes the benefits of an action land in a whole different area from its risks. If I drive my car to work, the risks are around things like the morning's traffic or the chance of an accident. But the benefits relate to my job, my coworkers, my customers, and my income. There is no simple arithmetical way to calculate that the benefits outweigh the risks, but a lot of people drive to work every day and are satisfied that it's the right thing to do. There must be some kind of intuitive "calculation" behind that decision, but it's not an arithmetical one.

In the end, Nichols concludes, "No risk is acceptable without the presence of some benefit. But when the stakes are high enough and the benefit is great enough, there's no risk that could be unacceptable.... [W]hen it's all said and done, the acceptability or unacceptability of a risk boils down to the benefit." And this, too, is logical. We all know that in wartime, soldiers risk their lives to take a hill or a position; but if by so doing they can help end the war, that benefit is great enough to make any risk worth it.


There is one way that this balance between risk and benefit can go badly wrong. That's when the parties who face the risk and the parties who decide on the action and reap the benefit are different people. This is called moral hazard, and its consequences are always bad.

For example, let's pretend that I sell a device that doesn't work. [I don't really.] Then I make money but the patients suffer. If I get away with it, what's to stop me doing it again? As long as I don't suffer any consequences, I might deem the patients' risks to be "acceptable."*

In the real world, this is why manufacturers are legally liable for their products—to prevent just that kind of fecklessness by giving the manufacturers some kind of skin in the game. ("Skin in the game" is more or less the conceptual opposite of moral hazard.**) And in the exact same vein, Nichols is careful to point out that ISO 14971 states "...this subclause [ISO 14971, A.2.7.4] cannot be used to weigh residual risks against economic advantages or business advantages (i.e. for business decision making)."

It's an important qualification.

But so long as that qualification is understood and in place, it's the benefit that makes the risk acceptable. Nichols makes a good argument for this point.

__________

* Compare also Charlie King's protest song, "Acceptable Risks," which treats the exact same concept in a different context. The refrain runs, "But they told me it was safe, and they swore that was true / They said, The risks are all acceptable—acceptable to who?"  

** Therefore, if you ever find a case where a manufacturer is shielded from legal liability for his products, you should assume that moral hazard is a risk, at least in principle.   


