Thursday, December 26, 2024

When metrics don't tell the whole story

We've spent this month talking about metrics: how to define them, and where they can go wrong. But it's also important to understand that metrics never tell the whole story. A numerical metric can be a powerful way to assess the performance of a complex process, by cutting through the fog of operational details to look at the results. But in the end it is still only one observation, and sometimes not the most important one.

Step back for a minute to consider what a metric is. A metric* is a value—typically a number—that answers a question about some organizational process. How many widgets does this machine produce per hour? What percentage of our customer orders are shipped on time? What percentage of customer shipments result in complaints? And so on. 

This means that a metric is meaningful or useful only insofar as the question it answers is meaningful or useful. And that question rests on a set of assumptions about how the measured process works (or is supposed to work) and how it interacts with the world around it. As long as the assumptions are correct, the number is meaningful. Deprived of context, it's not.

Consider a couple of simple examples. We want to know How many widgets does this machine produce per hour? because we want to understand whether we will have enough stock to fill our orders. So we install a counter on the machine. But if the counter is out of order, the numbers on its display will be wrong. We still need to know the correct number, but our normal process—read the display and log what it says—may have to be replaced by a more manual counting process.

We want to know What percentage of our orders are shipped on time? because in general customers demand timely shipment. Late orders mean unhappy customers, and unhappy customers will start shopping with our competitors. But in some cases, timely delivery isn't the most important thing. Maybe we are artists or sculptors, who do priceless original work on special commission from wealthy patrons. On the whole, these patrons probably don't care exactly what date the order is shipped, so long as it is perfect when it is finally delivered. Once you change the context, the question and metric become meaningless.    

In other words, numerical metrics are great so long as they are answering the right questions. But getting correct answers to wrong questions can easily steer you down the wrong path. Peter Drucker cites a couple of dramatic examples.**

  • "The thalidomide tragedy which led to the birth of so many deformed babies is a case in point. By the time doctors on the European continent had enough statistics to realize that the number of deformed babies born was significantly larger than normal—so much larger that there had to be a specific and new cause—the damage had been done....
  • "The Ford Edsel holds a similar lesson. All the quantitative figures that could possibly be obtained were gathered before the Edsel was launched. All of them pointed to its being the right car for the right market. The qualitative change—the shifting of American consumer-buying of automobiles from income-determined to taste-determined market-segmentation—no statistical study could possibly have shown. By the time this could be captured in numbers, it was too late—the Edsel had been brought out and had failed."

If you can find a way to supplement your quantitative metrics with some other (perhaps qualitative) way to assess how things are going—in the best case, one that uses a wholly different perspective—your overall understanding of the situation will be stronger.

This might sound like a narrow discussion inside Quality theory, but the same debate has been going on recently in the political arena over the state of the economy. Some people have pointed out that the normal quantitative metrics show the American economy to be in great shape. Others have countered that the economy is suffering, and that if the metrics don't agree then so much the worse for the metrics! Personally I have no idea what the economy is doing and I take no position in this argument. But it fascinates me to see this exact topic as a subject of intense public debate. 

In brief, there is no reliable way to manage your organization on autopilot. Three years ago, I argued that there is no perfect process. In the same way, there are no perfect metrics. Process and metrics are useful tools, but you still have to pay attention, and to think hard about what you are doing. 

__________

* In this context, at any rate.

** Both of these examples are quoted from Peter Drucker, The Effective Executive (New York: HarperCollins, 1966, 1967, 1993), pp. 16-17. 

    

Thursday, December 19, 2024

"What gets measured gets managed"—like it or not!

For the past couple of weeks we've been talking about metrics, and it is clear that they are central to most modern Quality systems. ISO 9000:2015 identifies "Evidence-based decision making" as a fundamental Quality management principle, stating (in clause 2.3.6.1): "Decisions based on the analysis and evaluation of data and information are more likely to produce desired results." ISO 9001:2015 (in clause 6.2.1) requires organizations to establish measurable quality objectives—that is, metrics—in order to monitor how well they are doing. We've all heard the slogan, "What gets measured, gets managed."

If you think about it, the centrality of quantitative metrics relies on a number of fundamental assumptions:

  • We assume that quantitative metrics are objective—in the sense that they are unbiased. This lack of bias makes them better than mere opinions.
  • We also assume that quantitative metrics are real, external, independent features of the thing we want to understand. This external independence makes them reliable as a basis for decisions. 
  • And finally, we assume that quantitative metrics are meaningful: if the numbers are trending up (or down), that tells us something about what action we need to take next.

But each of these assumptions is weak.

  • Metrics are not necessarily unbiased. In fact, as we discussed last week, there is a sense in which every quantitative metric conceals some hidden bias. Since this is true for all metrics, the answer is not to replace your old metric with a better one. What is important is to understand the hidden bias, to correct for it when you interpret your results. 
  • Metrics are not necessarily external or independent of the thing being measured. Think about measuring people. If they come to understand that you are using a metric as a target—maybe they get a bonus if the operational KPIs are all green next quarter—people will use their creativity to make certain that the KPIs are all green regardless of the real state of things. (See also this post here.)
  • And metrics can only be meaningful in a defined context. Without the context, they are just free-floating numbers, no more helpful than a will o' the wisp. 

We discussed the first risk last week. I'll discuss the second risk in this post. And I'll discuss the third one next week. 

Unhelpful optimization

I quoted above the slogan, "What gets measured, gets managed." But just a week ago, Nuno Reis of the University of Uncertainty pointed out in a LinkedIn post that this slogan is misleading, and that it was originally coined as a warning rather than an exhortation. Specifically, Reis writes:

It started with V. F. Ridgway’s 1956 quote: "What gets measured gets managed."

Yet, Ridgway was WARNING how metrics distort and damage organizations.

The FULL quote is:

"What gets measured gets managed—even when it's pointless to measure and manage it, and even if it harms the purpose of the organization to do so."*

The original source was a 1956 article by V. F. Ridgway called "Dysfunctional consequences of performance measurements."** Ridgway's point is that a metric provides just a single view onto the thing you want to understand, but some people will always treat it uncritically, as the whole truth. This misunderstanding creates an opportunity for other people to exploit the metric by acting so that the numbers get better, even if the overall organization suffers for it. Examples include the following:***

"1. A case where public employment interviewers were evaluated based on the number of interviews. This caused the interviewers to conduct fast interviews, but very few job applicants were placed.

"2. A situation where investigators in a law enforcement agency were given a quota of eight cases per month. At the end of the month investigators picked easy fast cases to meet their quota. Some more urgent, but more difficult cases were delayed or ignored.

"3. A manufacturing example similar to the above situation where a production quota caused managers to work on all the easy orders towards the end of the month, ignoring the sequence in which the orders were received.

"4. Another case involved emphasis on setting monthly production records. This caused production managers to neglect repairs and maintenance.

"5. Standard costing is mentioned as a frequent source of problems where managers are motivated to spend a considerable amount of time and energy debating about how indirect cost should be allocated and attempting to explain the differences between the actual and standard costs."

You see the general point. In each case, a metric is defined in the hopes that it will drive organizational behavior in a good direction. But the people working inside the organization naturally want to score as well as possible, preferably without too much effort. So they use their creativity to find ways to boost the numbers.
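Ridgway's first example can be sketched as a toy model in Python. All the numbers here are invented for illustration—they come from no real agency—but they show the mechanism: if the measured metric is interview *count*, a fast-and-shallow strategy wins on the metric while losing on the outcome that actually matters (placements).

```python
# Toy model of Ridgway's first example. All quantities are made up.
def day_of_interviews(minutes_each, placement_rate, workday_minutes=480):
    """Return (interviews held, applicants placed) for one workday."""
    interviews = workday_minutes // minutes_each
    placements = interviews * placement_rate
    return interviews, placements

# Assumption: a 10-minute interview rarely places anyone; a 40-minute one often does.
fast = day_of_interviews(10, placement_rate=0.05)      # optimizes the measured metric
thorough = day_of_interviews(40, placement_rate=0.50)  # optimizes the real goal

print("fast:", fast, "thorough:", thorough)
```

The fast strategy holds four times as many interviews, yet places fewer applicants—exactly the dysfunction Ridgway describes.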

Also, in case this discussion sounds familiar, we have seen these themes before. One instance was in 2021, in this post here, where I argue that "There is no metric in the world that cannot be gamed." But the exact same point shows up in this post here from 2023, about systems thinking—where the fundamental insight is that if you design your operations and metrics in a lazy way, without thinking through what you are doing, you will incentivize your people to deliver bad service.

Pro tip: Don't do that. 

Goodhart's law

Let me wrap up by referencing the webcomic xkcd. This one is about Goodhart's Law, that "When a measure becomes a target, it ceases to be a good measure." Of course the reasons behind Goodhart's Law are everything I've already said in this post. Here's what xkcd does with it:****


Meanwhile, I hope everyone has a great holiday season! I'll be back in a week to talk about the third assumption we make regarding metrics.

__________

* It seems that this formulation is from a summary of Ridgway's work by the journalist Simon Caulkin. See this article for references.   

** Ridgway, V. F. 1956. Dysfunctional consequences of performance measurements. Administrative Science Quarterly 1(2): 240-247. See reprint available here, or summary available here. 

*** These five examples are quoted from this summary here, by James R. Martin, Ph.D., CMA.   

**** The xkcd website makes the following statement about permissions for re-use: "This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. This means you're free to copy and share these comics (but not to sell them). More details."

      

Thursday, December 12, 2024

The hidden bias inside metrics

Last week we talked about metrics, and about how—if you find you need to measure something where no metric has ever been established before—you can just make one up. Of course this is true, but you still have to be careful. Make sure you understand what you want the metric to tell you. The reason is that sometimes you can measure the same thing in two different ways, and each way conveys a hidden message or bias.

For example, suppose you are comparing different ways to travel from one place to another: walking, skateboarding, bicycling, driving, flying. And suppose you want to know which is the safest. How do you measure that?

It all depends on which one you want to win. If you work for the airline industry, then you probably want to convince people that commercial air travel is the safest form of travel. That way, more people will choose to fly, and your business will grow. So in that case, you measure safety in terms of Number of fatal accidents per mile traveled.

It's a simple fact that commercial air travel has very few fatal accidents, so the numerator of that fraction will be very small. At the same time, flying is most practical when you want to cover long distances, so on the whole the denominator is very large. That means that the overall fraction will be very small indeed, and—sure enough!—the airline industry regularly advertises that flying is the safest way to travel.

But you could equally well approach the question from another direction. Suppose you ask: If something goes wrong, how much danger am I in? Using this metric, flying no longer leads the pack. If something goes wrong while you are walking—even if you are walking long distances—you likely need no more than a day's rest and a better pair of shoes. But if the airplane that you are on develops catastrophic engine failure at 35,000 feet, the odds are strongly against anyone walking away from the experience.

This is what I mean by the "hidden bias" in a metric. Because metrics are (by definition) objective and (generally) quantitative, we tend to assume that they are unbiased. But when you try to measure "Which form of travel is the safest?" flying comes out as either the best or the worst, depending on which metric you choose.

Nor can you ask, "Well, which one is the right metric to settle the question?" There is no "right" metric. Both of these metrics answer part of the question about safety. The real problem is that the question about the "safest form of travel" is badly posed. What are you really asking for? Do you want to know about the frequency or likelihood of serious problems? In that case, flying is the safest. Do you want to know about the lethality of serious problems? In that case, flying is the most dangerous. Before you choose a metric, you have to understand exactly what you want it to tell you. In the same way, before you blindly accept any metric quoted by somebody else, think hard about what that metric is really measuring, and about why they chose to use it and not a different one.
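The contrast can be made concrete with a small Python sketch. The figures below are entirely made up (they are not real accident statistics); the point is only that one data table, read through two different metrics, ranks flying as the safest mode and as the most dangerous one.

```python
# Illustrative only: made-up numbers, not real accident statistics.
modes = {
    # mode: (fatal_accidents, miles_traveled, deaths_per_fatal_accident)
    "walking": (50, 1_000_000, 1),
    "driving": (200, 100_000_000, 2),
    "flying":  (2, 10_000_000_000, 150),
}

# Metric 1: fatal accidents per million miles (the airline industry's favorite).
per_mile = {m: a / miles * 1_000_000 for m, (a, miles, _) in modes.items()}

# Metric 2: if something does go wrong, how lethal is it?
lethality = {m: deaths for m, (_, _, deaths) in modes.items()}

print("safest by metric 1:", min(per_mile, key=per_mile.get))    # flying
print("worst by metric 2:", max(lethality, key=lethality.get))   # also flying
```

Same table, opposite verdicts—which is exactly why the question has to be posed precisely before the metric is chosen.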

Years ago, I saw a consumer advocate on television exploding a metric in the most delightful way. Some brand of potato chips had come out with a new line that advertised "Less Salt and Less Oil!" But a close analysis of the production process showed that actually a bag of the new chips contained—overall—more salt and more oil than a bag of their regular line. How could they get away with advertising "Less Salt and Less Oil"? When he challenged them, they explained that they had made the potato chips smaller! Therefore—so they said—if you sit down with a plan to eat exactly ten potato chips (or some other definite number), you end up consuming less salt and less oil than if you had eaten ten of their regular chips. And of course the consumer advocate riposted with what's obvious, namely, that nobody ever sits down to eat a specific number of potato chips. In fact, he said, the only time he had ever seen anyone count out a specific number of potato chips was when he saw two eight-year-old boys dividing a bag between them. Otherwise, that's not what people do. So the metric was true as far as it went, but it was misleading.
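The potato-chip trick is simple arithmetic, and a few invented numbers make it plain: shrink the chip and the per-chip metric improves, even while the per-bag total gets worse.

```python
# Hypothetical quantities, invented for illustration.
regular = {"chips_per_bag": 50, "salt_per_chip_g": 0.10}
new     = {"chips_per_bag": 80, "salt_per_chip_g": 0.08}

# The advertised metric: salt in exactly ten chips. The new line wins.
assert 10 * new["salt_per_chip_g"] < 10 * regular["salt_per_chip_g"]

# The metric that matters to someone who eats the whole bag. The new line loses.
salt_regular = regular["chips_per_bag"] * regular["salt_per_chip_g"]
salt_new = new["chips_per_bag"] * new["salt_per_chip_g"]
assert salt_new > salt_regular

print(f"per bag: regular {salt_regular:.1f} g, new {salt_new:.1f} g")
```

Both metrics are arithmetically true; only one answers the question a real consumer is asking.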

The same thing is true of any other metric. Be it never so objective, it will silently push the conversation in one direction rather than another. When you choose a metric—or when you make one up, if you have to do that—make sure that it is pointing in a direction you want to go. 

      

Thursday, December 5, 2024

"How hot is that pepper?": Adventures in measurement

We all know that measurement is important. But what if you want to measure something that has no defined metric?

The answer may be that you have to make something up. Look at the feature, or process, or event that you have in mind; determine its salient characteristics; and then decide how those can be best isolated and communicated. Often, the clearest communication is quantitative, in terms of numbers. In a few cases, you might find it simpler to communicate in binary terms (on/off), or qualitatively. But in all events, make sure that your distinctions are objective and repeatable.

The basic elements you have to define are:

  • system of measurement
  • unit of measure
  • sensor

And that's it! Once you know how you are going to check the thing (sensor), what you are going to count (unit of measure), and what scale those counts live on (system of measurement), you can measure whatever you need.
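If it helps to see the pieces fit together, here is a tiny Python sketch of the three elements. Everything in it is hypothetical—the names and the stand-in counter function are mine, not part of any standard—but it shows how a home-made metric decomposes into unit, sensor, and the scale that ties them together.

```python
# Hypothetical sketch: the three elements of a made-up metric.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Metric:
    name: str
    unit: str                    # unit of measure: what you count
    sensor: Callable[[], float]  # sensor: how you check the thing
    # The system of measurement is the convention (scale, zero point,
    # procedure) that makes unit and sensor readings comparable over time.

def count_widgets_this_hour() -> float:
    # Stand-in for reading a real counter on the machine.
    return 42.0

throughput = Metric("widget throughput", "widgets/hour", count_widgets_this_hour)
print(f"{throughput.name}: {throughput.sensor()} {throughput.unit}")
```

The Scoville scale in the video follows the same pattern: a unit (Scoville heat units), a sensor (originally a panel of human tasters), and a convention for turning dilutions into numbers.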

This video explains the process by walking through the steps to establish the Scoville scale, which measures the hotness of chili peppers. It's quick and fun and less than two minutes long.
