Thursday, December 26, 2024

When metrics don't tell the whole story

We've spent this month talking about metrics: how to define them, and where they can go wrong. But it's also important to understand that metrics never tell the whole story. A numerical metric can be a powerful way to assess the performance of a complex process, by cutting through the fog of operational details to look at the results. But in the end it is still only one observation, and sometimes not the most important one.

Step back for a minute to consider what a metric is. A metric* is a value—typically a number—that answers a question about some organizational process. How many widgets does this machine produce per hour? What percentage of our customer orders are shipped on time? What percentage of customer shipments result in complaints? And so on. 

This means that a metric is meaningful or useful only insofar as the question it answers is meaningful or useful. And that question rests on a set of assumptions about how the measured process works (or is supposed to work) and how it interacts with the world around it. As long as those assumptions hold, the number is meaningful. Deprived of that context, it isn't.

Consider a couple of simple examples. We want to know "How many widgets does this machine produce per hour?" because we want to understand whether we will have enough stock to fill our orders. So we install a counter on the machine. But if the counter is out of order, the numbers on its display will be wrong. We still need to know the correct number, but our normal process—read the display and log what it says—may have to be replaced by a more manual counting process.

We want to know "What percentage of our orders are shipped on time?" because in general customers demand timely shipment. Late orders mean unhappy customers, and unhappy customers will start shopping with our competitors. But in some cases, timely delivery isn't the most important thing. Maybe we are artists or sculptors, who do priceless original work on special commission from wealthy patrons. On the whole, these patrons probably don't care exactly what date the order is shipped, so long as it is perfect when it is finally delivered. Once you change the context, the question and the metric become meaningless.

In other words, numerical metrics are great so long as they are answering the right questions. But getting correct answers to wrong questions can easily steer you down the wrong path. Peter Drucker cites a couple of dramatic examples.**

  • "The thalidomide tragedy which led to the birth of so many deformed babies is a case in point. By the time doctors on the European continent had enough statistics to realize that the number of deformed babies born was significantly larger than normal—so much larger that there had to be a specific and new cause—the damage had been done....
  • "The Ford Edsel holds a similar lesson. All the quantitative figures that could possibly be obtained were gathered before the Edsel was launched. All of them pointed to its being the right car for the right market. The qualitative change—the shifting of American consumer-buying of automobiles from income-determined to taste-determined market-segmentation—no statistical study could possibly have shown. By the time this could be captured in numbers, it was too late—the Edsel had been brought out and had failed."

If you can find a way to supplement your quantitative metrics with some other (perhaps qualitative) way to assess how things are going—in the best case, one that uses a wholly different perspective—your overall understanding of the situation will be stronger.

This might sound like a narrow discussion inside Quality theory, but the same debate has been going on recently in the political arena over the state of the economy. Some people have pointed out that the normal quantitative metrics show the American economy to be in great shape. Others have countered that the economy is suffering, and that if the metrics don't agree then so much the worse for the metrics! Personally I have no idea what the economy is doing and I take no position in this argument. But it fascinates me to see this exact topic as a subject of intense public debate. 

In brief, there is no reliable way to manage your organization on autopilot. Three years ago, I argued that there is no perfect process. In the same way, there are no perfect metrics. Processes and metrics are useful tools, but you still have to pay attention and think hard about what you are doing.

__________

* In this context, at any rate.

** Both of these examples are quoted from Peter Drucker, The Effective Executive (New York: HarperCollins, 1966, 1967, 1993), pp. 16-17. 


Thursday, December 19, 2024

"What gets measured gets managed"—like it or not!

For the past couple of weeks we've been talking about metrics, and it is clear that they are central to most modern Quality systems. ISO 9000:2015 identifies "Evidence-based decision making" as a fundamental Quality management principle, stating (in clause 2.3.6.1): "Decisions based on the analysis and evaluation of data and information are more likely to produce desired results." ISO 9001:2015 (in clause 6.2.1) requires organizations to establish measurable quality objectives—that is, metrics—in order to monitor how well they are doing. We've all heard the slogan, "What gets measured, gets managed."

If you think about it, the centrality of quantitative metrics relies on a number of fundamental assumptions:

  • We assume that quantitative metrics are objective—in the sense that they are unbiased. This lack of bias makes them better than mere opinions.
  • We also assume that quantitative metrics are real, external, independent features of the thing we want to understand. This external independence makes them reliable as a basis for decisions. 
  • And finally, we assume that quantitative metrics are meaningful: if the numbers are trending up (or down), that tells us something about what action we need to take next.

But each of these assumptions is weak.

  • Metrics are not necessarily unbiased. In fact, as we discussed last week, there is a sense in which every quantitative metric conceals some hidden bias. Since this is true for all metrics, the answer is not to replace your old metric with a better one. What is important is to understand the hidden bias and to correct for it when you interpret your results. 
  • Metrics are not necessarily external or independent of the thing being measured. Think about measuring people. If they come to understand that you are using a metric as a target—maybe they get a bonus if the operational KPIs are all green next quarter—people will use their creativity to make certain that the KPIs are all green regardless of the real state of things. (See also this post here.)
  • And metrics can only be meaningful in a defined context. Without the context, they are just free-floating numbers, no more helpful than a will o' the wisp. 

We discussed the first risk last week. I'll discuss the second risk in this post. And I'll discuss the third one next week. 

Unhelpful optimization

I quoted above the slogan, "What gets measured, gets managed." But just a week ago, Nuno Reis of the University of Uncertainty pointed out in a LinkedIn post that this slogan is misleading, and that it was originally coined as a warning rather than an exhortation. Specifically, Reis writes:

It started with V. F. Ridgway’s 1956 quote: "What gets measured gets managed."

Yet, Ridgway was WARNING how metrics distort and damage organizations.

The FULL quote is:

"What gets measured gets managed—even when it's pointless to measure and manage it, and even if it harms the purpose of the organization to do so."*

The original source was a 1956 article by V. F. Ridgway called "Dysfunctional consequences of performance measurements."** Ridgway's point is that a metric provides just a single view onto the thing you want to understand, but some people will always treat it uncritically, as the whole truth. This misunderstanding creates an opportunity for other people to exploit the metric by acting so that the numbers get better, even if the overall organization suffers for it. Examples include the following:***

"1. A case where public employment interviewers were evaluated based on the number of interviews. This caused the interviewers to conduct fast interviews, but very few job applicants were placed.

"2. A situation where investigators in a law enforcement agency were given a quota of eight cases per month. At the end of the month investigators picked easy fast cases to meet their quota. Some more urgent, but more difficult cases were delayed or ignored.

"3. A manufacturing example similar to the above situation where a production quota caused managers to work on all the easy orders towards the end of the month, ignoring the sequence in which the orders were received.

"4. Another case involved emphasis on setting monthly production records. This caused production managers to neglect repairs and maintenance.

"5. Standard costing is mentioned as a frequent source of problems where managers are motivated to spend a considerable amount of time and energy debating about how indirect cost should be allocated and attempting to explain the differences between the actual and standard costs."

You see the general point. In each case, a metric is defined in the hopes that it will drive organizational behavior in a good direction. But the people working inside the organization naturally want to score as well as possible, preferably without too much effort. So they use their creativity to find ways to boost the numbers.

Also, in case this discussion sounds familiar, we have seen these themes before. One instance was in 2021, in this post here, where I argue that "There is no metric in the world that cannot be gamed." But the exact same point shows up in this post here from 2023, about systems thinking—where the fundamental insight is that if you design your operations and metrics in a lazy way, without thinking through what you are doing, you will incentivize your people to deliver bad service.

Pro tip: Don't do that. 

Goodhart's law

Let me wrap up by referencing the webcomic xkcd. This one is about Goodhart's Law, that "When a measure becomes a target, it ceases to be a good measure." Of course the reasons behind Goodhart's Law are everything I've already said in this post. Here's what xkcd does with it:****


Meanwhile, I hope everyone has a great holiday season! I'll be back in a week to talk about the third assumption we make regarding metrics.

__________

* It seems that this formulation is from a summary of Ridgway's work by the journalist Simon Caulkin. See this article for references.   

** Ridgway, V. F. 1956. Dysfunctional consequences of performance measurements. Administrative Science Quarterly 1(2): 240-247. See reprint available here, or summary available here. 

*** These five examples are quoted from this summary here, by James R. Martin, Ph.D., CMA.   

**** The xkcd website makes the following statement about permissions for re-use: "This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. This means you're free to copy and share these comics (but not to sell them). More details."


Thursday, December 12, 2024

The hidden bias inside metrics

Last week we talked about metrics, and about how—if you find you need to measure something where no metric has ever been established before—you can just make one up. Of course this is true, but you still have to be careful. Make sure you understand what you want the metric to tell you. The reason is that sometimes you can measure the same thing in two different ways, and each way conveys a hidden message or bias.

For example, suppose you are comparing different ways to travel from one place to another: walking, skateboarding, bicycling, driving, flying. And suppose you want to know which is the safest. How do you measure that?

It all depends which one you want to win. If you work for the airline industry, then you probably want to convince people that commercial air travel is the safest form of travel. That way, more people will choose to fly, and your business will grow. So in that case, you measure safety in terms of "Number of fatal accidents per mile traveled."

It's a simple fact that commercial air travel has very few fatal accidents, so the numerator of that fraction will be very small. At the same time, flying is most practical when you want to cover long distances, so on the whole the denominator is very large. That means that the overall fraction will be very small indeed, and—sure enough!—the airline industry regularly advertises that flying is the safest way to travel.

But you could equally well approach the question from another direction. Suppose you ask: If something goes wrong, how much danger am I in? Using this metric, flying no longer leads the pack. If something goes wrong while you are walking—even if you are walking long distances—you likely need no more than a day's rest and a better pair of shoes. But if the airplane that you are on develops catastrophic engine failure at 35,000 feet, the odds are strongly against anyone walking away from the experience.
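To make the contrast concrete, here is a minimal sketch in Python. The numbers are invented purely for illustration (they are not real accident statistics), but they show how the same table of data can crown flying the safest mode by one metric and the most dangerous by the other:

    # All numbers invented for illustration -- not real accident statistics.
    # For each mode of travel: fatal accidents, miles traveled, and average
    # deaths per fatal accident, over some hypothetical observation period.
    modes = {
        "walking": {"accidents": 50,  "miles": 1_000_000,     "deaths_per_accident": 1.0},
        "driving": {"accidents": 200, "miles": 100_000_000,   "deaths_per_accident": 1.5},
        "flying":  {"accidents": 1,   "miles": 1_000_000_000, "deaths_per_accident": 150.0},
    }

    for name, m in modes.items():
        per_mile = m["accidents"] / m["miles"]   # the airline industry's preferred metric
        severity = m["deaths_per_accident"]      # "if something goes wrong, how much danger am I in?"
        print(f"{name:8s} fatal accidents per mile: {per_mile:.1e}   deaths per accident: {severity}")

Flying wins handily on the first column and loses just as handily on the second. Nothing about the data changed; only the question we asked of it did.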

This is what I mean by the "hidden bias" in a metric. Because metrics are (by definition) objective and (generally) quantitative, we tend to assume that they are unbiased. But when you try to measure "Which form of travel is the safest?" flying comes out as either the best or the worst, depending on which metric you choose.

Nor can you ask, "Well, which one is the right metric to settle the question?" There is no "right" metric. Both of these metrics answer part of the question about safety. The real problem is that the question about the "safest form of travel" is badly posed. What are you really asking? Do you want to know about the frequency or likelihood of serious problems? In that case, flying is the safest. Do you want to know about the lethality of serious problems? In that case, flying is the most dangerous. Before you choose a metric, you have to understand very exactly what you want it to tell you. In the same way, before you blindly accept any metric quoted by somebody else, think hard about what that metric is really measuring, and about why the person quoting it chose that one and not a different one.

Years ago, I saw a consumer advocate on television exploding a metric in the most delightful way. Some brand of potato chips had come out with a new line that advertised "Less Salt and Less Oil!" But a close analysis of the production process showed that a bag of the new chips actually contained—overall—more salt and more oil than a bag of the regular line. How could they get away with advertising "Less Salt and Less Oil"? When he challenged them, they explained that they had made the potato chips smaller! Therefore—so they said—if you sit down with a plan to eat exactly ten potato chips (or some other definite number), you end up consuming less salt and less oil than if you had eaten ten of their regular chips.

And of course the consumer advocate riposted with the obvious point: nobody ever sits down to eat a specific number of potato chips. In fact, he said, the only time he had ever seen anyone count out a specific number of potato chips was when he watched two eight-year-old boys dividing a bag between them. Otherwise, that's not what people do. So the metric was true as far as it went, but it was misleading.
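For what it's worth, the arithmetic behind the trick is easy to reproduce. Here is a sketch with invented numbers (I never saw the actual product figures) showing how "less salt per chip" and "more salt per bag" can both be literally true at the same time, once the chips get smaller:

    # Invented numbers for illustration -- not the actual product data.
    BAG_GRAMS = 300.0   # assume both lines are sold in a 300 g bag

    chips = {
        "regular": {"grams_per_chip": 2.0, "salt_fraction": 0.010},  # 1.0% salt by weight
        "new":     {"grams_per_chip": 1.5, "salt_fraction": 0.012},  # 1.2% salt by weight
    }

    for label, c in chips.items():
        per_chip = c["grams_per_chip"] * c["salt_fraction"]
        per_bag = BAG_GRAMS * c["salt_fraction"]
        print(f"{label:8s} salt per chip: {per_chip:.3f} g   salt per bag: {per_bag:.2f} g")

With these numbers, ten of the new chips really do carry less salt than ten of the regular ones (0.18 g versus 0.20 g), while a whole bag carries more (3.6 g versus 3.0 g). Which number you quote depends on which story you want to tell.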

The same thing is true of any other metric. Be it never so objective, it will silently push the conversation in one direction rather than another. When you choose a metric—or when you make one up, if you have to do that—make sure that it is pointing in a direction you want to go. 


Thursday, December 5, 2024

"How hot is that pepper?": Adventures in measurement

We all know that measurement is important. But what if you want to measure something that has no defined metric?

The answer may be that you have to make something up. Look at the feature, or process, or event that you have in mind; determine its salient characteristics; and then decide how those can be best isolated and communicated. Often, the clearest communication is quantitative, in terms of numbers. In a few cases, you might find it simpler to communicate in binary terms (on/off), or qualitatively. But in all events, make sure that your distinctions are objective and repeatable.

The basic elements you have to define are:

  • system of measurement
  • unit of measure
  • sensor

And that's it! Once you know how you are going to check the thing (sensor) and what you are going to count (unit of measure), you can measure whatever you need.
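If it helps to see those elements as a concrete checklist, here is a minimal sketch in Python. The Metric class and the widget-counting sensor are my own invented illustrations, not taken from any standard:

    from dataclasses import dataclass
    from typing import Callable

    # A metric reduced to the basic elements named above. The "system of
    # measurement" is the convention that ties the question, the unit,
    # and the sensor together.
    @dataclass
    class Metric:
        question: str                  # what we want the number to answer
        unit_of_measure: str           # what we are going to count
        sensor: Callable[[], float]    # how we are going to check the thing

        def read(self) -> str:
            return f"{self.question} -> {self.sensor()} {self.unit_of_measure}"

    # Hypothetical sensor: in real life this might poll the counter on a machine.
    def count_widgets() -> float:
        return 42.0   # stubbed reading, just for the sketch

    widgets_per_hour = Metric(
        question="How many widgets does this machine produce per hour?",
        unit_of_measure="widgets/hour",
        sensor=count_widgets,
    )

    print(widgets_per_hour.read())

Once the three elements are pinned down this explicitly, it is also easier to spot when one of them (usually the sensor) has quietly broken.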

This video explains the process by walking through the steps to establish the Scoville scale, which measures the hotness of chili peppers. It's quick and fun and less than two minutes long.


Thursday, November 28, 2024

Another take on 5-Whys

A couple months ago, I saw a fun cartoon on LinkedIn. It was attached to a perfectly sound article which you can find here. But the cartoon is what I wanted to share.


And of course this is right. Sometimes analysis sputters out. You might not have the data you need to get a first-class root cause. Other factors might interfere as well. This work can be hard.

It's still valuable to do, of course.


Thursday, November 21, 2024

How small is "small"?

Last week, I wrote about an organization I once knew that accomplished amazing results with almost no formal systems. Among other things, I said, "When the number of people is small, and when the average tenure in jobs is high, you can accomplish a lot of good work following the Everybody-Just-Knows methodology." It sounds promising.

But how small is "small"? At what size do you lose the magic of "smallness" and have to start putting systems into place?

It's sooner than you think.

A friend of mine works for a retail firm with around sixty-five employees—maybe more now and then for short-term or seasonal work. It's a "small company" in anybody's book. But a little while ago they ran into a situation that proved the need for some formal systems.

What happened is this. The company generates product information sheets for their products. These sheets are used by several departments internally, and they are also made available to customers. Some time ago, the Social Media team decided to update these information sheets to make them "easier to read": in practice this meant adding some colorful graphics and rearranging the text so there is more white space on the page. For sheets with too much text to rearrange, they just deleted anything they found boring.*

But the Sales and Support teams, who interact directly with customers to resolve problems, urgently needed all that deleted text. So they in turn just kept copies of the earlier versions, and continued to duplicate those outdated versions when they needed more. At no point did anyone escalate the issue to the responsible manager to get a ruling on how to proceed, because no manager is specifically in charge of the information sheets. They could have gone to the General Manager, but no one wanted to escalate the issue that high.

When my friend told me this story I tried to discuss it pleasantly. But inside I felt like I was looking at one of those pictures you see in magazines with the headline "How Many Mistakes Can You Find?" (1) Nobody was formally responsible for the information sheets. (2) The people updating the sheets didn't know the full list of interested parties who used them, nor (3) what those parties required. (4) The updated sheets were not sent out for review before they were approved. And (5) it was possible for people to keep and use old sheets instead of having to use the current ones. There might be more issues if I think about it for a while, but that list is a good start.

I don't blame the company. No company starts out with all these systems on Day One, and it is usually through episodes like this that they learn they need more than they've got. Also I wanted to tread a little lightly because the company wasn't a client. They'd never asked for my advice—my friend was just telling me about having a bad day at work. So I made a few suggestions for her to take back to the office, to see if she could nudge things in the right direction, and left it at that.

But how is this any different from the team I described last week? If that team could do consistently good work with no formal systems, why shouldn't we expect the same results here?

The easy answer is that there are no tricks in the world that will guarantee you do good work. Even a formal Quality Management System won't guarantee that. At most, the Quality techniques will help you avoid a lot of familiar known mistakes that otherwise recur over and over. So while I say that a small team can do good work without formal systems, that doesn't mean they will.   

Beyond that, there are two or three differences between the Repair Center I described last week and my friend's retail store. That is, there are two differences which definitely bear on this question, and a third which might.

  1. The retail store is small for a company, but it's nearly ten times the size of the Repair Center.
  2. The Repair Center all did the same kind of work, and they all sat together in one big room. The retail store has multiple departments that are spread out across the facility.
  3. Everyone in the Repair Center was an "old-timer." Most of them had worked in the same job for over a decade. The retail store has more of a mix of age and experience levels.

The retail store's problem with information sheets arose largely because some people (the Social Media team) didn't know the needs of others (the sales and support staff). The first two points above mean that could never have happened in the Repair Center. Everyone there knew what everyone else was working on—and what they needed, and almost what they were thinking—because they all did the same task and they had worked together for so long. The last point, about seniority, might have helped too, because one of the benefits of long experience is that you've already had plenty of years in which to make foolish mistakes: after a while you should have already committed most of the relevant mistakes, and learned from them.

So when I say that a small team can do good work without formal systems, there are a lot of other conditions that have to be in place instead. The team needs to be really small, small enough that all-to-all communication is easy and immediate. They have to be knowledgeable and experienced. There are personal characteristics that they have to have, as well: conscientiousness, focus, and a host of others. It's possible, but it's not easy.

If you want to grow larger, consider starting to put formal systems in place.

If someone retires, look closely at what he used to do. Often someone will have started doing basic document or system maintenance on his own time, just because he saw that it needed to be done. When he leaves, it may be time to formalize that work.

And remember the other extreme, just to keep everything in perspective: you can have sophisticated formal systems in place and still make alarming mistakes.

__________

* That's probably an exaggeration; but in light of what came next it might as well have been true.   


Thursday, November 14, 2024

Quality work out of Fibber McGee's closet

From time to time I've written about the question whether it's possible to achieve Quality work without having a formal Quality system. (Here is one recent post where I touch on it, for example.) In general my argument is that it is always possible, but on any large scale it becomes staggeringly difficult. From this point of view, all our Quality methods are (so to speak) just tips and tricks—gimmicks, if you will—to make it easier to get the results we want.

With this background in mind, let me tell you a story.

Once upon a time, I worked for a California startup that had just been acquired by a much-larger company headquartered in Europe. One of the new initiatives was to get us integrated into their ISO 9001 system. To kick-start that activity, Mother Company sent an Auditor out from the Old Country to do a complete review of our system.

There were a lot of mismatches. The Auditor expected meetings to start on time, but company culture at the startup meant people usually drifted in about 10 minutes late. A lot of procedure documents were either missing or badly followed. The Auditor's opinion soured quickly, and didn't improve as the week went on.

At one point he came to our Repair Center. This was part of the Customer Service organization, and it was just what it sounds like. Customers whose units had broken or were malfunctioning could send them in and we'd fix them. If the unit was still under warranty, the fix was free; otherwise, we'd quote a price. Also, the Repair Center sold a lot of extended warranties to customers who didn't want to haggle over a bill when they needed their unit back.

The Repair Center was housed in a space that was scarcely big enough. Cabinets were full of old equipment, with tools and supplies shoved in at odd angles. Technicians regularly had half-completed jobs teetering on the edges of their workspaces, put on hold because they had to contact the customer to clarify a question; meanwhile they had started work on other jobs. There was more equipment under the desks. It all looked like Fibber McGee's hall closet.

Here the Auditor drew a line. He stopped at the entrance of the Repair Center and stood stock still. He leaned in to look around, but he would not cross the threshold. I don't know if he was afraid he might dislodge something or trip over it. But he looked around, scowled, murmured quietly "This center will never meet ISO 17025," … and left.

From then on, whenever anyone asked him genially, "How's the audit going?" the Auditor talked about our Repair Center. He described as much of it as he had been able to see, and explained how many regulations it failed to meet. If anyone asked how much work he thought it would take to bring it up to snuff, he just shook his head sadly.

Sooner or later, word got to our Head of Operations. This fellow was a long-time employee of Mother Company, but he was an American; and he had been sent out shortly after Mother Company acquired us, to offer a guiding hand as we assimilated into the larger organization. So he asked for a meeting with the Auditor. I was the Auditor's official host, so I tagged along.

Head of Operations: I hear that you don't like our Repair Center. What's wrong with it?

Auditor: It's cluttered. It's disorganized. There is no control over the space or the tools or the work. I don't see how they can function.

Operations: Compared to what? Are you comparing it to the Customer Equipment Laboratory in the home office, back in the Old Country?

Auditor: Yes, exactly.

Operations: Well, help me understand this. Because the Repair Center here is a profit center that generates something like 25% of the company's annual revenue. Meanwhile that Laboratory you are comparing it to is a cost center, and every year those costs are significant.

Auditor: I'm not concerned with how the company organizes its departments financially. I'm looking at compliance and customer satisfaction.

Operations: Fine, but that's another point. Back on my desk I have a stack of customer complaints an inch thick, complaining about the Customer Equipment Laboratory in the Old Country. And in my files I have just as many notes from customers expressing gratitude for the quick and professional service they got from this Repair Center. So who's offering more customer satisfaction?

Auditor: ISO 17025 has clear requirements that laboratories have to meet, and this Repair Center doesn't even come close.

Operations: Does that standard apply to this Repair Center?

Auditor: I don't know, but I am going to check.

In the end, it all worked out. The Repair Center tidied up their space, labeled their cabinets, and adopted some rules about how to handle work if you have to interrupt it to wait for a customer callback. After some research, the Auditor determined that ISO 17025 did not apply to the Repair Center's work, and the tidying that they did was good enough for ISO 9001.

But how were they able to do such good work all along? Isn't tidiness important? Does it really not matter whether your storage cabinets are labeled?

Of course it matters. The key is that everyone who worked in the Repair Center had been there for a decade!* This was a small company, with a dedicated staff and very little turnover. They didn't bother to label the cabinets, because everybody just knew where everything was. They didn't write down a lot of their operational procedures because everybody just knew what to do. They didn't bother to label work-in-progress because everybody just knew who was working on what.

And this is a valid Quality strategy for some organizations. When the number of people is small, and when the average tenure in jobs is high, you can accomplish a lot of good work following the Everybody-Just-Knows methodology. The only problem is that this approach doesn't scale well, and it's hard to bring on anyone new when your employees start to retire or die. 

As soon as you start to grow, or as soon as you have to work with anyone new, the traditional Quality tools suddenly become a lot more useful. 

__________

* Strictly speaking I think the New Guy had been there only seven or eight years.   

