Thursday, January 16, 2025

Calibration in wine-tasting

A couple of months ago, I ran across an article about wine-tasting that I promptly lost and have not been able to find again. But it made some interesting points about calibration, so—as part of the current series on measurement—I'll try to reproduce the gist of it here. Since I can't find the article, I can't give you a footnote to substantiate the factual claims I make about wine; but I think you'll agree that they are mostly common sense.

We all know that there is a difference between Good Wine and Bad Wine, and also that Good Wine generally costs more. But this article suggests that we recognize at least three levels: Terrible Wine, Good-Enough Wine, and Great Wine. And the differences between these levels are revealing.

As you climb from Terrible Wine to Good-Enough Wine, the price goes up by a bit but generally not by a lot. At the same time, the overall quality improves dramatically. Most wine drinkers can tell the difference between Terrible Wine and Good-Enough Wine.

But when you then climb from Good-Enough Wine to Great Wine, the variables shift. With this step the price may shoot up much higher. The wine gets a lot better too, but what is interesting is that not all wine-drinkers can taste the difference. More precisely, anyone can tell that the Great Wine doesn't taste quite the same as the Good-Enough Wine. But unless you have a trained palate, you may not be able to distinguish the subtleties that make this bottle worth ten times as much as that bottle. Even so, those subtleties really do exist. But it generally takes a trained palate to recognize them.

What does this have to do with calibration? Everything.

In wine-tasting, your palate is the measuring instrument; the wine is the object to be measured; and its quality is the dimension in question. And the point is that the measuring instrument—your palate—has to be calibrated to meet the requirements of the measurement. But this calibration is of two kinds. 

  • On the one hand, you want to make sure no one is leaning on the scale; or in other words, that the measuring instrument reads zero when the inputs are (in fact) zero. 
  • On the other hand, you want to make sure that your measuring instrument is capable of the readings you need. If you need nanometer precision, don't use a yardstick. But if you are measuring carpet, don't use a nanomeasuring machine.
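To make those two checks concrete, here is a minimal sketch in Python. The Instrument class, its readings, and the tolerances are all hypothetical, invented purely for illustration; the logic is just the two checks above.

    # A minimal sketch of the two calibration checks described above.
    # The Instrument class and all of its numbers are hypothetical.

    class Instrument:
        def __init__(self, zero_reading, resolution):
            self.zero_reading = zero_reading  # what it reads with no input at all
            self.resolution = resolution      # smallest difference it can detect

    def check_calibration(instrument, zero_tolerance, required_resolution):
        """Return a list of calibration problems; an empty list means OK."""
        problems = []
        # Check 1: the instrument should read zero when the input is zero.
        if abs(instrument.zero_reading) > zero_tolerance:
            problems.append("someone is leaning on the scale (zero offset)")
        # Check 2: the instrument must resolve the differences we care about.
        if instrument.resolution > required_resolution:
            problems.append("not sensitive enough for this measurement")
        return problems

    # A yardstick (resolution about a millimeter) is fine for carpet,
    # but fails the second check if you need nanometer precision.
    yardstick = Instrument(zero_reading=0.0, resolution=1e-3)
    print(check_calibration(yardstick, zero_tolerance=1e-4, required_resolution=1e-9))

For wine, of course, the "instrument" is your palate and the "resolution" is your training; the code is just the same logic with numbers attached.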

These principles apply exactly to the measurement of wine. 

  • The first requirement—that your palate should read zero when you aren't tasting anything—means that you shouldn't be distracted by other flavors. You can achieve this by taking a bite of something with a neutral flavor before sipping your wine.* 
  • The second requirement means that your palate has to be trained to match the use case you have in mind. 
    • If all you need is to find a table wine that will complement your hamburger or your Halloween candy,** you have to be able to tell the difference between Terrible Wine and Good-Enough Wine. And for that use case, a greater sensitivity might be wasted. 
    • On the other hand, if you are judging premium wines at the highest level—or if you are trying to re-create Alexandre Dumas's experience drinking Montrachet***—well, for that you need both sensitivity and training.

Once again, as always, what you need all depends on what you are trying to do.   

__________

* Note, for example, the care with which the Comte de Rueil offered his guests olives between each course to cleanse their palates before tasting the wine, in Dorothy Sayers, "The Bibulous Business of a Matter of Taste," in Lord Peter (New York: Harper & Row), pp. 154-167.

** Yes, this is really a thing! See for example this blog post from October 2022.

*** Dumas once declared that Montrachet should be drunk only “on bended knee, with head bared.” It is supposed to be the best white wine in the world, or one of them.

   

Tuesday, January 14, 2025

FMEAs—Reducing the risk of failure

This morning, Manufacturing Tomorrow published my article, "FMEAs—Reducing the risk of failure." It's their article now so I won't post the text of it here, but you can find it by following the link. I hope you find it useful!


 

Thursday, January 9, 2025

Working with metrics that don't tell you much

We spent the whole month of December talking about metrics: how to create them, and how to avoid some common pitfalls associated with their use. Before we leave the subject, I want to address one more topic: What about when your metrics don't give you all that much information?

The first thing is to check how much information you really need. If you obviously need more than you are getting, that's almost like having no metric at all. Then maybe you need to create one, using whatever tools you have available. But not so fast. Sometimes even just a little data can be enough.

My son, Graham Mills, is a soil scientist, and recently he and I were talking about the kinds of measurements commonly used to classify Western rangeland. The Bureau of Land Management (BLM) has defined a strategy called Assessment, Inventory, and Monitoring (AIM). The standard source for this methodology is this publication,* which explains that:

Core methods generate indicators which represent the minimum information necessary to describe three key ecosystem attributes: soil and site stability, watershed function, and biotic integrity …. Nearly everything we value about ecosystems depends on these attributes. These core methods can also be used to generate many additional indicators that directly inform multiple management objectives, such as maintaining wildlife habitat, biodiversity conservation, producing forage, and supporting watershed health. Modifications to the core methods are discouraged as they limit the ability to combine and compare datasets, and thus describe ecosystem attributes at multiple scales.**

So far, so good. The catch, as Graham explained it to me, is that the actual measurements of soil health represent such a small fraction of the total characteristics of the soil that they are still maddeningly vague. It is, for example, not really possible to develop a solid theoretical understanding of the changes that have taken place over the years on a particular stretch of rangeland. To anyone with a scientific background, this limitation is frustrating.

Frustrating but not immobilizing. It turns out that soil scientists can still work with the AIM results.

The key is that the range of possible actions for restoring damaged or depleted rangeland is so very narrow. BLM scientists understand that rangeland is a biological system, and that systems—by definition!—are self-organizing and therefore unpredictable. So there are only a very few interventions permitted at all; and all of them are familiar and well understood. Plant this kind of bush here. Plant that kind of ground cover over there. If there is human garbage clogging a freshwater spring, remove the garbage. And so on. 

The list of approved actions is very short. And therefore a complete quantification of all possible soil characteristics is not needed. If the soil is seriously damaged, do this; if mildly damaged, do that; if already thriving, do a third thing—or maybe nothing at all. It turns out that that's enough to cover it.
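In code, that whole decision procedure would fit in a few lines. The sketch below is mine, not BLM's: the 0-100 score, the thresholds, and the action strings are all invented for illustration. The point is only that when the list of approved actions is this short, a coarse measurement gives you all the precision you need.

    # A hypothetical decision table, invented for illustration. The real AIM
    # indicators and BLM interventions are richer than this; the point is only
    # that a short action list needs only a coarse measurement.

    def recommend_action(soil_health_score):
        """Map a coarse 0-100 soil-health score to one of a few familiar actions."""
        if soil_health_score < 30:      # seriously damaged
            return "replant native shrubs and ground cover"
        elif soil_health_score < 70:    # mildly damaged
            return "remove debris, reseed bare patches, keep monitoring"
        else:                           # already thriving
            return "no intervention; keep monitoring"

    for score in (15, 50, 85):
        print(score, "->", recommend_action(score))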

As in so many other cases, the first step is to understand what you really need and want. Only then can you set about getting it.     

__________

* Herrick, J.E., Jornada Experimental Range, 2018. Monitoring manual for grassland, shrubland, and savanna ecosystems. USDA-ARS Jornada Experimental Range, Las Cruces, NM.

** Ibid., p.1.

Photo credits: G. Mills, May 2021.      

Thursday, January 2, 2025

Why are you an auditor?—REDUX!

It's the New Year,* and I hope 2025 will be good for us all! In case any readers are still coping with a surfeit of champagne, let's start with something light.

A few months ago, I published a post asking "Why are you an auditor?" I told the story of Amir, who took a job as an auditor in order to learn how to manage a business. His plan was that, once he had learned how to run a business, he was going to quit auditing for entrepreneurship.

After writing this article, I posted notices pointing to it in a number of venues. One of these venues was the myASQ community, where Anthony DeMarinis of AJD Quality Solutions responded with the answers that he gives his students to the exact same question. I've wanted to share them with you ever since I read them, and this looks like a fine time to do so.

Top 10 Reasons to Become an Auditor

10  Get to see the big picture with exposure to Top Management

 9  Benchmark other areas and promote Out-Of-Box Thinking

 8  Good way to increase your personal knowledge and keep your job

 7  Paid time off and a diversion from your regular job

 6  Acquire transferable skills to prepare for the next layoff

 5  Experience with conflict resolution (which may come in handy at home)

 4  Free dinner at fancy restaurants with the Auditee

 3  Opportunity to network and look for another job

 2  You're not the Auditee and get to hassle someone else for a change

 1  Unlimited POWER!!!

Once again, I wish you all a very Happy New Year! Let's do good things in 2025.

  

__________

* OK, technically that was yesterday. But it's only been 2025 for a scant 32 hours so far.     

Thursday, December 26, 2024

When metrics don't tell the whole story

We've spent this month talking about metrics: how to define them, and where they can go wrong. But it's also important to understand that metrics never tell the whole story. A numerical metric can be a powerful way to assess the performance of a complex process, by cutting through the fog of operational details to look at the results. But in the end it is still only one observation, and sometimes not the most important one.

Step back for a minute to consider what a metric is. A metric* is a value—typically a number—that answers a question about some organizational process. How many widgets does this machine produce per hour? What percentage of our customer orders are shipped on time? What percentage of customer shipments result in complaints? And so on. 

This means that a metric is meaningful or useful only insofar as the question it answers is meaningful or useful. And that question rests on a set of assumptions about how the measured process works (or is supposed to work) and how it interacts with the world around it. As long as the assumptions are correct, the number is meaningful. Deprived of context, it's not.

Consider a couple of simple examples. We want to know How many widgets does this machine produce per hour? because we want to understand whether we will have enough stock to fill our orders. So we install a counter on the machine. But if the counter is out of order, the numbers on its display will be wrong. We still need to know the correct number, but our normal process—read the display and log what it says—may have to be replaced by a more manual counting process.
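As a sketch of that fallback logic (the plausibility threshold and the manual count here are hypothetical, chosen only for illustration):

    # Hypothetical sketch: trust the machine's counter only when its reading
    # passes a basic sanity check; otherwise fall back to a manual count.

    def widgets_per_hour(counter_reading, hours, manual_count=None,
                         plausible_max=10_000):
        """Return the production rate, preferring the counter when it looks sane."""
        if counter_reading is not None and 0 <= counter_reading <= plausible_max:
            count = counter_reading
        elif manual_count is not None:
            count = manual_count      # the counter is out of order; count by hand
        else:
            raise ValueError("no trustworthy count available")
        return count / hours

    print(widgets_per_hour(counter_reading=480, hours=8))                    # 60.0
    print(widgets_per_hour(counter_reading=-3, hours=8, manual_count=472))   # 59.0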

We want to know What percentage of our orders are shipped on time? because in general customers demand timely shipment. Late orders mean unhappy customers, and unhappy customers will start shopping with our competitors. But in some cases, timely delivery isn't the most important thing. Maybe we are artists or sculptors, who do priceless original work on special commission from wealthy patrons. On the whole, these patrons probably don't care exactly what date the order is shipped, so long as it is perfect when it is finally delivered. Once you change the context, the question and metric become meaningless.    

In other words, numerical metrics are great so long as they are answering the right questions. But getting correct answers to wrong questions can easily steer you down the wrong path. Peter Drucker cites a couple of dramatic examples.**

  • "The thalidomide tragedy which led to the birth of so many deformed babies is a case in point. By the time doctors on the European continent had enough statistics to realize that the number of deformed babies born was significantly larger than normal—so much larger that there had to be a specific and new cause—the damage had been done....
  • "The Ford Edsel holds a similar lesson. All the quantitative figures that could possibly be obtained were gathered before the Edsel was launched. All of them pointed to its being the right car for the right market. The qualitative change—the shifting of American consumer-buying of automobiles from income-determined to taste-determined market-segmentation—no statistical study could possibly have shown. By the time this could be captured in numbers, it was too late—the Edsel had been brought out and had failed."

If you can find a way to supplement your quantitative metrics with some other (perhaps qualitative) way to assess how things are going—in the best case, using a wholly different perspective—your overall understanding of the situation will be stronger.

This might sound like a narrow discussion inside Quality theory, but the same debate has been going on recently in the political arena over the state of the economy. Some people have pointed out that the normal quantitative metrics show the American economy to be in great shape. Others have countered that the economy is suffering, and that if the metrics don't agree then so much the worse for the metrics! Personally I have no idea what the economy is doing and I take no position in this argument. But it fascinates me to see this exact topic as a subject of intense public debate. 

In brief, there is no reliable way to manage your organization on autopilot. Three years ago, I argued that there is no perfect process. In the same way, there are no perfect metrics. Process and metrics are useful tools, but you still have to pay attention, and to think hard about what you are doing. 

__________

* In this context, at any rate.

** Both of these examples are quoted from Peter Drucker, The Effective Executive (New York: HarperCollins, 1966, 1967, 1993), pp. 16-17. 

    

Thursday, December 19, 2024

"What gets measured gets managed"—like it or not!

For the past couple of weeks we've been talking about metrics, and it is clear that they are central to most modern Quality systems. ISO 9000:2015 identifies "Evidence-based decision making" as a fundamental Quality management principle, stating (in clause 2.3.6.1): "Decisions based on the analysis and evaluation of data and information are more likely to produce desired results." ISO 9001:2015 (in clause 6.2.1) requires organizations to establish measurable quality objectives—that is, metrics—in order to monitor how well they are doing. We've all heard the slogan, "What gets measured, gets managed."

If you think about it, the centrality of quantitative metrics relies on a number of fundamental assumptions:

  • We assume that quantitative metrics are objective—in the sense that they are unbiased. This lack of bias makes them better than mere opinions.
  • We also assume that quantitative metrics are real, external, independent features of the thing we want to understand. This external independence makes them reliable as a basis for decisions. 
  • And finally, we assume that quantitative metrics are meaningful: if the numbers are trending up (or down), that tells us something about what action we need to take next.

But each of these assumptions is weak.

  • Metrics are not necessarily unbiased. In fact, as we discussed last week, there is a sense in which every quantitative metric conceals some hidden bias. Since this is true for all metrics, the answer is not to replace your old metric with a better one. What is important is to understand the hidden bias, to correct for it when you interpret your results. 
  • Metrics are not necessarily external or independent of the thing being measured. Think about measuring people. If they come to understand that you are using a metric as a target—maybe they get a bonus if the operational KPIs are all green next quarter—people will use their creativity to make certain that the KPIs are all green regardless of the real state of things. (See also this post here.)
  • And metrics can only be meaningful in a defined context. Without the context, they are just free-floating numbers, no more helpful than a will-o'-the-wisp.

We discussed the first risk last week. I'll discuss the second risk in this post. And I'll discuss the third one next week. 

Unhelpful optimization

I quoted above the slogan, "What gets measured, gets managed." But just a week ago, Nuno Reis of the University of Uncertainty pointed out in a LinkedIn post that this slogan is misleading, and that it was originally coined as a warning rather than an exhortation. Specifically, Reis writes:

It started with V. F. Ridgway’s 1956 quote: "What gets measured gets managed."

Yet, Ridgway was WARNING how metrics distort and damage organizations.

The FULL quote is:

"What gets measured gets managed—even when it's pointless to measure and manage it, and even if it harms the purpose of the organization to do so."*

The original source was a 1956 article by V. F. Ridgway called "Dysfunctional consequences of performance measurements."** Ridgway's point is that a metric provides just a single view onto the thing you want to understand, but some people will always treat it uncritically, as the whole truth. This misunderstanding creates an opportunity for other people to exploit the metric by acting so that the numbers get better, even if the overall organization suffers for it. Examples include the following:***

"1. A case where public employment interviewers were evaluated based on the number of interviews. This caused the interviewers to conduct fast interviews, but very few job applicants were placed.

"2. A situation where investigators in a law enforcement agency were given a quota of eight cases per month. At the end of the month investigators picked easy fast cases to meet their quota. Some more urgent, but more difficult cases were delayed or ignored.

"3. A manufacturing example similar to the above situation where a production quota caused managers to work on all the easy orders towards the end of the month, ignoring the sequence in which the orders were received.

"4. Another case involved emphasis on setting monthly production records. This caused production managers to neglect repairs and maintenance.

"5. Standard costing is mentioned as a frequent source of problems where managers are motivated to spend a considerable amount of time and energy debating about how indirect cost should be allocated and attempting to explain the differences between the actual and standard costs."

You see the general point. In each case, a metric is defined in the hopes that it will drive organizational behavior in a good direction. But the people working inside the organization naturally want to score as well as possible, preferably without too much effort. So they use their creativity to find ways to boost the numbers.
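A toy simulation makes the mechanism concrete. Everything below is invented: the caseload, the quota, and the urgency scores. But it reproduces Ridgway's second example. An investigator judged on cases closed per month can hit the quota by closing the easy cases, and the metric looks exactly as good as if the urgent ones had been handled first.

    import random

    # Toy model of Ridgway's second example; all numbers are invented.
    # Each case takes some effort (days) to close and has an urgency score.
    random.seed(0)
    cases = [{"id": i,
              "effort": random.randint(1, 10),
              "urgency": random.randint(1, 10)} for i in range(30)]

    QUOTA = 8  # cases each investigator must close per month

    def close_cases(caseload, priority):
        """Close QUOTA cases, chosen by the given priority; return (closed, backlog)."""
        closed = sorted(caseload, key=priority)[:QUOTA]
        closed_ids = {c["id"] for c in closed}
        backlog = [c for c in caseload if c["id"] not in closed_ids]
        return closed, backlog

    # Strategy A: game the metric by picking the easiest cases.
    closed_a, backlog_a = close_cases(cases, priority=lambda c: c["effort"])
    # Strategy B: pick the most urgent cases, whatever the effort.
    closed_b, backlog_b = close_cases(cases, priority=lambda c: -c["urgency"])

    # The quota metric cannot tell the two strategies apart...
    print("cases closed:", len(closed_a), "vs", len(closed_b))
    # ...but the urgency left sitting in the backlog is very different.
    print("urgency left behind:",
          sum(c["urgency"] for c in backlog_a), "vs",
          sum(c["urgency"] for c in backlog_b))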

Also, in case this discussion sounds familiar, we have seen these themes before. One was in 2021, in this post here, where I argue that "There is no metric in the world that cannot be gamed." But the exact same point shows up in this post here from 2023, about systems thinking—where the fundamental insight is that if you design your operations and metrics in a lazy way, without thinking through what you are doing, you will incentivize your people to deliver bad service.

Pro tip: Don't do that. 

Goodhart's law

Let me wrap up by referencing the webcomic xkcd. This one is about Goodhart's Law, that "When a measure becomes a target, it ceases to be a good measure." Of course the reasons behind Goodhart's Law are everything I've already said in this post. Here's what xkcd does with it:****


Meanwhile, I hope everyone has a great holiday season! I'll be back in a week to talk about the third assumption we make regarding metrics.

__________

* It seems that this formulation is from a summary of Ridgway's work by the journalist Simon Caulkin. See this article for references.   

** Ridgway, V. F. 1956. Dysfunctional consequences of performance measurements. Administrative Science Quarterly 1(2): 240-247. See reprint available here, or summary available here. 

*** These five examples are quoted from this summary here, by James R. Martin, Ph.D., CMA.   

**** The xkcd website makes the following statement about permissions for re-use: "This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. This means you're free to copy and share these comics (but not to sell them). More details."

      

Thursday, December 12, 2024

The hidden bias inside metrics

Last week we talked about metrics, and about how—if you find you need to measure something where no metric has ever been established before—you can just make one up. Of course this is true, but you still have to be careful. Make sure you understand what you want the metric to tell you. The reason is that sometimes you can measure the same thing in two different ways, and each way conveys a hidden message or bias.

For example, suppose you are comparing different ways to travel from one place to another: walking, skateboarding, bicycling, driving, flying. And suppose you want to know which is the safest. How do you measure that?

It all depends on which one you want to win. If you work for the airline industry, then you probably want to convince people that commercial air travel is the safest form of travel. That way, more people will choose to fly, and your business will grow. So in that case, you measure safety in terms of Number of fatal accidents per mile traveled.

It's a simple fact that commercial air travel has very few fatal accidents, so the numerator of that fraction will be very small. At the same time, flying is most practical when you want to cover long distances, so on the whole the denominator is very large. That means that the overall fraction will be very small indeed, and—sure enough!—the airline industry regularly advertises that flying is the safest way to travel.

But you could equally well approach the question from another direction. Suppose you ask: If something goes wrong, how much danger am I in? Using this metric, flying no longer leads the pack. If something goes wrong while you are walking—even if you are walking long distances—you likely need no more than a day's rest and a better pair of shoes. But if the airplane that you are on develops catastrophic engine failure at 35,000 feet, the odds are strongly against anyone walking away from the experience.
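You can watch the ranking flip with a few lines of arithmetic. The numbers below are placeholders, not real accident statistics; only the shape of the comparison matters.

    # Two safety metrics computed over the same made-up data. These are
    # placeholder numbers, not real statistics.

    modes = {
        "walking": {"miles": 1e9,  "incidents": 50_000,  "deaths": 100},
        "driving": {"miles": 1e12, "incidents": 500_000, "deaths": 30_000},
        "flying":  {"miles": 1e12, "incidents": 100,     "deaths": 90},
    }

    for name, m in modes.items():
        per_mile = m["deaths"] / m["miles"]          # the airline industry's metric
        per_incident = m["deaths"] / m["incidents"]  # the "something went wrong" metric
        print(f"{name:8s} deaths per mile: {per_mile:.1e}   deaths per incident: {per_incident:.1%}")

On the first metric flying wins by orders of magnitude; on the second it loses just as dramatically, with the same data in both cases.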

This is what I mean by the "hidden bias" in a metric. Because metrics are (by definition) objective and (generally) quantitative, we tend to assume that they are unbiased. But when you try to measure "Which form of travel is the safest?" flying comes out as either the best or the worst, depending on which metric you choose.

Nor can you ask, "Well, which one is the right metric to settle the question?" There is no "right" metric. Both of these metrics answer part of the question about safety. The real problem is that the question about the "safest form of travel" is badly posed. What are you really asking for? Do you want to know about the frequency or likelihood of serious problems? In that case, flying is the safest. Do you want to know about the lethality of serious problems? In that case, flying is the most dangerous. Before you choose a metric, you have to understand exactly what you want it to tell you. In the same way, before you blindly accept any metric quoted by somebody else, think hard about what that metric is really measuring, and about why they chose to use it and not a different one.

Years ago, I saw a consumer advocate on television exploding a metric in the most delightful way. Some brand of potato chips had come out with a new line that advertised "Less Salt and Less Oil!" But a close analysis of the production process showed that actually a bag of the new chips contained—overall—more salt and more oil than a bag of their regular line. How could they get away with advertising "Less Salt and Less Oil"? When he challenged them, they explained that they had made the potato chips smaller! Therefore—so they said—if you sit down with a plan to eat exactly ten potato chips (or some other definite number), you end up consuming less salt and less oil than if you had eaten ten of their regular chips. And of course the consumer advocate riposted with what's obvious, namely, that nobody ever sits down to eat a specific number of potato chips. In fact, he said, the only time he had ever seen anyone count out a specific number of potato chips was when he saw two eight-year-old boys dividing a bag between them. Otherwise, that's not what people do. So the metric was true as far as it went, but it was misleading.
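For the record, the chip-maker's arithmetic is easy to reproduce. The numbers here are invented, but they show how "less per chip" and "more per bag" can both be true at once:

    # Invented numbers showing how "less salt per chip" and "more salt per bag"
    # can both be true when the chips shrink and the bag holds more of them.

    regular = {"chips_per_bag": 50, "salt_per_chip_mg": 10}
    new     = {"chips_per_bag": 80, "salt_per_chip_mg": 8}  # smaller chips

    for name, line in (("regular", regular), ("new", new)):
        per_ten = 10 * line["salt_per_chip_mg"]
        per_bag = line["chips_per_bag"] * line["salt_per_chip_mg"]
        print(f"{name}: {per_ten} mg of salt per ten chips, {per_bag} mg per bag")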

The same thing is true of any other metric. Be it never so objective, it will silently push the conversation in one direction rather than another. When you choose a metric—or when you make one up, if you have to do that—make sure that it is pointing in a direction you want to go. 

      

Five laws of administration

It's the last week of the year, so let's end on a light note. Here are five general principles that I've picked up from working ...