Thursday, March 28, 2024

A podcast on Boeing!

This week I had another chance to sit down with Kyle Chambers of Texas Quality Assurance, this time to talk about Boeing. Like me, Kyle has had a series of episodes dealing with Boeing's troubles in the last year, and he always brings a refreshing and very practical energy to all Quality topics. I start off talking about the FAA report that I discussed here last week; but in the course of the discussion we also cover why Boeing should hire TQA to revamp their training programs, and how to make safety classes matter to people who don't want to be there.

Please join us!

You can find the podcast version here: #QualityMatters episode 175.

Or there's a version on YouTube that also includes video, which you can find here:



Leave me a comment to let me know your thoughts!

Thursday, March 21, 2024

What did the FAA find?

It's all very well to sit snugly behind a keyboard and criticize Boeing's safety culture (as I have done in a number of posts this spring, for example here and here). But how much of this is just talk, and how much is based on hard data? Has anyone done the hard work to sit down with Boeing and study their culture in detail? Maybe an exercise like that could tell us something useful.

In fact, a special Expert Panel completed just such a study last month. These experts were appointed by the Federal Aviation Administration (FAA), began meeting at the beginning of March 2023, and wrapped up their investigation in February 2024, after spending a full year on it. The team reviewed 7 surveys and more than 100 policies and procedures, comprising over 4000 pages, and interviewed more than 250 people across 6 locations. In the end they issued 27 findings and 53 recommendations. You can find the full report online here, and the New York Times has an article about it here.

The report is devastating. 

More exactly, it's written in the bland bureaucratic language that is mandatory for reports like this. There are no bold headlines screaming "J'Accuse!" But I have been auditing since 1996, and I cannot remember ever reading—much less writing!—a report about a fully functioning organization* that painted in such broad strokes a picture of a management system floating so loose from its moorings.

Background and summary

The Expert Panel was formed in accordance with the provisions of the 2020 Aircraft Certification, Safety, and Accountability Act (ACSAA), Pub. L. 116-260, Div. V, § 103, which requires review of organizations that hold an Organization Designation Authorization (ODA) from the FAA. An ODA is the arrangement by which the FAA delegates certain Boeing employees to inspect Boeing's own work, on behalf of the FAA, so that the FAA does not have to assign its own people. The idea seems to be, at least in part, that airworthiness regulations mandate a great many inspections, and if all of them had to be carried out by FAA personnel, the FAA's staff and budget would have to be significantly increased.

If you think it sounds crazy to ask a company to inspect its own work when there are serious safety risks at stake, … well, you can look up the text of the 2005 rule (70 FR 59932) establishing ODAs in the Federal Register; the "Background" section of that document explains how the idea grew incrementally over time as a way to cut down the long delays caused by airworthiness inspections. But the FAA still retains oversight of the whole process—naturally, right?—which is why the 2020 law referenced above requires all ODA holders explicitly "to adopt safety management systems (SMS) consistent with international standards and practices," and also directs the FAA "to review The Boeing Company’s ODA, safety culture, and capability to perform FAA-delegated functions." (Reference.)    

When the Expert Panel issued their report, they summarized their findings under four general headings:

  • Boeing's safety culture, where they found a "disconnect" between what they heard from senior management and what they heard from the rank and file;
  • Boeing's SMS, which was structured to reflect all the applicable standards perfectly but which appeared to have been glued on top of the organization with library paste;
  • Boeing's ODA management structure, which the Panel conceded had been recently reorganized to make it harder for the company to retaliate against an employee finding violations while acting in the name of the FAA (but "harder" still doesn't mean "impossible");
  • Other topics.

In the remainder of this post I will highlight and discuss some of the specific findings and other observations. (Sometimes I will indent my comments in blue, when I think it helps to distinguish my remarks from those of the Panel.)

Boeing's safety culture

The basic observation here is that Boeing has defined and rolled out a formal, written safety culture, but most employees don't really understand it. (Sec. 3.3) Concretely:

  • Many employees, when interviewed, didn't know about "Boeing's enterprise-wide safety culture efforts, nor its purpose and procedures." (Sec. 4.1, #1)
  • Even employees who knew the terminology of the safety culture couldn't use it in a sentence. (Sec. 4.1, #2)
  • Some Boeing sites have good, "confidential, non-punitive reporting systems" in place—but not all of them. (Sec. 4.1, #3)
  • Managers can investigate reports in their own reporting chain, which means they risk not being impartial. (Sec. 4.1, #4)
  • Employees don't know which reporting system to use for safety problems. Employees don't really trust any of the reporting systems, and prefer to report safety problems to their bosses. Employees especially don't trust the anonymity of the "preferred system." Employees do not (reliably) get informed of the outcome when they do report through these systems. (Sec. 4.1, #5)
    • My comment: When you first hear it, "reporting safety problems to your manager" doesn't sound like a bad idea. (Although naturally people who report problems should still hear back how they were dispositioned, or they'll start to think that reporting is a waste of time.) The reason that "reporting safety problems to your manager" can become a problem is that ….
  • When employees report safety problems to their managers, it's often done verbally. So there is no way to know if any particular problem ever made it into the reporting system. And if a problem didn't get into the system, there's no way to track whether it was ever analyzed or fixed. (Sec. 4.1, #6) 

Boeing's SMS

Grigory Potemkin, architect of the system?
The Panel makes a number of high-level observations about Boeing's SMS, before diving into the details. Among these observations are the following:

  • All the SMS documents are new, and there is no traceability showing how they changed from what came before. (Sec. 3.4, para. 4)
  • Most of the SMS documents cover general conduct and do not translate to the concrete working level. (Sec. 3.4, para. 5)
  • Many employees don't really understand the elements of the SMS, or else they think it is a management fad that won't stick around. (Sec. 3.4, para. 10) 
  • Many employees point out that Boeing already had a detailed safety system before the SMS was implemented—so why do we need this new one now? (In fact the old system is still referenced in many procedure documents.) (Sec. 3.4, para. 11)
  • Boeing requires employees to take safety training classes, but doesn't test whether they learned anything. (Sec. 3.4, para. 13)

In other words, the Panel says that Boeing's shiny new SMS—which complies perfectly with all the relevant requirements and standards—is a Potemkin system.

After those general observations, the specific findings might be an anticlimax, but here are a few of them:

  • The complexity of the SMS documentation, and "the constant state of document changes," make it hard for employees to understand it. (Sec. 4.2, #10)
  • Boeing uses an SMS dashboard to track safety goals, but employees (and some managers) don't understand what it is or how to use it. (Sec. 4.2, #12)
  • There are different tracking systems for the SMS and for the legacy safety systems, and many people are confused by them. (Sec. 4.2, #12, cont'd.)
  • Since Boeing has kept all the legacy safety systems in place, employees across the company don't trust that the new SMS will last long. (Sec. 4.2, #13)
  • Boeing has procedures on how to evaluate safety-relevant decisions, but there's nothing to explain how to tell which business decisions count as safety-relevant. (Sec. 4.2, #14)

In other words, employees don't understand the SMS and they have no motivation to learn it.

Boeing's ODA management structure

The Panel's general observation about the ODA program is that it is getting harder to staff, because participating inspectors (called Unit Members, or UMs) are retiring faster than new ones are being brought on board. (Sec. 3.5, paras. 4-6; sec. 4.3, #18)

But the detailed findings have to do mostly with the risk that UMs could fear retaliation for speaking out about problems:

  • Boeing has not eliminated the possibility of retaliation when UMs raise safety concerns, and some UMs have experienced what looks like retaliation. Other UMs are not willing to help or step in, because their help is rejected as interference. (Sec. 4.3, #16)
  • Boeing says they took steps to make sure the ODA program is working correctly, but cannot provide proof. (Sec. 4.3, #17)
    • In which case, did they really do anything?
  • Supposedly Boeing has changed the ODA organizational structure, but nobody knows how. Employees still report to their old managers. Procedures are still written around the old structure. (Sec. 4.3, #19) 

There are some other smaller findings as well.

Other topics

Of the findings classified as "Other matters," the two that concern me the most state (in different ways) that input from pilots is treated inconsistently: if it reaches Executive A, it is taken seriously and addressed; but if it reaches Executive B, it might get lost or forgotten. (Sec. 4.4, #23 and #24) Less alarming are some technical points about how to handle the relationship between Boeing and the FAA in the future.

But a couple of the other general observations are worth noting.

Right at the beginning, Boeing welcomed the Panel and made sure to say that they looked forward to open collaboration. But the Panel says that in fact Boeing answered questions rather as if the evaluation were an audit or a deposition, and never asked the Panel for input of any kind. (Sec. 2.6, paras. 12-13; sec. 3.2, para. 1)

So I have to ask, Did Boeing expect to learn anything from this evaluation? Or was the intent simply to get through it as fast as possible, with as few findings as possible? Because clearly, if you approach the whole exercise in a defensive frame of mind, you leave open fewer chances to learn and improve from the experience. 

Also interesting: the Board of Directors emphasized that they use safety-related performance metrics "when determining both Annual Incentive Pay and Long-Term Incentives." These metrics include, for example, "the requirement for executives to complete Boeing's Safety Management System training." This statement was intended to demonstrate Boeing's commitment to safety. (Sec. 3.7, paras. 5, 9, and 12)

The problem is, I think it demonstrates the reverse. Safety metrics in the bonus program? No! On the contrary, safety should be more important than any bonus program! Ironically, when you pay people for something, you cheapen it. At that point people start weighing one part of the bonus against another: Let's see, if I'm willing to give up a few dollars on safety, we can sell a lot more planes and by the end of the year the difference will more than make up for what I lost. Dollarizing the safety program is irresponsible if not worse. Safety should be non-negotiable, and paying people for it makes it negotiable. (I discuss this point in more detail in this post here.)

On the other hand, I understand why the Board of Directors would take this approach. To the man with a hammer, every problem looks like a nail. And it does seem like, in the last couple of decades, money is the hammer that Boeing's management has learned how to use. 



It's a long report. But I think it explains why Boeing has gotten into its present straits. From my point of view, the fundamental problems are all around system implementation. Boeing tried to create a new system, but went for the quick-n-easy approach rather than making sure the new system was fully implemented and integrated at all levels in the organization. As a result, people don't know what to do! Even people who want to do the right thing—and I firmly believe that this includes nearly everyone, nearly all the time—don't know how to do the right thing so that errors get caught, followed up, and fixed … and so that they themselves don't get in trouble for finding those errors in the first place.

Too much system can be as much a problem as not enough system. There's a balance and it always has to be pragmatic. I may have said this once or twice before now. 

__________

* I have participated in audits that were meant as gap analyses, for organizations that wanted ISO 9001 certification and knew they weren't ready yet; and the results of those were often far worse than this one. But it was no surprise, because the organizations knew in advance that they had a lot of work to do.

Thursday, March 14, 2024

The news just keeps coming!

I thought I was done writing about Boeing's current Quality problems, but the news just keeps coming and coming. Some of the stories simply confirm what we've already said about Boeing's Quality culture; other stories deal with legal issues and have less to do with Quality strictly understood. But one way or another, there continue to be a lot of them.

Here's a quick sampling of recent stories that I've found around the Internet:

It's an exciting time.

Ziad Ojakli, Boeing EVP
But the story I want to write about is a different one. In some ways it is smaller and quieter than the ones I just listed, but it sheds a helpful light on one of the least glamorous—but most critical!—of all the Quality disciplines. Yes, I'm talking about records control, and about how Boeing's records control system seems to have failed them at the worst possible moment.

The basic story is told by the Seattle Times here, and the Associated Press chimes in here for corroboration. Briefly, it all started with the investigation into Alaska Airlines flight 1282, when a door plug blew out while the plane was in the air. The investigation revealed that the four bolts that were supposed to hold the door plug in place were missing.

Why were the four bolts missing?

They had been removed to facilitate earlier rework.

Why was there rework?

There was damage to five rivets which had to be repaired. The procedure to repair those rivets required that the door plug be removed temporarily. Then after the repair the door plug was replaced.

Why weren't the four bolts replaced when the door plug was replaced?

… good question. Here the trail runs cold. The logical thing would be to ask the person who did the repair, but we don't know who that was.

Wait, what?? How can we not know who did the repair? Surely that information was captured as part of the repair documentation!

You would think so. But up till now Boeing has been unable to provide that documentation. And last Friday, Ziad Ojakli, Boeing executive vice president and the company's chief government lobbyist, sent a letter to Sen. Maria Cantwell of the Senate Commerce Committee, saying, "We have looked extensively and have not found any such documentation." He added, as a "working hypothesis," that "the documents required by our processes were not created when the door plug was opened."

Let me repeat that, just to be clear:

  • Boeing's procedures require complete documentation of any rework, whenever rework is done. (So far, so good.)
  • But now they can't find the documentation for rework that was done two months ago.
  • The company's executive management is willing to tell a Senate Committee that maybe the documentation was never generated.

This is terrifying.

To be more exact, there are several possible explanations for this turn of events, and every single one of them is terrifying!

One possibility is that the documentation really wasn't generated for this particular rework. 

But in that case, what else are they doing that hasn't been documented? How could you ever know? (Hint: you couldn't.) And if you don't know what work has been done on an airplane, why would you ever be willing to fly on one again?

Another possibility is that the documentation was generated, but Boeing can't find it.

This raises the same fears. If you can't find your documentation, it might as well not exist. At that point you are totally unable to use the documentation: for example, to monitor trends, or to connect the dots between one failure and another. You can't do anything proactive, and you can't even do much that's reactive. All you can do is wait for the next plane to fall out of the sky.

And of course a third possibility is that Boeing is brazenly lying to a Senate Committee.

In some ways, I almost hope this last one is the answer. I would rather that a company like Boeing be competent, even while doing something villainous, than that they succumb to floundering ineptitude. At the very least, a competent villain is more likely to build planes that keep flying.

But if you make the conscious decision to lie to the Senate, it's because you are hiding something really bad. Nobody does that on a whim. And so, once again, I start to worry about "What else don't we know?"

Yes, of course there are other possibilities, but mostly I think they add filigree details to the ones I have already sketched out. Maybe the documentation was created, but then the guy who did the work snuck into the records system and destroyed it afterwards so he wouldn't get in trouble when flight 1282 lost its door plug in such a dramatic way. Or maybe his friend did it on his behalf. And naturally it's easy to understand why this guy would be afraid of being in the spotlight nationwide. What's not easy to understand—what is, in fact, flatly inexcusable—is why any company as big as Boeing would tolerate a document control system that could be subverted so easily by a single bad actor.

You keep documentation for a reason. And even when the documentation embarrasses you, it's better to provide it (and own up in public to your mistakes) than to hide it (and leave everyone wondering whether things are even worse than they really are).

When I first started writing about Boeing's troubles (back in January) I tried to put those troubles in the best possible light by pointing out how few failures there have been (as a fraction of the total number of flights in a year) and by explaining that the whole point of a Quality Management System is to help you handle failures gracefully.

But document and records control is the single most basic element of any QMS. If Boeing never generated (or cannot find) rework documentation for a recent job, then their QMS fundamentally isn't working.

There is no way to tell this particular story so that it sounds good.

Photo from the National Transportation Safety Board

Thursday, March 7, 2024

Problem-solving is like breathing!

After a month and a half (or more!) of articles about corporate cultures and poor Quality choices, maybe I can afford to take a break and post something different. Yesterday I saw a delightful video by Jamie Flinchbaugh over on his JFlinch blog, about how problem-solving is like breathing.

Breathing?

Yes, exactly!

His point is that problem-solving is something we do all the time. And yes, we can learn to do it better. But that doesn't automatically mean we will always do it better when we aren't thinking about it intentionally.

On the other hand, if we practice being intentional about our problem-solving (or our breathing!) then yes, over time it can pay dividends in our daily lives as well. 

Here, listen to Jamie explain the point:


Robert Pirsig makes a similar point in Zen and the Art of Motorcycle Maintenance, after cataloguing a long list of "gumption traps" that prevent someone from doing good work. (He presents these in terms of doing mechanical work on your motorcycle, but in fact they apply to any kind of work you can think of.)

Some could ask, ‘Well, if I get around all those gumption traps, then will I have the thing licked?’

The answer, of course, is no, you still haven’t got anything licked. You’ve got to live right too. It’s the way you live that predisposes you to avoid the traps and see the right facts. You want to know how to paint a perfect painting? It’s easy. Make yourself perfect and then just paint naturally. That’s the way all the experts do it. The making of a painting or the fixing of a motorcycle isn’t separate from the rest of your existence. If you’re a sloppy thinker the six days of the week you aren’t working on your machine, what trap avoidances, what gimmicks, can make you all of a sudden sharp on the seventh? It all goes together.

But if you’re a sloppy thinker six days a week and you really try to be sharp on the seventh, then maybe the next six days aren’t going to be quite as sloppy as the preceding six. What I’m trying to come up with on these gumption traps I guess, is shortcuts to living right.

The real cycle you’re working on is a cycle called yourself.*

__________

* Robert Pirsig, Zen and the Art of Motorcycle Maintenance (New York: William Morrow, 1974, 1999), pp. 324-325.
