Thursday, July 31, 2025

Does AI have a Quality problem?

We've all seen articles about the incredible power and potential of Artificial Intelligence (AI). Whole industries are being restructured to make use of AI's capabilities. One article from last year (one I chose almost at random) lists the use cases for Large Language Models (LLMs) as follows:

  • Coding: "LLMs are employed in coding tasks, where they assist developers by generating code snippets or providing explanations for programming concepts."
  • Content generation: "They excel in creative writing and automated content creation. LLMs can produce human-like text for various purposes, from generating news articles to crafting marketing copy." [Gosh, are they any good at writing niche products like Quality blogs? 😀 Just kidding, of course!]
  • Content summarization: "LLMs excel in summarizing lengthy text content, extracting key information, and providing concise summaries."
  • Language translation: "LLMs have a pivotal role in machine translation. They can break down language barriers by providing more accurate and context-aware translations between languages."
  • Information retrieval: "LLMs are indispensable for information retrieval tasks. They can swiftly sift through extensive text corpora to retrieve relevant information, making them vital for search engines and recommendation systems."

And so on. The article lists eight more use cases before summarizing with a list of half a dozen general benefits of LLMs. (I found myself wanting to ask if the author has an LLM in the family, perhaps as a favorite cousin or an in-law.) In short, LLMs can do quite a lot.

But LLMs hallucinate! 

We are discovering, though, that it is not safe to rely on LLMs for an accurate description of what is out there. When LLMs summarize content or retrieve information, sometimes they report things that aren't true. The first such story I saw was this LinkedIn post from back in 2023, in which Marcus Hutchins shared a conversation he had with the Bing AI chatbot. The bot insisted that it was still 2022, claiming "I know the date because I have access to the Internet and the World Clock," even though it was verifiably already 2023!

Then more stories started rolling in. To my mind the most dramatic has been the recent legal case SHAHID v. ESAAM (2025), Docket No: A25A0196, decided on June 30, 2025 by the Court of Appeals of Georgia. The summary description of this case makes for delightful reading, and I enclose a selection below in an extended footnote.* But the gist is that one party's pleading must have been generated by an LLM tool. No human lawyer could have written it. The pleading rested almost entirely on bogus case law: either cases that never happened, or cases that had no relation to the point at stake. This is the kind of mistake that junior paralegals get fired for. Even worse, the initial trial court accepted it without blinking. The bogus citations were caught only by the Court of Appeals.

So employing LLMs comes with a risk. You can't just blindly trust whatever they tell you without cross-checking it, because they fabricate content so effortlessly. Looking back at that list of use cases at the top of this post, I have to qualify the claim that you can use them for writing or summarizing: maybe LLMs can suggest an interesting idea you didn't think of before, but they can't do your work for you. Some people, though (like Sufyan Esaam's attorney), want to use them for just that.

It's a problem.

What about Quality?

But is it a Quality problem? Here the answer is not so clear, because it depends on how exactly you define Quality. You remember that I prefer to define Quality as "getting what you want"; and in that sense (especially if "you" means the end-user of the AI tools) AI hallucinations constitute a big Quality problem. When AI hallucinates a false answer to my question, I'm not getting what I want.

But there is another definition, which says that "Quality is conformance to requirements." And with that definition the situation is rather different ... because the LLM programs are doing exactly what they have been told to do! 

Jason Bell of Digitalis.io made this argument in a recent LinkedIn post. The point is that the LLM tool is not programmed to see what's really there. It is not programmed to perceive reality, and it is not programmed to tell the truth. Its only programming is to say something that sounds good, subject to certain parameters that define what it takes for something to "sound good." But perceiving reality and telling the truth are never part of that definition, because AI has no mechanism or equipment to allow it to carry out those tasks.
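To make that concrete, here is a deliberately tiny sketch, in Python, of the kind of loop that sits at the heart of text generation. Everything in it (the word scores, the vocabulary, the function names) is invented for illustration, and real LLMs operate at a vastly larger scale; but the shape of the process is the same: score candidate continuations for plausibility, sample one, repeat. Notice that no step ever asks whether the output is true.

    import math
    import random

    # Hypothetical toy "model": plausibility scores for the next word,
    # given only the previous word. The numbers are invented for illustration.
    NEXT_WORD_SCORES = {
        "the":   {"court": 2.0, "case": 1.5, "moon": 0.1},
        "court": {"held": 2.0, "ruled": 1.8, "danced": 0.1},
        "case":  {"law": 1.7, "cited": 1.5, "melted": 0.1},
    }

    def sample_next(prev_word, temperature=1.0):
        # Turn the scores into relative weights (a softmax) and sample one word.
        scores = NEXT_WORD_SCORES.get(prev_word, {"the": 1.0})
        words = list(scores)
        weights = [math.exp(scores[w] / temperature) for w in words]
        return random.choices(words, weights=weights, k=1)[0]

    def generate(start, length=4):
        # Build a sentence by repeatedly picking a plausible next word.
        # At no point does anything here consult reality or check a fact.
        words = [start]
        for _ in range(length):
            words.append(sample_next(words[-1]))
        return " ".join(words)

    print(generate("the"))

The missing truth check is not a bug in the loop; it simply was never part of the requirements.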

In a sense, then, the problem is not with AI itself, but with user expectations. It's like if I use a hammer to comb my hair: the results are pretty sketchy, but it's not the hammer's fault.  

Of course I have no idea how long AI will be a big deal, or how much of an impact it will have on our work and our lives. But as long as it is here, it will be useful for us to be clear on its capabilities and limitations, so that we can distinguish reality from science fiction. In its current form, AI has no cognitive component, and therefore cannot observe reality or distinguish truth from falsehood. But it is very good at sifting through piles of words according to defined rules.

And honestly? That's enough for now. Let's use it for the things it really can do, and not try to make it comb our hair or understand the world. After all, if AI ever develops a cognitive component, that addition will doubtless bring new problems of its own. 

HAL 9000, of course.

__________

[Emphasis is mine, in all cases.] 

After the trial court entered a final judgment and decree of divorce, Nimat Shahid (“Wife”) filed a petition to reopen the case and set aside the final judgment, arguing that service by publication was improper. The trial court denied the motion, using an order that relied upon non-existent case law. For the reasons discussed below, we vacate the order and remand for the trial court to hold a new hearing on Wife's petition. We also levy a frivolous motion penalty against Diana Lynch, the attorney for Appellee Sufyan Esaam (“Husband”)....

Wife points out in her brief that the trial court relied on two fictitious cases in its order denying her petition, and she argues that the order is therefore, “void on its face.”

In his Appellee's Brief, Husband does not respond to Wife's assertion that the trial court's order relied on bogus case law. Husband's attorney, Diana Lynch, relies on four cases in this division, two of which appear to be fictitious, possibly “hallucinations” made up by generative-artificial intelligence (“AI”), and the other two have nothing to do with the proposition stated in the Brief.

Undeterred by Wife's argument that the order (which appears to have been prepared by Husband's attorney, Diana Lynch) is “void on its face” because it relies on two non-existent cases, Husband cites to 11 additional cites in response that are either hallucinated or have nothing to do with the propositions for which they are cited. Appellee's Brief further adds insult to injury by requesting “Attorney's Fees on Appeal” and supports this “request” with one of the new hallucinated cases.

We are troubled by the citation of bogus cases in the trial court's order. As the reviewing court, we make no findings of fact as to how this impropriety occurred, observing only that the order purports to have been prepared by Husband's attorney, Diana Lynch. We further note that Lynch had cited the two fictitious cases that made it into the trial court's order in Husband's response to the petition to reopen, and she cited additional fake cases both in that Response and in the Appellee's Brief filed in this Court.

As noted above, the irregularities in these filings suggest that they were drafted using generative AI....
