How AI Acknowledges Its Errors
By Diana and James Maliszewski
We write this three and a half years after the public release of ChatGPT, then powered by the GPT-3.5 model, on November 30, 2022, an event that marked a turning point in how people interact with artificial intelligence. Once the province of research labs, AI soon became a commonplace tool, consulted for everything from drafting emails to answering obscure historical questions.
Alongside this rapid adoption, however, a new and widely discussed phenomenon also entered public awareness. This phenomenon is AI “hallucination,” in which a system produces answers that are partially or wholly incorrect while maintaining an air of unwarranted confidence. Most users of AI have probably experienced this behavior firsthand. An AI that appears informed and even authoritative may suddenly invent a citation, misattribute a quotation, or even describe events that never occurred.
What is going on?
A useful way to approach this question is through direct experimentation. Rather than testing AI systems on unfamiliar topics, one can instead pose questions in a domain where one already possesses deep knowledge. In such cases, errors are easier to detect and the limits of the system come into sharper focus.
James has taken precisely this approach, often querying large language models about older, sometimes obscure, works of science fiction with which he is familiar. The results have been illuminating. While the systems often perform impressively at a general level, they nevertheless stumble in telling ways, blending fact and fabrication in ways that would be difficult to detect for anyone without prior knowledge of the subject being queried.
After observing this pattern repeatedly, he decided to confront the matter head-on, hoping to understand why these systems sometimes invent information rather than simply admitting uncertainty. What follows is that question, along with the responses provided by three different AI tools. These answers are noteworthy not only for what they say, but for how consistently they describe the underlying causes of the phenomenon.
This was James’s prompt, posed identically to each tool: Can you explain to me why it is that you sometimes make up information when answering questions I have asked rather than admitting you don’t know the answer?
******************************************
ChatGPT:
That’s a fair question — and you’re right to press on it.
The short answer is: I don’t intend to make things up, but the way I’m built makes it possible for that to happen.
I’m a probabilistic language model. I generate responses by predicting what text is most likely to come next given your question and the patterns I learned during training. Most of the time, that produces accurate answers. But when:
- The question involves obscure, niche, or highly specific facts
- The topic is ambiguous or underspecified
- The training data I absorbed was incomplete or inconsistent
- The answer would require up-to-date or verified external information I don’t currently have
…I can end up generating something that sounds plausible but isn’t actually true. This is often called “hallucination,” though a more accurate description is overconfident pattern completion.
Why not just say “I don’t know”?
I am designed to express uncertainty when appropriate. However, detecting when I should be uncertain is itself a probabilistic judgment. If the internal signals that would trigger uncertainty aren’t strong enough, I may produce a confident-sounding answer even when the underlying knowledge is weak.
There’s also a tension in model design between:
- Being helpful and giving a substantive answer
- Being cautious and admitting uncertainty
When that balance tips the wrong way, you get fabricated details instead of a clear “I’m not sure.”
Importantly, this is a system limitation — not intent, not deception, and not a decision to avoid admitting ignorance. It’s a failure mode of prediction-based generation.
You’ve asked a lot of historically and bibliographically precise questions in past conversations — the kind where small factual errors matter. In those contexts, the correct behavior is to hedge or verify rather than interpolate. If I’ve failed to do that, that’s on me.
If you’d like, we can also talk about practical ways to reduce this risk in future exchanges — for example:
- Asking me to flag uncertainty explicitly
- Asking for sources or verification
- Asking me to say “I don’t know” rather than speculate
That’s a reasonable standard to hold me to.
Google Gemini:
It’s a fair question, and I can see how it’s frustrating when you’re looking for facts and end up with “creative” fiction instead. To be blunt: I don’t actually try to lie to you. When I provide incorrect information, it’s usually due to the way I process language.
Here is the “under the hood” look at why this happens and why I don’t always just say “I don’t know.”
1. I am a Pattern Matcher, Not a Database
I don’t look things up in a giant filing cabinet of facts every time you ask a question. Instead, I predict the next most likely word in a sentence based on the massive amount of data I was trained on.
- The Glitch: Sometimes, the most “statistically likely” sentence structure isn’t actually the “factually true” one. If a question is framed with certain assumptions, my patterns might lead me to follow that path rather than correcting it.
2. The “Helpfulness” Bias
My core programming is designed to be helpful and conversational. In the AI world, we call the phenomenon of making things up hallucination.
- Because I’m trained to provide answers, I can sometimes be “over-eager.”
- If I find a pattern that looks like an answer, I might present it with high confidence because that’s how I’ve been taught to speak, even if the underlying data is thin.
3. Lack of True Self-Awareness
While I can “know” things in a statistical sense, I don’t have a “gut feeling” or a conscious sense of doubt. I don’t “feel” uncertain the way a human does.
- The Result: I don’t always recognize when I’m at the edge of my knowledge. To me, a hallucinated fact and a real fact look exactly the same—they are just sequences of high-probability tokens.
4. Contextual Misinterpretation
Sometimes I make things up because I’ve misunderstood the nuances of your prompt. I might take two unrelated facts and “stitch” them together to create a logical-sounding (but completely wrong) bridge between them.
How to Fact-Check Me
I’m getting better at admitting when I’m stumped, but until I’m perfect, here are a few tips:
- Ask for Sources: If something sounds fishy, ask me to cite where that info comes from.
- Verify for High Stakes: If you’re asking about medical, legal, or highly technical advice, always cross-reference with a primary source.
- Prompt for Skepticism: You can tell me, “If you aren’t 100% sure about this specific detail, please tell me you don’t know.” That helps nudge my probability threshold toward caution.
I’m constantly being updated to prioritize accuracy over “guessing,” but I appreciate you calling it out—it helps me stay grounded.
Would you like me to double-check a specific piece of information I gave you earlier to see if it holds up?
MS Copilot:
Great question—and honestly, I’m glad you asked it directly.
Why I sometimes “make things up”
What you’re noticing has a name in AI research: hallucination. That’s when a model like me gives an answer that sounds confident but is wrong, misleading, or even fully invented.
Here’s why that happens:
1. I’m built to predict words, not to “know” things
Under the hood, I don’t have beliefs or understanding—I’m doing pattern prediction.
Core behavior:
- I look at your question and then generate the most likely next word, then the next, and so on, based on patterns in the data I was trained on.
Consequence:
- If the patterns suggest “an answer-looking thing” should follow, I’ll try to produce one—even if the underlying facts aren’t solid.
I’m not checking a mental database of truth; I’m continuing a statistically likely sentence.
2. My training rewards answering, not staying silent
Models like me are trained with a lot of examples where:
- Questions → Answers, not
- Questions → “I don’t know”
During training, systems are often rewarded more for giving some answer than for refusing. That means:
- Guessing is reinforced: If a guess looks fluent, relevant, and helpful, it often gets positive feedback.
- Admitting uncertainty is underrepresented: There are fewer examples where “I’m not sure” is the right response.
So the behavior you’re frustrated with is, in a way, baked into how these systems have historically been optimized.
3. I don’t always know when I don’t know
Humans have a sense of meta-cognition—you can often feel, “I’m out of my depth here.” I don’t have that in the same way.
- No internal truth meter: I don’t have a built-in detector that reliably says, “This is false.”
- Confidence ≠ correctness: The tone of confidence in my answers is a style of language, not a measure of certainty.
Sometimes I do say things like “I’m not sure” or “I don’t have access to that,” but that’s also pattern-based behavior, not a genuine awareness of ignorance.
4. Gaps in data lead to “filling in the blanks”
If I was trained on:
- Incomplete data about a topic, or
- Conflicting data where sources disagree,
I may try to “smooth over” those gaps by generating something that sounds consistent.
That can look like:
- Invented citations or sources
- Made-up dates, names, or numbers
- Overconfident explanations of niche or very recent topics
It’s not lying in the human sense—it’s more like overconfident autocomplete.
5. Why I don’t just say “I don’t know” more often
You might reasonably think: “Why not just train you to say ‘I don’t know’ whenever it’s uncertain?”
There are tradeoffs:
- Too cautious = not useful: If I said “I don’t know” every time something was even slightly uncertain, I’d be frustratingly unhelpful.
- Too bold = misleading: If I always try to answer, I risk being confidently wrong.
Researchers are actively trying to find a better balance—teaching models to:
- Express uncertainty more often and more honestly
- Defer to external tools or say “I don’t know” when appropriate
- Avoid fabricating specific facts like references, quotes, or statistics
What you can do when you notice this
You’re not powerless in this dynamic. A few things actually help a lot:
Ask for sources or reasoning:
- “Can you walk me through how you got that?” or “Can you cite where that comes from?”
Flag contradictions:
- “You said X earlier and now Y—can you reconcile that?”
Be explicit about stakes:
- “This is for something important; if you’re not sure, say so rather than guessing.”
When you push back, it nudges the interaction toward more cautious, transparent behavior.
*****************************
Taken together, these three responses present a consistent picture. Despite differences in phrasing, all state that large language models are not repositories of verified knowledge but rather systems for generating plausible text based on learned patterns. When those patterns align with well-established facts, the results can be impressively accurate. When they do not, as is often the case when dealing with obscure, ambiguous, or poorly represented information, the same mechanisms can instead produce confident but unfounded assertions.
What emerges from this is not evidence of deception but limitation. AI systems do not “know” when they are wrong in the way a human expert might. Neither do they possess an intrinsic sense of uncertainty that reliably triggers caution. Instead, they operate within a probabilistic framework in which fluency and plausibility are primary and factual accuracy is, in a sense, a byproduct rather than a guarantee.
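The mechanism all three responses describe, fluent continuation of statistically likely text with no built-in check for truth, can be illustrated with a deliberately tiny sketch. The bigram table and the dates in it are invented for illustration; real models learn billions of such patterns from training text, but the key point is the same: nothing in the table records whether a continuation is true, only how often it tends to follow.

```python
# A toy "language model": for each context word, a distribution over
# plausible next words. Note that nothing here stores whether a
# continuation is TRUE -- only how likely it is to appear.
bigram_probs = {
    "<start>":   {"The": 0.9, "A": 0.1},
    "The":       {"novel": 0.6, "author": 0.4},
    "novel":     {"was": 1.0},
    "was":       {"published": 0.7, "serialized": 0.3},
    "published": {"in": 1.0},
    "in":        {"1952": 0.5, "1953": 0.3, "1954": 0.2},  # all "plausible"
}

def generate(max_words=6):
    """Greedily pick the most probable next word at each step."""
    word, output = "<start>", []
    for _ in range(max_words):
        choices = bigram_probs.get(word)
        if not choices:
            break
        # The most statistically likely continuation -- not necessarily
        # the most accurate one.
        word = max(choices, key=choices.get)
        output.append(word)
    return " ".join(output)

print(generate())  # -> "The novel was published in 1952"
```

The sketch always emits a confident, grammatical sentence ending in a specific date, because some date is always the likeliest continuation of "published in." Whether that date is correct never enters the computation, which is exactly the "overconfident autocomplete" failure mode the three responses describe.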
For users, this has important implications. The very qualities that make these tools attractive also make their errors easy to overlook. Consequently, effective use of AI requires a degree of critical engagement. Asking for sources, pressing for clarification, and testing claims against known information are not optional habits but essential ones, particularly in domains where precision matters.
At the same time, the convergence of these explanations suggests that the problem is well understood, even if it is not yet solved. Efforts to improve the handling of uncertainty and integrate verification mechanisms are almost certainly ongoing. However, whether these efforts will ultimately eliminate hallucinations entirely or merely mitigate them remains an open question.
In the meantime, the practice described above remains a valuable one. By confronting AI systems with questions whose answers we already know, we gain not only a clearer sense of their capabilities but also a more grounded understanding of their limits. From that understanding, a more thoughtful and responsible use of AI can begin to take shape.