AI Hallucination Explained: What It Is and How to Detect It in Your Apps

AI tutorial - IT technology blog

The Problem You’ll Run Into

Picture this: you’ve just shipped a chatbot that answers technical questions using an LLM. Your users are happy — until someone reports that the AI confidently described Linux flags that don’t exist, cited a Stack Overflow thread with a fake URL, or listed a Python method that was never in the standard library. No warning, no hedging. Just a fluent, authoritative answer that happened to be fabricated.

This is AI hallucination. And it’s one of the first real problems you’ll hit when building with large language models. The tricky part isn’t that the AI is wrong — it’s that it’s wrong with full confidence.

This isn’t an edge case that only bites beginners. It shows up in production, consistently, and it tends to be worst where accuracy matters most: technical documentation, medical information, legal summaries, or anything tied to specific version numbers and command syntax.

What Is AI Hallucination?

An AI hallucination happens when a language model generates text that sounds plausible but is factually wrong, fabricated, or not grounded in any real data. The model isn’t lying. It’s predicting the most statistically likely next token based on patterns in its training data — with no built-in fact-checker anywhere in that process.

Common Types of Hallucination

  • Factual hallucination: The model states something false as fact (“Python 3.12 added a native switch statement”).
  • Citation hallucination: It cites papers, URLs, or documentation that don’t exist.
  • Instruction hallucination: It gives shell commands or API calls with wrong flags, method names, or parameters.
  • Context hallucination: It invents details that were never in your input prompt or the document you provided.

Why Does It Happen?

LLMs are trained to predict text — not to query a database of facts. When a model hits the edge of its knowledge, it fills in the gaps with statistically plausible text. The result looks syntactically correct. The facts, though, can be entirely wrong.

The problem compounds when:

  • The topic is niche or highly specific
  • The training data is outdated (knowledge cutoff)
  • The prompt is open-ended, giving the model room to speculate

Four Ways to Catch Hallucinations in Your Code

These techniques range from cheap-to-implement to more involved. Start with whichever fits your current bottleneck.

1. Force Structured Output and Validate It

The simplest defense: make the model return structured JSON, then validate it against a known schema. If the model hallucinates a field or an impossible value, your validation layer catches it before it reaches the user.

import anthropic
import json

client = anthropic.Anthropic()

def ask_with_structured_output(question: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Answer this question and return ONLY valid JSON.
Schema: {{"answer": "string", "confidence": "high|medium|low", "sources": ["list of sources if known"]}}

Question: {question}"""
        }]
    )

    raw = response.content[0].text.strip()
    try:
        data = json.loads(raw)
        assert "answer" in data
        assert data.get("confidence") in ("high", "medium", "low")
        return data
    except (json.JSONDecodeError, AssertionError) as e:
        return {"error": f"Invalid response: {e}", "raw": raw}

result = ask_with_structured_output("What flags does the Linux 'ls' command use to show hidden files?")
print(result)

When the model returns "confidence": "low", treat that as a signal to show a disclaimer or route the query to a human reviewer.
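One way to act on that signal is a small routing function layered on top of ask_with_structured_output. The tier names below are an illustrative assumption, not part of any SDK; wire them to whatever your UI or review queue actually does.

```python
def route_by_confidence(result: dict) -> str:
    """Map a validated structured answer to a handling tier.

    Tier names are illustrative; connect them to your own UI or queue.
    """
    if "error" in result:
        return "fallback"  # schema validation failed: never show raw model text
    confidence = result.get("confidence")
    if confidence == "high":
        return "show"
    if confidence == "medium":
        return "show_with_disclaimer"
    return "human_review"  # "low" or anything unexpected: fail toward review
```

In practice you would call `route_by_confidence(ask_with_structured_output(question))` and branch on the tier. Note that the "error" branch fires on the validation failures the previous function already catches, so malformed output never reaches a user.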

2. Ground the Model in Real Source Material

Don’t let the model answer from memory. Pass in the actual documentation or source content, then instruct it to answer only from that. This is the core idea behind RAG — Retrieval-Augmented Generation. Of all the techniques here, it delivers the biggest reliability boost for the least implementation effort.

def ask_with_grounding(question: str, context: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="Answer ONLY based on the provided context. If the answer is not in the context, say 'Not found in provided context.'",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }]
    )
    return response.content[0].text

# Example: grounding from an actual man page snippet
man_page_snippet = """
ls - list directory contents
  -a  do not ignore entries starting with .
  -l  use a long listing format
  -h  with -l, print sizes in human readable format
"""

answer = ask_with_grounding("How do I show hidden files with ls?", man_page_snippet)
print(answer)

The model now only works with what you give it. If the answer isn’t in the context, it says so — rather than inventing one.
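In a full RAG setup, a retrieval step decides which chunk of documentation to pass as context. The sketch below uses naive word overlap as the relevance score, which is only an assumption for illustration; production systems typically use embedding similarity instead.

```python
import re

def pick_context(question: str, chunks: list[str]) -> str:
    """Naive retrieval: return the chunk sharing the most words with the question."""
    q_words = set(re.findall(r"\w+", question.lower()))

    def overlap(chunk: str) -> int:
        return len(q_words & set(re.findall(r"\w+", chunk.lower())))

    return max(chunks, key=overlap)

docs = [
    "ls - list directory contents. -a: do not ignore entries starting with .",
    "grep - print lines matching a pattern. -i: ignore case distinctions",
]
best = pick_context("How do I show hidden files with ls?", docs)
# best is the ls chunk, which you would then pass to ask_with_grounding
```

However simple the scoring, the shape is the point: retrieve first, then ground the model in what you retrieved.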

3. Run a Self-Consistency Check

Send the same question several times and compare the answers. Consistent results across runs suggest the model is recalling something it knows well. Wide variation is a red flag — the model is guessing, not drawing on solid knowledge.

def consistency_check(question: str, runs: int = 3) -> list[str]:
    answers = []
    for _ in range(runs):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=256,
            messages=[{"role": "user", "content": question}]
        )
        answers.append(response.content[0].text.strip())
    return answers

question = "What Python version introduced f-strings?"
results = consistency_check(question)

for i, ans in enumerate(results, 1):
    print(f"Run {i}: {ans[:100]}")

# Flag inconsistency as a hallucination risk
unique_answers = set(a[:60] for a in results)
if len(unique_answers) > 1:
    print("\n[WARNING] Inconsistent answers — treat with caution")
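Comparing fixed-length prefixes, as above, will flag answers that differ only in punctuation or capitalization. A slightly sturdier sketch normalizes each answer and takes a majority vote; the normalization here is deliberately simplistic, and real pipelines often compare extracted entities or embeddings instead.

```python
import re
from collections import Counter

def majority_answer(answers: list[str]) -> tuple[str, float]:
    """Normalize each answer, then vote.

    Returns (winning normalized answer, agreement fraction); a low fraction
    means the runs disagree, which is a hallucination risk signal.
    """
    normalized = [re.sub(r"\W+", " ", a.lower()).strip() for a in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(answers)

ans, agreement = majority_answer(["Python 3.6", "python 3.6.", "Python 3.8"])
# two runs agree after normalization, one dissents, so agreement is 2/3
```

A threshold such as requiring agreement above 0.5 before trusting the answer is a reasonable starting point, but tune it against your own traffic.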

4. Use a Verification Prompt

After getting an initial answer, send a second request asking the model to critique its own response. Think of it as a built-in review pass. It won’t catch everything, but it surfaces shaky claims before they reach your users.

def verify_answer(question: str, initial_answer: str) -> dict:
    verification_prompt = f"""
Original question: {question}
Proposed answer: {initial_answer}

Review this answer carefully:
1. Is it factually accurate based on your knowledge?
2. Are there any specific claims that might be uncertain?
3. Rate your confidence: high / medium / low

Respond as JSON: {{"accurate": true/false, "uncertain_claims": [], "confidence": "high|medium|low"}}
"""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content": verification_prompt}]
    )
    try:
        return json.loads(response.content[0].text.strip())
    except json.JSONDecodeError:
        return {"error": "Could not parse verification response"}
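The verification result still needs a gate in your application code. Here is a minimal sketch: the field names match the JSON schema in the prompt above, while the acceptance policy itself is an assumption you should tune.

```python
def is_trustworthy(verification: dict) -> bool:
    """Accept an answer only if the review pass marked it accurate,
    flagged no uncertain claims, and reported at least medium confidence."""
    if "error" in verification:
        return False  # unparseable verification: fail closed
    return (
        verification.get("accurate") is True
        and not verification.get("uncertain_claims")
        and verification.get("confidence") in ("high", "medium")
    )
```

Typical usage: get the initial answer, call verify_answer on it, and show the answer to the user only when is_trustworthy returns True; otherwise fall back or escalate.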

Putting It Together in a Real Pipeline

In production, I layer these in order of cost and impact:

  1. Grounding first: Always inject relevant documentation or context into the prompt. Never let the model answer factual questions from memory alone.
  2. Structured output: Use JSON schemas so you can validate what comes back programmatically.
  3. Confidence routing: If confidence is low or consistency is poor across runs, route to a fallback — a simpler rule-based answer, an “I don’t have reliable information on this” message, or a human review queue.
  4. Log everything: Log all model inputs and outputs. When a user reports a wrong answer, you’ll need the full trace to understand what went wrong.
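Wired together, those layers might look like the sketch below. The `ask` callable is injected so the pipeline logic can be exercised without a live API call; the tier names and schema are illustrative assumptions, not a fixed interface.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm_pipeline")

def answer_pipeline(question: str, context: str, ask) -> dict:
    """Layer the defenses: ground in context, validate the JSON shape,
    route on confidence, and log the full trace.

    `ask(prompt)` is any callable returning the model's raw text, e.g. a thin
    wrapper around client.messages.create, or a stub in tests.
    """
    # 1. Grounding: the question never goes out without context attached
    prompt = (
        "Answer ONLY from the context below. Return ONLY JSON matching this "
        'schema: {"answer": "string", "confidence": "high|medium|low"}\n\n'
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    raw = ask(prompt)
    logger.info("question=%r raw_response=%r", question, raw)  # 4. log everything
    try:
        data = json.loads(raw)  # 2. structured output validation
    except json.JSONDecodeError:
        return {"status": "fallback", "reason": "invalid JSON from model"}
    if not isinstance(data.get("answer"), str):
        return {"status": "fallback", "reason": "missing answer field"}
    if data.get("confidence") not in ("high", "medium"):  # 3. confidence routing
        return {"status": "human_review", "answer": data["answer"]}
    return {"status": "ok", "answer": data["answer"]}
```

Because the model call is injected, you can unit-test every branch of this routing logic with stubs, and swap in the real API client only at the application boundary.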

The mental shift that matters most: treat LLM output like user input. Don’t trust it blindly. Validate it, ground it in real data, and build confidence checks into your application layer. The model has no idea when it’s hallucinating — that’s your code’s job.

Where to Start

Add grounding first. Pass in actual documentation or database content alongside each question. In domain-specific apps — a docs assistant, a support bot, a version-aware CLI helper — that single change can cut hallucination frequency by more than half.

Once grounding is in place, add structured output validation so your application can detect unexpected or malformed responses. Save verification prompts for high-stakes queries where accuracy is critical and a bit of extra latency is acceptable.

Hallucination is baked into how language models work — it’s not a bug that will be patched away in the next release. But with these techniques, you can build AI features that stay reliable even when the underlying model occasionally gets it wrong.
