My Favorite Dialogue: The Circular Trap

Let me share my favorite dialogue from recent months. It reveals everything wrong with current AI marketing in just a few exchanges:
“How do you reduce hallucinations in your AI agents?”
“We use knowledge graphs! Our solution has almost zero hallucinations — 95% accuracy!”
“That’s impressive. How do you build your knowledge graph?”
“Uh… we use LLMs.”
Long pause. Uncomfortable silence.

This circular dependency breaks everything. When you use LLMs to build knowledge graphs meant to prevent LLM hallucinations, you haven’t solved the problem — you’ve just propagated it deeper into your architecture. You’ve taken hallucination errors and embedded them into your foundational data structures.

The marketing promise is seductive: deploy a knowledge graph, add an ontology, layer in some reasoning capabilities, and hallucinations vanish. Enterprise buyers nod approvingly at claims of “near-zero hallucination rates.” But beneath this confident surface lies a philosophical quicksand that the AI industry has largely ignored — a circular dependency so fundamental it undermines the entire edifice.

And what about that remaining 5%? The marketing stays silent. In sensitive domains where people’s lives depend on AI decisions, that 5% isn’t acceptable. You can’t tell a patient, “Well, there’s only a 5% chance this diagnosis is completely fabricated.”

The Mathematical Proof: Zero Hallucinations Are Impossible

Before we even address the circular dependency, we need to confront a harder truth: recent academic research demolishes the “zero hallucination” claim with formal mathematical proofs.

A 2024 paper by Xu et al. using computability theory demonstrates that LLMs cannot learn all computable functions — hallucinations are structurally inevitable. A separate proof by Banerjee et al. invokes Gödel’s First Incompleteness Theorem to show that every stage of LLM processing has non-zero hallucination probability, making elimination “impossible through architectural improvements, dataset enhancements, or fact-checking mechanisms.”
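
To see why the per-stage argument bites, here is a deliberately simplified illustration (mine, not the papers’ actual proofs). Suppose a pipeline has n stages and stage i hallucinates with some probability p_i > 0. Even assuming the stages are independent, the probability of a completely clean output is

P(no hallucination) = (1 − p_1)(1 − p_2) · … · (1 − p_n) < 1

and every stage you add (extraction, graph construction, retrieval, generation) only pushes it further below 1. Reaching exactly zero would require some p_i = 0, which is precisely what these results rule out.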

This isn’t an engineering challenge to be overcome with better technology. It’s a fundamental limitation proven through formal logic. The question is not “can we achieve zero hallucinations?” but rather “how do we build systems that honestly manage the inherent uncertainty?”

Layer 1: The LLM-Knowledge Graph Circular Dependency

When you use LLMs to build knowledge graphs, you encounter three persistent failure modes:

  • Accumulated Error Propagation: Traditional knowledge graph construction suffered from cumulative errors cascading through extraction pipelines. LLM-driven approaches inherit this problem while adding new failure modes.

  • Domain Knowledge Gaps: LLMs lack specialized expertise required for technical domains. When they construct ontologies or extract entities in unfamiliar territory, they hallucinate relationships that seem plausible but are factually wrong.

  • Hallucinated Triples: The most insidious problem — LLMs invent relationships wholesale, fabricating connections that never existed in source data. These hallucinated triples become embedded in the knowledge graph, poisoning downstream reasoning.

A comprehensive 2025 survey on LLM-empowered knowledge graph construction confirms these issues persist across all current architectures. The circular logic is inescapable: you’re using unreliable tools to build reliability infrastructure.
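
To make the third failure mode concrete, here is a minimal sketch of the kind of grounding filter teams bolt onto extraction pipelines. Everything in it is illustrative: llm_extract_triples stands in for whatever model call you actually use, and the check is deliberately naive.

```python
# Minimal sketch of a grounding filter for LLM-extracted triples.
# llm_extract_triples is a hypothetical stand-in for the real LLM call;
# the point is the check, not the extraction.

def llm_extract_triples(text: str) -> list[tuple[str, str, str]]:
    # Pretend this is an LLM: it returns one grounded triple and one
    # plausible-sounding fabrication of the kind described above.
    return [
        ("aspirin", "treats", "headache"),
        ("aspirin", "manufactured_by", "Acme Pharma"),  # never stated in the source
    ]

def grounded(triple: tuple[str, str, str], source: str) -> bool:
    """Crude check: both entities must literally appear in the source text."""
    subj, _, obj = triple
    return subj.lower() in source.lower() and obj.lower() in source.lower()

source_text = "Aspirin treats headache but can also cause headache when overused."
accepted, needs_review = [], []
for triple in llm_extract_triples(source_text):
    (accepted if grounded(triple, source_text) else needs_review).append(triple)

print("accepted:", accepted)          # the grounded triple
print("needs review:", needs_review)  # the fabricated one, flagged for a human
```

Note what the filter cannot do: the fabricated triple gets flagged, but the accepted one could still encode the wrong relation (treats versus causes), which is exactly the ambiguity the next layer is supposed to address.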

The empirical evidence is sobering. Microsoft’s GraphRAG achieves 70–80% win rates over naive RAG on comprehensiveness metrics. But more rigorous 2025 evaluation found GraphRAG improves multi-hop reasoning by only 4.5% while introducing 2.3× higher latency. The researchers concluded that “existing GraphRAG research is overly optimistic about performance gains.”

Even Microsoft, with enormous resources, could not deliver on the promise. Instead of the dramatic gains suggested by those 70–80% win rates, independent testing showed improvements of only 4–5%, and the heavy additional computation makes that marginal gain economically questionable.

Even the best production systems show hallucination rates of 1.5–3% under controlled conditions — better than the industry average of 10–27%, but nowhere near zero. And OpenAI’s reasoning models actually hallucinate more: o3 hallucinates 33% of the time on person-specific questions, while o4-mini reaches 48%.

So knowledge graphs alone are clearly not enough. This realization led the industry to the next layer: ontology.

Layer 2: Why Graphs Without Ontology Fail

I wrote an article explaining why you cannot just build knowledge graphs “as is.” A knowledge graph without formal ontology is merely a collection of nodes and edges — semantically ambiguous and incapable of sophisticated reasoning.

Consider the term “Jaguar” appearing in a graph: without ontological grounding, the system cannot distinguish between the animal, the car brand, or the Atari game console. This ambiguity extends to relationships: does a link between a drug and a condition represent treatment or causation? Drugs can treat headaches but also cause them.

Ontology provides what raw graphs cannot:

  • Formal semantics: Precise logical definitions via Description Logic rather than ambiguous interpretation

  • Reasoning and inference: Deriving implicit knowledge through transitive relationships, class hierarchies, and logical rules

  • Consistency checking: Automatic detection of contradictions impossible in raw graphs

  • Constraint validation: Enforcement via axioms (e.g., “a Person cannot be both alive and deceased”)

  • Interoperability: Standardized vocabularies through OWL, RDF, and RDFS

Alongside knowledge graphs and reasoning, we need ontology as the set of construction rules that drives how knowledge graphs are built. Ontology lets us reason about the knowledge and potentially eliminate some hallucinated triples. Ontology says what exists, and that is a big step forward.
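
Here is a toy version of the constraint-validation idea, assuming nothing more than a handful of raw triples and one disjointness axiom. A real system would state the axiom in OWL and hand it to a Description Logic reasoner; the sketch only shows what that reasoner is checking for.

```python
# Toy constraint validation: a disjointness axiom ("a Person cannot be both
# alive and deceased") applied to raw triples. Real systems express this in
# OWL and run a DL reasoner; this is just the idea in plain Python.

triples = {
    ("alan_turing", "type", "Person"),
    ("alan_turing", "status", "Alive"),
    ("alan_turing", "status", "Deceased"),   # a hallucinated triple slipped in
    ("grace_hopper", "type", "Person"),
    ("grace_hopper", "status", "Deceased"),
}

DISJOINT = {("Alive", "Deceased")}  # axiom: these statuses cannot co-occur

def violations(triples, disjoint_pairs):
    statuses_by_subject = {}
    for subj, pred, obj in triples:
        if pred == "status":
            statuses_by_subject.setdefault(subj, set()).add(obj)
    for subj, statuses in statuses_by_subject.items():
        for a, b in disjoint_pairs:
            if a in statuses and b in statuses:
                yield subj, (a, b)

for subj, (a, b) in violations(triples, DISJOINT):
    print(f"Inconsistency: {subj} is declared both {a} and {b}")
```

Checks like this catch contradictions, but note the limit that will matter later: a perfectly consistent graph can still be wrong.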

Research on large-scale knowledge graphs with 28+ billion triples finds they routinely contain logical inconsistencies that make “reasoning over these KGs limited and the knowledge formally useless.” When LLMs construct graphs without predefined schemas, they produce vocabulary variations for identical relationships (“directed_by” versus “director”), undermining coherence.

Layer 3: The Ontology-Epistemology Gap

But here we encounter the next layer of the problem: how do we build ontology?

Sometimes we are lucky and an industry-wide ontology already exists, well defined by humans and thoroughly vetted. But as soon as we start using LLMs as ontologists to define ontologies, we are back in yet another circle of dependency. That circle can be broken, but only by keeping humans in the loop to filter out the hallucinations.

Who validates the ontology? Is ontology enough to verify our knowledge? My answer is actually no.

This brings us to a fundamental philosophical distinction that the AI industry has ignored: the separation between ontology and epistemology.

Ontology defines what exists and the structure of knowledge. But epistemology asks the deeper question: how do we know that we know? Is it really knowledge, or merely our belief? What is the actual source of the knowledge? Because if the source is yet another hallucination-prone LLM output, it is not reliable.

Philosophy identified this challenge millennia ago as Agrippa’s Trilemma (also called Münchhausen’s Trilemma, after the Baron who allegedly pulled himself from a swamp by his own hair). Any attempt to justify knowledge faces exactly three options, all problematic:

  • Infinite regress: Every justification requires further justification, ad infinitum

  • Circular reasoning: The justification eventually references itself

  • Dogmatism: Resting on axioms asserted without defense

AI systems building knowledge representations face this trilemma directly. When a domain expert defines an ontology, what validates their expertise? When an LLM extracts entities and relationships, what confirms those extractions match reality? When consistency checkers verify logical coherence, what ensures coherent systems correspond to truth?

The Epistemology Crisis Since the Internet

We have been living through an epistemological crisis since the internet arrived. We could rely, to some extent, on information written in Britannica: it had editorial oversight, expert review, and institutional backing. Now we have Wikipedia, where practically anybody can contribute.

How do you know that a Wikipedia article is telling the truth? What mechanisms verify the knowledge? This problem cascades through the internet and now into the AI systems trained on that data.

Current approaches adopt pragmatic compromises. Competency Question-driven design validates ontologies against their ability to answer predefined questions — but who validates the questions? Human-in-the-loop approaches rely on subject matter experts — but experts can be wrong, and inter-expert disagreement is common. Automated reasoners check consistency — but consistency doesn’t equal correctness.

Recent experiments show LLMs performing “on par with human users” at ontology evaluation (0.66 macro-F1), yet this merely shifts the validation burden to systems that themselves hallucinate.

The meta-validation challenge remains irreducible: every validation method itself requires justification. Why trust expert judgment? (Appeal to authority.) Why trust empirical testing? (Problem of induction.) Why trust logical consistency? (Circularity — logic validating logic.)

This is why ontology alone is not enough. Ontology construction is a process that requires ontology specialists. The next layer is epistemology: knowledge about knowledge. It is the meta-layer that tells us whether a claim is genuine knowledge, mere belief, or something in between that carries a probability.

Layer 4: Epistemology Demands Verifiability

If knowledge cannot be justified through pure reasoning, perhaps it can be verified through evidence. This intuition drives an emerging stack of verification technologies attempting to ground AI knowledge claims in cryptographic proof and authenticated provenance.

So we need more transparency and trust in this epistemology. For some use cases we are comfortable with certain sources; for others we are not. Alongside epistemology, we need the concepts of verifiability and trust chains.

I know these concepts come mostly from crypto and self-sovereign identity, but they are deeply relevant to knowledge representation. If we could trace facts to their sources, and those sources to further sources, and establish roots we trust, it would be like mathematics: theorems rest on axioms, and at some point we simply accept the axioms (at best, we can check that the system built on them is consistent).

This trust chain of knowledge allows us to trace information to the source and make sources verifiable.
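
As a sketch of what such a trust chain could look like in practice, consider tracing a knowledge-graph claim back through its sources until you hit either a declared root of trust or an unverifiable dead end. The structure and the source names below are invented for illustration; this is not a standard.

```python
# Sketch of a trust chain: each claim points at its sources, and tracing
# terminates either in a declared root of trust or in an unverifiable dead end.

from dataclasses import dataclass, field

@dataclass
class Source:
    name: str
    derived_from: list["Source"] = field(default_factory=list)
    trusted_root: bool = False   # e.g. a vetted registry or a signed standard

def trace(source: Source, path=()):
    """Yield (chain, ends_at_trusted_root) for every path back through the sources."""
    path = (*path, source.name)
    if not source.derived_from:
        yield list(path), source.trusted_root
        return
    for parent in source.derived_from:
        yield from trace(parent, path)

who_registry = Source("WHO drug registry", trusted_root=True)
scraped_blog = Source("scraped health blog")   # nothing trustworthy behind it
llm_summary = Source("LLM summary", derived_from=[who_registry, scraped_blog])
kg_triple = Source("knowledge-graph triple", derived_from=[llm_summary])

for chain, trusted in trace(kg_triple):
    print(" -> ".join(chain), "[trusted root]" if trusted else "[unverifiable]")
```

One claim, two chains: the same triple is simultaneously backed by a trusted registry and by an unverifiable blog, and it is the epistemology layer that has to decide what that mixture is worth.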

The Verification Technology Stack

W3C Verifiable Credentials (v2.0) provide cryptographically secured digital credentials in a three-party model (issuer → holder → verifier). Applications to AI include binding model provenance to verified identities, tracking dataset licensing through credential chains, and authenticating AI agents.
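
For illustration, here is roughly what a credential binding model provenance to a verified issuer might look like, written as a Python dict. The field names follow my reading of the VC 2.0 data model but are abridged, and the credential type, identifiers, and values are hypothetical; a real credential carries a complete cryptographic proof and is checked by a conforming verifier.

```python
# Abridged, illustrative shape of a verifiable credential in the
# issuer -> holder -> verifier model. Not a complete or signed credential.

model_provenance_credential = {
    "@context": ["https://www.w3.org/ns/credentials/v2"],
    "type": ["VerifiableCredential", "ModelProvenanceCredential"],  # second type is hypothetical
    "issuer": "did:example:model-auditor",
    "validFrom": "2025-01-15T00:00:00Z",
    "credentialSubject": {
        "id": "did:example:acme-llm-7b",
        "trainingDataLicense": "CC-BY-4.0",
        "datasetDigest": "sha256:3f8a...",   # placeholder digest
    },
    "proof": {
        "type": "DataIntegrityProof",
        "verificationMethod": "did:example:model-auditor#key-1",
        "proofValue": "z3FXQ...",            # truncated; produced by the issuer's key
    },
}
```

The verifier can check the proof against the issuer’s published keys; what it cannot check is whether the claims inside credentialSubject are true, a limit we will come back to.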

The Data & Trust Alliance standards, released in 2024 and now advancing through OASIS, offer the first cross-industry metadata framework for data provenance — developed by IBM, Mastercard, Pfizer, and others. These standards emerged because 61% of CEOs cite lack of data lineage clarity as the top barrier to GenAI adoption.

C2PA (Coalition for Content Provenance and Authenticity) provides cryptographically signed metadata embedded in media files, supported by Adobe, Microsoft, Google, OpenAI, BBC, Sony, Canon, and Nikon. The standard enables detection of tampering and tracking of editing history.

Zero-knowledge machine learning (zkML) represents the cryptographic frontier — generating proofs that specific model inferences occurred correctly without revealing underlying data. Worldcoin uses zkML for iris verification; frameworks now support proofs for GPT-2 and diffusion models.

The Limits of Verification

Yet verification faces fundamental limits:

zkML proves inference correctness, not training appropriateness: It can verify a specific model generated an output, but not whether training data was factually accurate, unbiased, or legally sourced. Current systems cannot handle LLMs with billions of parameters — only smaller models are feasible.

C2PA has extensive industry support but “very little internet content” actually uses it.

Most critically: all these systems prove authentication and attribution, not epistemological verification. They answer “did this entity say this?” but cannot answer “is what they said true?”

One more caveat: even when a training dataset can be reliably proved to come from verified sources, the training process that turns that data into model weights can still introduce hallucinations. Verified inputs do not guarantee faithful outputs, so this step needs the same scrutiny.

Layer 5: The Trust Infrastructure Paradox

Verification technologies require trust infrastructure — systems for establishing who and what can be trusted. This creates another layer of the bootstrapping problem: trust infrastructure itself must be bootstrapped. Who do we trust to establish trust?

We all rely on some roots of trust. SSI (self-sovereign identity) features can make provenance immutable and cryptographically verifiable on top of those roots. That gives us a stronger system, one in which we can more credibly say we know what we know before relying on solutions proposed by LLMs.

KERI (Key Event Receipt Infrastructure) attempts to solve this through self-certifying identifiers — cryptographic identifiers bound to key pairs at inception, with append-only logs of key events witnessed by distributed validators. The architecture claims “the root-of-trust is fully cryptographic, there is no infrastructure associated with it.” Yet even KERI depends on the initial key generation ceremony being secure and uncompromised.
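
Conceptually, the idea looks something like the sketch below: derive the identifier from the inception key and chain every subsequent key event by hash, so the log is append-only and can be re-verified without a registry. This mimics the shape of KERI’s approach, not its actual event formats or protocol.

```python
# Conceptual sketch of a self-certifying identifier with a hash-chained key
# event log. Mimics the idea behind KERI, not its real data formats.

import hashlib
import json

def digest(obj) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

inception_key = "pubkey-0-base64..."                        # placeholder, not a real key
identifier = "SCID:" + digest({"key": inception_key})[:16]  # derived from the key itself

log = []

def append_event(event_type: str, payload: dict) -> dict:
    prior = log[-1]["digest"] if log else None    # hash link to the previous event
    event = {"id": identifier, "type": event_type, "prior": prior, **payload}
    event["digest"] = digest(event)               # digest computed before this field exists
    log.append(event)
    return event

append_event("inception", {"key": inception_key})
append_event("rotation", {"key": "pubkey-1-base64..."})

# Anyone can re-verify the chain without trusting a registry...
for i, event in enumerate(log):
    assert event["prior"] == (log[i - 1]["digest"] if i else None)
# ...but nothing in the log can prove the initial key ceremony was secure.
```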

The Trust Over IP (ToIP) Foundation defines a four-layer dual stack combining technology and governance, explicitly acknowledging that “human accountability at the business, legal, and social layers” is required alongside technical verification.

EBSI (European Blockchain Services Infrastructure) distributes trust across EU member states through a blockchain network — but ultimately derives authority from governmental mandate.

The Web PKI system that secures the internet relies on roughly 100 Certificate Authorities pre-installed in browsers — a trust assumption that proved vulnerable in the 2011 DigiNotar attack, when fraudulent certificates for Google, Yahoo, and Mozilla were trusted by major browsers.

As security researchers note: “If you follow this chain of trust back far enough, you’ll always find people: every trust chain ends in meatspace.” You might use SSH to distribute root certificates, but the SSH PKI is itself bootstrapped off Web PKI plus cloud vendor authentication… the regression continues.

Even “decentralized” systems depend on initial trust assumptions that cannot be cryptographically proven. The roots of trust must be established through some mechanism — and that mechanism itself requires trust.

The Integration Challenge: A Multi-Layered Approach

The chain of mechanisms I’ve described (knowledge graphs with ontology, supported by an epistemology that classifies what kind of knowledge each claim is, together with trust chains that record which sources a claim comes from and rests on) gives us more confidence that we can rely on that knowledge.

But each layer alone is insufficient:

  • Knowledge graphs provide structure but embed construction errors from their LLM builders

  • Ontologies provide formal semantics but face the validation trilemma

  • Epistemological frameworks acknowledge uncertainty but cannot transcend it

  • Verification systems prove authenticity but not truth

  • Trust infrastructure provides roots, but those roots rest on human judgment

Real-world failures illustrate these gaps.

Case Study: Air Canada’s Chatbot Hallucination

In the Air Canada chatbot lawsuit (Moffatt v. Air Canada, 2024), a customer asked about bereavement fares. The RAG-powered chatbot hallucinated a policy allowing retroactive discount applications — contradicting documents on the same website.

Air Canada was ordered to pay damages. The tribunal rejected their defense that “the chatbot is a separate legal entity responsible for its own actions.”

The knowledge base was correct. The retrieval succeeded. The generation still hallucinated.

This wasn’t a failure of knowledge graphs, ontology, or verification — it was a failure of the generation layer that no amount of infrastructure can fully prevent, as the mathematical proofs demonstrate.

The Brutal Statistics of RAG Failure

Enterprise RAG implementations fail at a 72% rate in the first year. Research identifies seven distinct failure points: missing content, failed retrieval, ranking failures, context truncation, information extraction failure, generation errors, and output format errors.

Each represents a different layer where the stack can break. Knowledge graphs help but introduce their own challenges — maintenance burden, scalability bottlenecks with billions of nodes, schema evolution in dynamic environments.

The long-term cost of a true enterprise knowledge graph runs $10–20 million, requiring specialized expertise most organizations lack.

We built this indirect circular dependency between LLMs and knowledge graphs without a deep understanding of knowledge itself. The approach will not fly, no matter which technologies we use, because it only pushes hallucinations and errors deeper into the network.

What We Need: The Multi-Layered Solution

I don’t have the ultimate solution for this problem. But I see it as a multi-layered approach that requires honesty about what each layer can and cannot provide.

The Honest Technology Stack

Layer 1: Knowledge Graphs with Ontology

We need knowledge graphs, but built with formal ontologies that provide semantic grounding and reasoning capabilities, and with humans still in the loop for ontology design and validation.

Layer 2: Epistemology as Meta-Knowledge

We need an epistemological layer supporting the ontology and explaining the knowledge — classifying what type of knowledge claim we’re making, with what confidence, based on what evidence.

Layer 3: Verifiability and Trust Chains

We need verifiability and traceability of data, supported by trust chains that let us trace claims to their sources and evaluate those sources’ reliability.

Layer 4: Cryptographic Roots of Trust

We need some roots of trust with SSI features that make provenance immutable and cryptographically verifiable.
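
To show how the four layers might come together around a single claim, here is an illustrative annotation. Every field name and value is invented for the sketch; the point is that the triple itself becomes the smallest part of what a trustworthy system has to carry.

```python
# Illustrative annotation of one claim across the four layers.

annotated_claim = {
    # Layer 1: the knowledge-graph triple, typed against an ontology
    "triple": ("metformin", "treats", "type_2_diabetes"),
    "ontology_classes": {"metformin": "Drug", "type_2_diabetes": "Disease"},

    # Layer 2: epistemic status -- knowledge, belief, or model inference?
    "epistemic_status": "established_knowledge",
    "confidence": 0.97,

    # Layer 3: trust chain back to the sources the claim rests on
    "provenance": ["regulatory drug label (2024)", "curated clinical guideline v3"],

    # Layer 4: cryptographic anchor making the provenance tamper-evident
    "credential_ref": "did:example:pharma-registry#cred-8841",
}
```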

What Genuine Progress Would Look Like

The path forward requires intellectual honesty: zero hallucination is an engineering impossibility, not a marketing milestone. The question becomes how to build systems that acknowledge and manage uncertainty rather than pretending to eliminate it.

Genuine progress involves several elements not yet integrated:

Uncertainty quantification at every layer: Rather than claiming certainty, systems should propagate confidence estimates through the entire stack — from knowledge graph construction (how confident is this triple?) through ontological reasoning (how well-supported is this inference?) through retrieval (how relevant is this context?) through generation (how grounded is this output?).
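
A deliberately naive sketch of what that propagation could look like: multiply per-layer confidence scores and surface the product instead of hiding it. Real systems need calibrated estimates and should not assume independence, but even the toy version makes the honest number visible.

```python
# Naive confidence propagation through the stack. Multiplying raw scores
# assumes independence and calibration; the point is that uncertainty is
# carried through and reported, not hidden.

layer_confidence = {
    "triple_extraction": 0.92,       # how confident is this triple?
    "ontological_inference": 0.97,   # how well-supported is this inference?
    "retrieval_relevance": 0.85,     # how relevant is this context?
    "generation_grounding": 0.90,    # how grounded is this output?
}

overall = 1.0
for stage, confidence in layer_confidence.items():
    overall *= confidence

print(f"end-to-end confidence: {overall:.2f}")  # about 0.68, nowhere near certainty
```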

Verifiable training provenance: Current zkML proves inference but not training. Closing this gap requires technical advances in verifiable computation at scale, plus institutional frameworks for auditing training data and processes.

Federated trust with explicit assumptions: Rather than seeking a universal root of trust, systems could operate under explicit, queryable trust assumptions. What CAs does this system trust? What governance frameworks apply? What human judgments underpin these claims? Making assumptions explicit enables appropriate skepticism.

Graceful degradation and abstention: Google research shows RAG systems “confidently provide incorrect answers even when presented with retrieved evidence” and that additional context can paradoxically reduce a model’s ability to abstain when lacking information. Systems should be designed to recognize and communicate their limitations rather than generating plausible-sounding completions.
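
A minimal sketch of that behavior, with an invented threshold and routing policy: answer when confidence clears the bar, abstain and escalate when it does not.

```python
# Graceful degradation: below a confidence threshold the system abstains and
# escalates instead of producing a plausible-sounding completion.

ABSTAIN_THRESHOLD = 0.80   # illustrative; real thresholds need calibration

def respond(answer: str, confidence: float) -> str:
    if confidence >= ABSTAIN_THRESHOLD:
        return f"{answer} (confidence {confidence:.2f})"
    return (f"I can't answer this reliably (confidence {confidence:.2f}); "
            "routing the question to a human agent.")

print(respond("Bereavement fares can be applied retroactively.", 0.41))
print(respond("Checked-bag fees are listed on the fares page.", 0.93))
```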

Human-AI epistemological partnerships: The bootstrapping problem cannot be escaped through technology alone. As philosophy long recognized, all justification terminates in something unjustified — axioms, intuitions, consensus, or practice. AI systems should be designed as partners in human epistemic processes, not replacements for human judgment.

This means maintaining human oversight at critical decision points and ensuring humans understand what systems can and cannot know: ontologies designed with humans still in the loop, and an epistemology layer that supports the ontology and explains the knowledge.

Conclusion: Beyond the Marketing Mirage

The “zero hallucination” claim is not merely exaggerated — it is structurally impossible, proven so through formal mathematics. The circular dependency between LLMs and the knowledge graphs built to ground them represents just the first layer of a deeper problem extending through ontology, epistemology, verification, and trust.

Each layer promises to solve the problem but merely relocates it. We cannot escape the fundamental epistemological condition that has constrained all knowledge systems throughout human history.

This is not cause for despair but for appropriate engineering humility. The Air Canada chatbot failed not because the technology was primitive but because expectations were misaligned with capabilities. A system that acknowledged uncertainty, flagged low-confidence responses, and escalated ambiguous queries would serve users better than one claiming impossible perfection.

The technology stack I’ve described — knowledge graphs, formal ontologies, epistemological meta-layers, verifiable credentials, content provenance standards, cryptographic proofs, trust registries — provides genuine value when properly understood. These tools reduce hallucination rates, improve traceability, enable accountability, and support human oversight.

What they cannot do is escape the epistemological condition that has constrained all knowledge systems throughout human history: justified certainty about the world is beyond our reach.

For sensitive areas where people’s lives depend on AI decisions, we need this complex, multi-layered approach. We need to stop propagating the hallucination and errors deeper into the network through circular dependencies. We need to be honest about what we can and cannot know.

Building AI systems that acknowledge this limitation, quantify their uncertainty, make their assumptions explicit, and partner with human judgment rather than replacing it — this represents genuine progress.

The alternative is a marketing mirage that will inevitably encounter reality, as Air Canada discovered in a Canadian courtroom. The question is not whether AI can achieve zero hallucinations. It cannot, mathematically and philosophically.

The question is whether we will build systems that manage this limitation honestly, with the full stack of knowledge graphs, ontologies, epistemology, verifiability, and trust infrastructure working together — or continue selling an impossibility until the lawsuits catch up.

Then we will have a stronger system, one in which we can say we really know what we know and can genuinely rely on the solutions LLMs propose. Not with zero hallucinations, which is impossible, but with honest uncertainty quantification, verifiable provenance, and human oversight at the critical points where it matters most.