What I Think RAG Gets Wrong
We built sophisticated retrieval on top of a knowledge base that doesn't know what it knows.
What if your knowledge base worked like a second memory — something you could talk to, ask to remember things for you, and trust to tell you not just what it knows, but whether that knowledge is still true?
That is not what current search systems do. You ask a question, the system retrieves a document, the document is real — and the answer is wrong. Not fabricated. Just accurate eighteen months ago and silently outdated since. The index never noticed. Nobody told it to.
These failures feel like retrieval problems. They are not. They started much earlier, at the moment each document was ingested — chunked, embedded, dropped into a vector store with no awareness of what was already there, no sense of time, no understanding of its own place in the broader knowledge landscape.
We have spent years building better retrieval. My read: we built the wrong thing.
Three generations of search
Every knowledge system rests on two operations: indexing (how information is structured before you search) and retrieval (how it is found when you do). What you can retrieve is always constrained by what you indexed — but the two have never received equal attention.
The Keyword Generation
Search began as an indexing problem. TF-IDF, BM25, inverted indexes — the intelligence lived in the structure of the index itself. Retrieval was a lookup over a carefully built representation. The philosophy was sound: invest in the index, and retrieval becomes cheap.
The Embedding Generation
Neural methods introduced dense vector representations — semantic meaning encoded as points in high-dimensional space. This was a genuine indexing advance. But retrieval was catching up fast: learned ranking models, semantic matching, query understanding. The center of gravity began to shift.
The RAG Generation
Retrieval-Augmented Generation, introduced in 2020, completed the shift. What followed was an explosion of retrieval-side innovation: query rewriting, multi-hop retrieval, re-ranking, agentic tool use. But notice what stayed constant throughout: the index itself — a flat collection of text chunks, embedded with a standard encoder, stored in a vector database. The sophistication piled up around the index, not inside it.
The center of gravity shifted — indexing declined as retrieval absorbed all the innovation
We are over-engineering retrieval to compensate for under-engineered knowledge bases.
What the index still doesn’t know
Modern pipelines have solved a lot. Document parsers handle tables, images, and complex layouts. Better chunking strategies preserve logical boundaries. Multimodal models can index diagrams alongside text. The structure problem is largely behind us.
What remains unsolved is deeper.
Documents don’t know they’re aging. Every document is indexed identically regardless of when it was written, what it supersedes, or whether it has expired. A 2021 policy and a 2025 amendment sit in the same flat store with no signal of their relationship.
Documents don’t know each other. Each chunk is stored in isolation — no awareness of what the rest of the corpus already knows, no way to surface contradictions, no understanding of whether two documents are saying the same thing or opposite things about the same topic.
The base doesn’t update when the world does. New documents are appended. Nothing is ever revised. Outdated claims stay confident. Contradictions accumulate silently. The index grows; it does not learn.
The retrieval system is sophisticated. The problem, I think, lies in what it is searching through.
A new kind of ingestion
LLMs can read a document the way a human expert would — extracting claims, understanding context, recognizing when something contradicts or extends something else. Agents can navigate a corpus the way a researcher would — following references, cross-checking facts, surfacing what matters. We have had these capabilities for a few years. We are not using them at ingestion time.
What this makes possible is a fundamentally different model: every document added to the knowledge base triggers an active review of what is already there. Not just embedding and storing — reading, comparing, reconciling. Does this new document confirm a claim already in the base? Contradict it? Supersede it? Those relationships get written into the index explicitly, not left for retrieval to guess at query time.
It also means that what is important can be made visible from the start. An agent reading a document at ingestion can identify the key claims, generate summaries at multiple levels of detail, and surface the golden information — the facts, thresholds, decisions — so they are findable at a glance rather than buried in a passage. The best answer to a likely question does not need to be assembled from fragments at query time if it was already extracted and stored when the document arrived.
And crucially, each part of the knowledge base can be made aware of the others. A method described in one document can carry a reference to a complementary method described elsewhere. A claim in one section can link to the document that later revised it. An LLM at ingestion time can build those connections systematically — not as a side effect of search, but as a deliberate act of organization.
This is what generative AI and agentic capabilities unlock: ingestion as an intelligent, relational process, not a mechanical one.
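The extraction step can be sketched as follows. `llm_extract_claims` stands in for a real LLM call and is stubbed here with a crude keyword heuristic so the example runs; everything about the heuristic is an assumption for illustration, not a claim extraction method anyone ships.

```python
import re

def llm_extract_claims(text: str) -> list[str]:
    # Stand-in for an LLM: treat sentences containing a number or a
    # signal word as "golden information". A real system would reason here.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if re.search(r"\d|must|supersedes", s)]

def ingest(doc_text: str) -> dict:
    # At ingestion, extract the key claims and a one-line summary
    # so they are findable at a glance, not buried in a passage.
    claims = llm_extract_claims(doc_text)
    return {
        "text": doc_text,
        "claims": claims,
        "summary": claims[0] if claims else doc_text[:80],
    }

record = ingest(
    "The retry limit is 5. This policy supersedes the 2021 handbook. "
    "Employees enjoy the new cafeteria."
)
```

The point is the shape of the record, not the heuristic: the facts and thresholds are stored explicitly at arrival time rather than reassembled from fragments at query time.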
The knowledge base that learns
This is the central argument.
The knowledge decay problem — the silent accumulation of outdated, contradictory, or superseded information — is reportedly responsible for 60% of enterprise RAG project failures. Not bad retrieval. Just the quiet rot of a system that keeps appending without ever asking whether what it stored last year is still true today.
You add a document; nothing else changes. The new document takes its place alongside everything that came before, with no awareness of what it confirms, contradicts, or supersedes. The base grows; it does not learn.
Consider what should happen when new knowledge arrives. A paper establishes Model Z as the new state of the art, clearly surpassing Model X. Somewhere in your knowledge base, a document confidently states that Model X is state of the art.
In today’s systems: both documents coexist. An agent may retrieve the older one, answer confidently, and be wrong.
With agentic ingestion, every addition triggers an active review of what is already there. Before the new document is stored, the system reads the relevant neighborhood of the existing base, identifies what the new document confirms, contradicts, or supersedes, and writes those relationships explicitly into the index. The existing document gets annotated — “a newer result has superseded this claim, see Document B” — and the new one arrives with context already attached — “this improves upon the benchmark in Document A”. Both documents are now accurate. The knowledge base has a coherent present tense.
Concretely, this means:
- Read before writing. Query the existing base before adding to it.
- Reconcile. Determine what the new document confirms, contradicts, or supersedes.
- Retroactive enrichment. Update existing entries to reflect what has changed.
- Bi-directional linking. New document says where it comes from; old document says where it leads.
This is not a preprocessing step. It is a reasoning task — and LLMs are well-suited to it.
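The four steps can be sketched over a tiny in-memory base. The `classify` heuristic below is a stand-in for the LLM reconciliation judgment; in a real system that call is where the reasoning happens, and the string matching here is only scaffolding to make the sketch runnable.

```python
def classify(new_claim: str, old_claim: str) -> str:
    # Stand-in for an LLM call returning one of:
    # "supersedes", "contradicts", "confirms", "unrelated".
    if "state of the art" in new_claim and "state of the art" in old_claim:
        return "supersedes" if new_claim != old_claim else "confirms"
    return "unrelated"

def add_document(base: dict, doc_id: str, claim: str) -> None:
    new_entry = {"claim": claim, "links": [], "superseded_by": None}
    # 1. Read before writing: scan the existing base first.
    for old_id, old in list(base.items()):
        # 2. Reconcile: how does the new claim relate to the old one?
        rel = classify(claim, old["claim"])
        if rel == "supersedes":
            # 3. Retroactive enrichment: the old entry learns it is stale.
            old["superseded_by"] = doc_id
            # 4. Bi-directional linking: the new entry records what it replaces.
            new_entry["links"].append(("supersedes", old_id))
    base[doc_id] = new_entry

# The Model X / Model Z example from above:
base = {"doc_a": {"claim": "Model X is the state of the art.",
                  "links": [], "superseded_by": None}}
add_document(base, "doc_b", "Model Z is the state of the art.")
```

After the call, both entries carry the relationship: the old document knows it has been superseded, and the new one knows what it replaced.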
A knowledge base is not a collection of documents. It is a model of what is currently known. Every addition is an update to that model, not an append to a list.
Why the field got stuck
There is a structural reason: benchmarks.
Current agentic search benchmarks hand a static, messy corpus to the agent as a fixed input and measure how well it retrieves answers. This encodes a hidden assumption — the knowledge base is uncontrollable. Does the base know which documents contradict each other? Not measured. Does it track how facts evolve over time — a claim that was true in 2022, revised in 2024, superseded in 2025 — and surface that lineage when it matters? Not measured. Does adding new knowledge trigger a review of existing entries? Not measured. Every system gets better at navigating a broken base, and nobody asks whether the base needed to be broken in the first place.
A better benchmark design: give the agent raw source material, let it perform the ingestion itself, then evaluate Q&A. The agent never sees the benchmark questions during ingestion, so there is no leakage — but now the full pipeline is rewarded. I think the benchmark is not a neutral measurement tool. It is a feedback loop, and it has been pointing in the wrong direction.
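That design reduces to a small harness: ingestion sees only raw documents, and questions appear only at evaluation time. The stub agent below is deliberately trivial (substring matching); the point is the interface, in which `ingest_fn` is agent-controlled rather than handed a fixed index.

```python
def evaluate(raw_docs, qa_pairs, ingest_fn, answer_fn):
    # The agent builds its own index from raw sources; it never sees
    # the questions during ingestion, so there is no leakage.
    index = ingest_fn(raw_docs)
    correct = sum(answer_fn(index, q) == gold for q, gold in qa_pairs)
    return correct / len(qa_pairs)

# Trivial stub agent, for illustration only.
def ingest_fn(docs):
    return docs  # a real agent would enrich, link, and reconcile here

def answer_fn(index, question):
    for doc in index:
        if question.lower() in doc.lower():
            return doc
    return ""

score = evaluate(
    ["Paris is the capital of France."],
    [("capital of France", "Paris is the capital of France.")],
    ingest_fn,
    answer_fn,
)
```

Because the score now depends on `ingest_fn` as well as `answer_fn`, improvements to ingestion finally show up in the metric.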
Toward a third generation
Three generations of index design, each more expressive than the last:
First generation: term indices. Fast. Precise. Blind to meaning.
Second generation: vector indices. Semantically aware. Blind to structure, time, and relationship.
What I’d call a third generation: living semantic knowledge bases. Structured, enriched, temporally aware, relationally ingested, self-updating. Not a corpus of documents — a model of what is currently known.
A deeply enriched knowledge base starts to surface things no current architecture can reach: visible contradictions between documents written at different times, gaps in coverage, the evolution of a concept across versions. Vannevar Bush imagined something like this in 1945 — knowledge linked by trails of association, a thinking partner rather than a retrieval engine. The tools to build it now exist.
The retrieval side of search has been thoroughly explored. To me, the indexing side, reimagined through the lens of generative AI, is largely open territory — not because the problems are unknown, but because the field has been looking in the other direction.
References
- Andrej Karpathy, LLM Wiki — GitHub Gist (April 2026)
- Vannevar Bush, As We May Think, The Atlantic (1945)
- The Knowledge Decay Problem — RAG About It
- Temporal RAG: Time-Aware Retrieval That Stays Fresh
- KnowledgeBase Guardian — contradiction detection at ingestion time
- MemOS: A Memory OS for AI Systems (2025)
- A Comprehensive Study of Knowledge Editing for Large Language Models (2024)
- GraphRAG — Microsoft Research