Technologies

Under the hood of narratheque.io:
why we combine vector RAG and LLM Wiki

At narratheque.io, we chose to integrate two complementary approaches: classic vector RAG and a mechanism inspired by the LLM Wiki pattern recently formalized by Andrej Karpathy. This page explains what each one brings, where each falls short, and why their combination concretely changes the quality of the answers you get from your collaborative brain.

Vector RAG and LLM Wiki: what are they in two sentences?

RAG stands for Retrieval-Augmented Generation. The main idea: instead of asking a language model (LLM) to answer only with what it learned during training, you provide it in real time with the right excerpts from your documents so it can rely on them.
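To make the idea concrete, here is a toy sketch of that prompt assembly in Python. The excerpt and the call_llm placeholder are illustrative assumptions, not narratheque.io's actual pipeline.

```python
# Toy illustration of "augmented generation": the question travels to the LLM
# together with excerpts retrieved from your documents, so the model answers
# from your corpus rather than from its training data alone.
question = "When does the maintenance contract expire?"
retrieved_excerpts = [
    "Contract #2214: maintenance covered until 31 December 2025.",  # illustrative
]
prompt = (
    "Using only the excerpts below, answer the question. "
    "If they do not contain the answer, say so.\n\n"
    + "\n".join(retrieved_excerpts)
    + f"\n\nQuestion: {question}"
)
# call_llm(prompt)  # hypothetical placeholder: send to the LLM of your choice
```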

The concept was formalized in 2020 by Patrick Lewis and his team at Facebook AI Research. It’s this mechanism, improved by our engineers, that enables narratheque.io to answer based on your PDFs, YouTube videos, audio transcriptions and web pages, without hallucination, and without your data leaving the sovereign environment. The LLM Wiki, for its part, is a structured wiki of entity and concept pages that the model itself builds and keeps up to date as new sources arrive; it is detailed below.

Vector RAG and LLM Wiki: two philosophies, two engines

The core difference lies in when the intellectual work is done: at each query for vector RAG, or up front at ingestion (and then enriched continuously) for LLM Wiki. As a result, the two approaches structure your knowledge base very differently.

VECTOR RAG

The foundation: search fast through everything

The LLM rediscovers your documents at each question.

HOW IT WORKS

Your documents are broken into chunks, transformed into vectors by an embedding model, and stored in a vector database. When you ask a question, the chunks closest to it semantically are retrieved and passed to the LLM.
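Below is a minimal, self-contained sketch of this flow. A toy word-hashing "embedding" and an in-memory list stand in for the real embedding model and vector database; the sample texts and the question are invented for illustration.

```python
# Sketch of the vector RAG flow: chunk documents, embed the chunks, then at
# question time retrieve the closest chunks and hand them to the LLM.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash word occurrences into a fixed-size vector."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def chunk(document: str, size: int = 50) -> list[str]:
    """Break a document into fixed-size word chunks."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# 1. Indexing: every document is chunked and stored as (chunk, vector) pairs.
index = []
for doc in ["...text extracted from a PDF...", "...a video transcript..."]:
    for c in chunk(doc):
        index.append((c, embed(c)))

# 2. Query time: embed the question, rank chunks by cosine similarity,
#    and pass the best ones to the LLM as context.
question = "What does the report say about delivery delays?"
q = embed(question)
top_chunks = sorted(index, key=lambda pair: -float(pair[1] @ q))[:5]
context = "\n\n".join(c for c, _ in top_chunks)
prompt = f"Answer using only these excerpts:\n{context}\n\nQuestion: {question}"
# The prompt is then sent to the LLM of your choice.
```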

STRENGTHS
  • Massive coverage: tens of thousands of pages, hours of video
  • Tolerance for paraphrasing thanks to semantic embeddings
  • Incremental updates, inexpensive per document

LIMITATIONS
  • No memory between questions
  • Multi-source synthesis redone from scratch at each query
  • Contradictions between sources never detected
  • Global context of a long document can be lost

LLM WIKI

The layer that grows knowledge

The LLM builds and maintains a structured wiki that enriches itself.

HOW IT WORKS

Pattern formalized by Andrej Karpathy in April 2026. For each new source, the LLM creates or updates entity and concept cards and links them to one another. A single source can touch 10 to 15 pages of the wiki.
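To picture the mechanism, here is a minimal sketch of one such ingestion step. summarize_into_pages stands in for the LLM pass, and the page titles and fragments (Sarah Martin, Project Atlas, a Q3 report) are invented for illustration; this is not the platform's actual code.

```python
# Sketch of LLM Wiki ingestion: each new source creates or updates the entity
# and concept pages it touches, and every page keeps track of its sources.
from dataclasses import dataclass, field

@dataclass
class WikiPage:
    title: str
    body_md: str = ""                              # human-readable markdown
    sources: list[str] = field(default_factory=list)

wiki: dict[str, WikiPage] = {}                     # the growing, structured wiki

def summarize_into_pages(source_text: str) -> dict[str, str]:
    """Stand-in for the LLM pass: returns {page title: markdown to merge}.
    A single source can touch a dozen pages (people, projects, concepts)."""
    return {
        "Sarah Martin": "- Mentioned as project lead in the Q3 report.",
        "Project Atlas": "- Budget revised upward in the Q3 report.",
    }

def ingest(source_id: str, source_text: str) -> None:
    """Create or update every wiki page the new source touches."""
    for title, fragment in summarize_into_pages(source_text).items():
        page = wiki.setdefault(title, WikiPage(title))
        page.body_md += ("\n" if page.body_md else "") + fragment
        page.sources.append(source_id)             # every statement stays traceable

ingest("q3_report.pdf", "...extracted text of the Q3 report...")
print(wiki["Sarah Martin"].body_md)
```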

STRENGTHS
  • Knowledge that accumulates (compounding effect)
  • Pre-built syntheses, comparisons and timelines
  • Automatic contradiction detection
  • Readable and navigable by humans (markdown)

LIMITATIONS
  • Higher ingestion cost (the LLM works on each addition)
  • Scaling becomes difficult beyond hundreds of sources
  • Sensitive to the quality of the model that writes the wiki

Comparative diagram: Vector RAG vs LLM Wiki

On the RAG side, the LLM does nothing at indexing and everything at question time. On the LLM Wiki side, it’s the opposite: the work is done at ingestion, and the query relies on already structured knowledge. This architectural difference explains the strengths and limitations of each approach.

Why both

Neither approach is complete in isolation

It’s their combination that produces the results our users experience. Here’s how each need is served by the right tool.

  • Find a precise passage in 200 hours of transcribed video → VECTOR RAG
  • Know “who is Sarah” and everything that’s been said about her in the corpus → LLM WIKI
  • Quick synthesis on an already well-documented topic → LLM WIKI
  • Specific question about a technical detail or exact figure → VECTOR RAG
  • Compare two positions, two periods, two actors → LLM WIKI
  • Consistency audit across the entire corpus → LLM WIKI
  • Exhaustive coverage of a massive document collection → VECTOR RAG

On narratheque.io

How do the two components work together?

The orchestration unfolds in three stages: an import that feeds both pipelines in parallel, a query step that selects the right engine, and a capitalization loop that makes the system smarter with each use. This is how narratheque.io combines the two approaches to get the most out of your knowledge base.

At import, everything happens in parallel

Each file (PDF, Word, YouTube video, audio, URL) triggers an automated pipeline: OCR on images, transcription of media, text extraction. The extracted text then feeds both pipelines in parallel: vectorization for RAG, and enrichment of the structured wiki.
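As an illustration, here is a simplified sketch of that import stage. The extractor and pipeline functions are stand-ins, not the platform's real API.

```python
# Sketch of the import stage: detect the format, extract text (OCR,
# transcription or parsing), then feed both pipelines in parallel.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def extract_text(path: str) -> str:
    """Pick the right extractor for the file format."""
    suffix = Path(path).suffix.lower()
    if suffix in {".png", ".jpg"}:
        return f"[OCR output of {path}]"
    if suffix in {".mp3", ".mp4"}:
        return f"[transcription of {path}]"
    return f"[parsed text of {path}]"              # PDF, Word, HTML, ...

def vectorize_for_rag(text: str) -> None:
    print("RAG pipeline: chunked, embedded, stored in the vector database")

def feed_llm_wiki(text: str) -> None:
    print("Wiki pipeline: entity and concept pages updated")

def ingest(path: str) -> None:
    text = extract_text(path)
    # Both pipelines receive the same extracted text, in parallel.
    with ThreadPoolExecutor() as pool:
        pool.submit(vectorize_for_rag, text)
        pool.submit(feed_llm_wiki, text)

ingest("interview.mp3")
```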

At the query, the engine chooses

Depending on the nature of the query (a precise factual lookup, a cross-cutting question, a comparison, a chronology), the orchestration queries vector RAG, LLM Wiki, or both, combining their outputs into the final answer.
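A hedged sketch of that routing decision follows. The keyword rules are only a stand-in for the real classification, which would typically be done by an LLM or a trained classifier.

```python
# Sketch of query routing: classify the question, then query vector RAG,
# the LLM Wiki, or both, and merge the evidence into the final prompt.
def route(question: str) -> set[str]:
    q = question.lower()
    engines = set()
    if any(w in q for w in ("who is", "compare", "timeline", "contradiction")):
        engines.add("llm_wiki")          # cross-cutting, comparative, chronological
    if any(w in q for w in ("exact", "figure", "where", "quote")) or not engines:
        engines.add("vector_rag")        # precise factual lookups (and the default)
    return engines

def answer(question: str) -> str:
    evidence = []
    engines = route(question)
    if "vector_rag" in engines:
        evidence.append("[excerpts retrieved by vector search]")
    if "llm_wiki" in engines:
        evidence.append("[relevant wiki pages]")
    # Both kinds of evidence are merged into the final prompt for the LLM.
    return "Answer grounded in: " + ", ".join(evidence)

print(answer("Compare the positions of the two partners over time"))
```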

Over time, the wiki grows

The wiki enriches itself automatically at each ingestion, and the best answers produced by the system can be reinjected as new pages. This is the capitalization effect described by Karpathy: the base becomes smarter, not just bigger.
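That loop can be pictured like this; the dictionary and field names are simplified assumptions, not the actual storage format.

```python
# Sketch of the capitalization loop: a validated answer is written back into
# the wiki as a new page, so later queries start from it instead of redoing
# the synthesis. The dict stands in for the real wiki store.
wiki_pages: dict[str, dict] = {}

def capitalize(question: str, answer_md: str, source_ids: list[str]) -> None:
    wiki_pages[f"Q&A: {question}"] = {
        "body_md": answer_md,           # human-readable markdown
        "sources": source_ids,          # the answer stays traceable to its sources
    }

capitalize(
    "What changed in Project Atlas between Q2 and Q3?",
    "The budget was revised upward and Sarah Martin took over as lead.",
    ["q2_report.pdf", "q3_report.pdf"],
)
```
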
Beyond RAG

Why is narratheque.io technically interesting?

The vector RAG and LLM Wiki architecture is a solid foundation. It is combined with several strategic choices that set the platform apart in the enterprise AI landscape, and each technical decision answers a real need expressed by our users.

Multi-LLM in the same base

Query the same base with OpenAI, Anthropic, Google Gemini, Mistral or a local Ollama model, in the same session, and compare answers. Most solutions lock you into a single provider.
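For example, a comparison session can be sketched as follows; the provider callables are stand-ins for the real OpenAI, Anthropic, Gemini, Mistral or Ollama clients.

```python
# Sketch of multi-LLM querying: one grounded prompt built from your corpus,
# sent to each configured provider so the answers can be compared side by side.
def grounded_prompt(question: str, excerpts: list[str]) -> str:
    return "Answer only from these excerpts:\n" + "\n".join(excerpts) + f"\n\nQ: {question}"

providers = {
    "openai":    lambda p: f"[GPT answer to: {p[:40]}...]",        # stand-in clients
    "anthropic": lambda p: f"[Claude answer to: {p[:40]}...]",
    "ollama":    lambda p: f"[local model answer to: {p[:40]}...]",
}

prompt = grounded_prompt("Who signed the 2021 agreement?", ["[excerpt A]", "[excerpt B]"])
for name, call in providers.items():
    print(f"{name}: {call(prompt)}")
```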

Real sovereignty

Data is hosted on dedicated servers in Europe or Canada, at your choice, and is never used to train public models. The structured wiki remains a controlled, exportable copy of your knowledge.

Universal ingestion

PDF, Word, websites, YouTube, audio, images: the analysis pipeline recognizes the format and automatically applies OCR, transcription and indexing. You upload; the system takes care of the rest.

No technical lock-in

KDBCore by Jolifish Europe can be deployed in a dedicated environment for enterprise needs. The chatbot integrates via an HTML snippet into WordPress, Shopify or Webflow.

Built for dark data

80% of enterprise data is underutilized because it is neither searchable nor queryable. The combination of vector RAG and LLM Wiki is exactly what is needed to transform these silent archives into an active brain.

Traceable answers, no hallucinations

The LLM relies strictly on your base. If it doesn’t know, it says so and alerts you so you can supplement. All answers can be traced back to sources and wiki pages.
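In simplified terms, that behaviour can be sketched like this; the field names and messages are illustrative assumptions.

```python
# Sketch of "grounded or refuse": if retrieval returns nothing usable, the
# system says so instead of guessing, and every answer carries its sources.
def grounded_answer(question: str, hits: list[tuple[str, str]]) -> dict:
    """`hits` are (source_id, excerpt) pairs returned by retrieval."""
    if not hits:
        return {
            "answer": "The knowledge base does not cover this yet.",
            "sources": [],
            "needs_more_documents": True,   # alert the user to supplement the base
        }
    context = "\n".join(excerpt for _, excerpt in hits)
    return {
        "answer": f"[LLM answer constrained to:\n{context}]",
        "sources": [source_id for source_id, _ in hits],
        "needs_more_documents": False,
    }

print(grounded_answer("What is next year's budget?", []))
```
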

Ready to transform your archives into a collaborative brain?

Ten minutes is enough to activate a trial account and see your first corpus transform into a queryable base, powered by both vector RAG and LLM Wiki.