Under the hood of narratheque.io:
why we combine vector RAG and LLM Wiki
At narratheque.io, we chose to combine two complementary approaches: classic vector RAG and a mechanism inspired by the LLM Wiki pattern recently formalized by Andrej Karpathy. This page explains what each one brings, where each falls short, and why their combination concretely changes the quality of the answers you get from your collaborative brain.
Vector RAG and LLM Wiki: what are they, in two sentences?
RAG stands for Retrieval-Augmented Generation. The main idea: instead of asking a language model (LLM) to answer only with what it learned during training, you provide it in real time with the right excerpts from your documents so it can rely on them.
The concept was formalized in 2020 by Patrick Lewis and his team at Facebook AI Research. It’s this mechanism, improved by our engineers, that enables narratheque.io to answer based on your PDFs, YouTube videos, audio transcriptions and web pages, without hallucination, and without your data leaving the sovereign environment.
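To make that grounding step concrete, here is a minimal sketch in Python. The function name and prompt wording are hypothetical, not our actual implementation; the point is simply that the retrieved excerpts are spliced into the prompt so the model answers from them, and cites them, rather than answering from memory.

```python
# Illustrative only: how retrieved excerpts can be spliced into a prompt
# so the model answers from them. Name and wording are hypothetical.
def build_grounded_prompt(question: str, excerpts: list[str]) -> str:
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(excerpts, 1))
    return (
        "Answer using ONLY the numbered excerpts below. "
        "Cite excerpt numbers; if the answer is not in them, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

prompt = build_grounded_prompt(
    "When was RAG formalized?",
    ["RAG was formalized in 2020 by Patrick Lewis and his team at FAIR."],
)
# `prompt` is then sent to whichever LLM the platform routes the query to.
print(prompt)
```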
Vector RAG and LLM Wiki: two philosophies, two engines
The core difference lies in when the intellectual work is done: at each query for RAG, once and for all (then enriched continuously) for LLM Wiki. As a result, the two approaches structure your knowledge base very differently.
VECTOR RAG
The foundation: search fast through everything
The LLM rediscovers your documents with every question.
HOW IT WORKS
Your documents are broken into chunks, transformed into vectors by an embedding model, and stored in a vector database. When a question comes in, the semantically closest chunks are retrieved and passed to the LLM. (A minimal sketch follows this card.)
STRENGTHS
- Massive coverage: tens of thousands of pages, hours of video
- Tolerance for paraphrasing thanks to semantic embeddings
- Incremental updates, inexpensive per document
LIMITATIONS
- No memory between questions
- Multi-source synthesis redone from scratch at each query
- Contradictions between sources never detected
- Global context of a long document can be lost
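Here is the sketch referenced above: a minimal, self-contained version of this pipeline. The `embed` function is a toy stand-in for a real embedding model and the chunking is deliberately naive; it shows the shape of the index, not our production code.

```python
# Toy vector-RAG index. `embed` stands in for a real embedding model
# (e.g. a sentence-transformer); chunking is naive on purpose.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Hashed bag-of-words as a stand-in embedding, L2-normalized."""
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def chunk(doc: str, size: int = 40) -> list[str]:
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

class VectorIndex:
    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, doc: str) -> None:
        for c in chunk(doc):
            self.chunks.append(c)
            self.vectors.append(embed(c))

    def search(self, query: str, k: int = 3) -> list[str]:
        sims = np.array(self.vectors) @ embed(query)  # cosine: unit vectors
        return [self.chunks[i] for i in np.argsort(sims)[::-1][:k]]

index = VectorIndex()
index.add("RAG stands for Retrieval-Augmented Generation, formalized in 2020.")
print(index.search("what does RAG stand for?", k=1))
```

A production engine swaps `embed` for a real model and the Python lists for a vector database, but the retrieve-then-generate shape stays the same.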
LLM WIKI
The layer that grows knowledge
The LLM builds and maintains a structured wiki that enriches itself.
HOW IT WORKS
A pattern formalized by Andrej Karpathy in April 2026. For each source, the LLM creates or updates entity cards and concept pages, and links them together. A single source can touch 10 to 15 wiki pages. (A toy sketch follows this card.)
STRENGTHS
- Knowledge that accumulates (compounding effect)
- Pre-built syntheses, comparisons and timelines
- Automatic contradiction detection
- Readable and navigable by humans (markdown)
LIMITATIONS
- Higher ingestion cost (LLM works on each addition)
- Scaling difficult beyond hundreds of sources
- Sensitive to quality of the writing model
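And the sketch referenced from this card's description: a toy version of wiki ingestion, where `extract_entities` and `merge_into_page` stand in for the LLM calls that actually write the pages. Real cards are markdown with syntheses, timelines and contradiction notes; this only shows how one source fans out across many linked pages.

```python
# Toy LLM-wiki ingestion. `extract_entities` and `merge_into_page` stand in
# for LLM calls; real pages are markdown with much richer structure.
from dataclasses import dataclass, field

def extract_entities(text: str) -> set[str]:
    """Stand-in for an LLM extraction call: naive capitalized-word scan."""
    return {w.strip(".,") for w in text.split() if w[:1].isupper()}

def merge_into_page(old_body: str, source_text: str) -> str:
    """Stand-in for an LLM rewrite that merges new facts and flags conflicts."""
    return (old_body + "\n" + source_text).strip()

@dataclass
class WikiPage:
    title: str
    body: str = ""
    sources: list[str] = field(default_factory=list)
    links: set[str] = field(default_factory=set)

class Wiki:
    def __init__(self) -> None:
        self.pages: dict[str, WikiPage] = {}

    def ingest(self, source_id: str, text: str) -> None:
        entities = extract_entities(text)
        for name in entities:                    # one source touches many pages
            page = self.pages.setdefault(name, WikiPage(name))
            page.body = merge_into_page(page.body, text)
            page.sources.append(source_id)
            page.links |= entities - {name}      # cross-link the entity cards

wiki = Wiki()
wiki.ingest("doc-1", "Patrick Lewis formalized RAG at Facebook AI Research.")
```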
Comparative diagram: Vector RAG vs LLM Wiki
On the RAG side, the LLM does nothing at indexing and everything at question time. On the LLM Wiki side, it’s the opposite: the work happens at ingestion, and the query relies on knowledge that is already structured. This architectural difference explains the strengths and limitations of each approach.
Neither approach is complete in isolation
It’s their combination that produces the results our users experience: each need is served by the tool best suited to it.
How do the two components work together?
The orchestration unfolds in three stages: an import that feeds both pipelines in parallel, a query that selects the right source, and a capitalization loop that makes the system smarter with each use (a sketch follows the three steps below). This is how narratheque.io combines the two approaches to get the most out of your knowledge base.
At import, everything happens in parallel
Each new source is chunked and vectorized for the RAG index while, at the same time, the LLM updates the wiki pages it touches.
At the query, the engine chooses
For each question, the engine decides whether the pre-built wiki knowledge, a fresh vector search, or both will serve the answer best.
Over time, the wiki grows
Every ingestion enriches the entity cards, syntheses and timelines, so answers keep improving as the base grows: the compounding effect.
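Here is the sketch announced above, reusing the `VectorIndex` and `Wiki` toys from the two cards. The routing rule is invented purely for illustration; the real engine's selection logic is more involved.

```python
# Orchestration sketch (reuses VectorIndex and Wiki from the cards above):
# import feeds both engines in parallel; a naive router picks at query time.
index, wiki = VectorIndex(), Wiki()

def ingest(source_id: str, text: str) -> None:
    index.add(text)               # cheap incremental vector indexing
    wiki.ingest(source_id, text)  # the LLM structures the knowledge once

def answer(question: str) -> list[str]:
    # Invented heuristic: synthesis-style questions try the pre-built
    # wiki pages first; pinpoint lookups go straight to vector search.
    if any(w in question.lower() for w in ("compare", "timeline", "overview")):
        hits = [p.body for p in wiki.pages.values()
                if p.title.lower() in question.lower()]
        if hits:
            return hits
    return index.search(question)  # fresh retrieval as the default path
```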
Why is narratheque.io technically interesting?
The vector RAG + LLM Wiki architecture is a solid foundation. It is combined with several strategic choices that set the platform apart in the enterprise AI landscape, each answering a real need from our users:
- Multi-LLM in the same base
- Real sovereignty
- Universal ingestion
- No technical lock-in
- Built for dark data
- Traceable answers, no hallucinations
Resources and further reading
Articles and publications cited
- llm-wiki.md — the original text by Andrej Karpathy that formalizes the pattern (April 2026)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — Lewis et al. (2020), the founding article of RAG
- What Is Retrieval-Augmented Generation? — NVIDIA Blog, an accessible explanation with an interview with Patrick Lewis
- A Survey on Knowledge-Oriented Retrieval-Augmented Generation (2025) — an academic overview of RAG developments
- The narratheque.io blog to explore use cases: strategic intelligence, brand DNA, timelines, dark data
Ready to transform your archives into a collaborative brain?
Ten minutes is all it takes to activate a trial account and watch your first corpus become a queryable base, powered by both vector RAG and LLM Wiki.

