Technologies

Under the hood of narratheque.io:
why we combine vector RAG and LLM Wiki

At narratheque.io, we chose to integrate two complementary approaches: classic vector RAG and a mechanism inspired by the LLM Wiki pattern recently formalized by Andrej Karpathy. This page explains what each one brings, where each falls short, and why their combination concretely changes the quality of the answers you get from your collaborative brain.

Vector RAG and LLM Wiki: what are they in two sentences?

RAG stands for Retrieval-Augmented Generation. The main idea: instead of asking a language model (LLM) to answer only with what it learned during training, you provide it in real time with the right excerpts from your documents so it can rely on them.
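To make the idea concrete, here is a toy sketch of that prompt assembly in Python. The excerpt and the call_llm placeholder are illustrative assumptions, not narratheque.io's actual pipeline.

```python
# Toy illustration of "augmented generation": the question travels to the LLM
# together with excerpts retrieved from your documents, so the model answers
# from your corpus rather than from its training data alone.
question = "When does the maintenance contract expire?"
retrieved_excerpts = [
    "Contract #2214: maintenance covered until 31 December 2025.",  # illustrative
]
prompt = (
    "Using only the excerpts below, answer the question. "
    "If they do not contain the answer, say so.\n\n"
    + "\n".join(retrieved_excerpts)
    + f"\n\nQuestion: {question}"
)
# call_llm(prompt)  # hypothetical placeholder: send to the LLM of your choice
```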

The concept was formalized in 2020 by Patrick Lewis and his team at Facebook AI Research. It’s this mechanism, improved by our engineers, that enables narratheque.io to answer based on your PDFs, YouTube videos, audio transcriptions and web pages, without hallucination, and without your data leaving the sovereign environment. The LLM Wiki, for its part, is a structured wiki of entity and concept pages that the model itself builds and keeps up to date as new sources arrive; it is detailed below.

Vector RAG and LLM Wiki: two philosophies, two engines

The core difference lies in when the intellectual work is done: at each query for vector RAG, or up front at ingestion (and then enriched continuously) for LLM Wiki. As a result, the two approaches structure your knowledge base very differently.

VECTOR RAG

The foundation: search fast through everything

The LLM rediscovers your documents at each question.

HOW IT WORKS

Your documents are broken into chunks, transformed into vectors by an embedding model, and stored in a vector database. When you ask a question, the chunks closest to it semantically are retrieved and passed to the LLM.
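Below is a minimal, self-contained sketch of this flow. A toy word-hashing "embedding" and an in-memory list stand in for the real embedding model and vector database; the sample texts and the question are invented for illustration.

```python
# Sketch of the vector RAG flow: chunk documents, embed the chunks, then at
# question time retrieve the closest chunks and hand them to the LLM.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash word occurrences into a fixed-size vector."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def chunk(document: str, size: int = 50) -> list[str]:
    """Break a document into fixed-size word chunks."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# 1. Indexing: every document is chunked and stored as (chunk, vector) pairs.
index = []
for doc in ["...text extracted from a PDF...", "...a video transcript..."]:
    for c in chunk(doc):
        index.append((c, embed(c)))

# 2. Query time: embed the question, rank chunks by cosine similarity,
#    and pass the best ones to the LLM as context.
question = "What does the report say about delivery delays?"
q = embed(question)
top_chunks = sorted(index, key=lambda pair: -float(pair[1] @ q))[:5]
context = "\n\n".join(c for c, _ in top_chunks)
prompt = f"Answer using only these excerpts:\n{context}\n\nQuestion: {question}"
# The prompt is then sent to the LLM of your choice.
```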

STRENGTHS
  • Massive coverage: tens of thousands of pages, hours of video
  • Tolerance for paraphrasing thanks to semantic embeddings
  • Incremental updates, inexpensive per document

LIMITATIONS
  • No memory between questions
  • Multi-source synthesis redone from scratch at each query
  • Contradictions between sources never detected
  • Global context of a long document can be lost

LLM WIKI

The layer that grows knowledge

The LLM builds and maintains a structured wiki that enriches itself.

HOW IT WORKS

Pattern formalized by Andrej Karpathy in April 2026. For each new source, the LLM creates or updates entity and concept cards and links them to one another. A single source can touch 10 to 15 pages of the wiki.
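To picture the mechanism, here is a minimal sketch of one such ingestion step. summarize_into_pages stands in for the LLM pass, and the page titles and fragments (Sarah Martin, Project Atlas, a Q3 report) are invented for illustration; this is not the platform's actual code.

```python
# Sketch of LLM Wiki ingestion: each new source creates or updates the entity
# and concept pages it touches, and every page keeps track of its sources.
from dataclasses import dataclass, field

@dataclass
class WikiPage:
    title: str
    body_md: str = ""                              # human-readable markdown
    sources: list[str] = field(default_factory=list)

wiki: dict[str, WikiPage] = {}                     # the growing, structured wiki

def summarize_into_pages(source_text: str) -> dict[str, str]:
    """Stand-in for the LLM pass: returns {page title: markdown to merge}.
    A single source can touch a dozen pages (people, projects, concepts)."""
    return {
        "Sarah Martin": "- Mentioned as project lead in the Q3 report.",
        "Project Atlas": "- Budget revised upward in the Q3 report.",
    }

def ingest(source_id: str, source_text: str) -> None:
    """Create or update every wiki page the new source touches."""
    for title, fragment in summarize_into_pages(source_text).items():
        page = wiki.setdefault(title, WikiPage(title))
        page.body_md += ("\n" if page.body_md else "") + fragment
        page.sources.append(source_id)             # every statement stays traceable

ingest("q3_report.pdf", "...extracted text of the Q3 report...")
print(wiki["Sarah Martin"].body_md)
```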

STRENGTHS
  • Knowledge that accumulates (compounding effect)
  • Pre-built syntheses, comparisons and timelines
  • Automatic contradiction detection
  • Readable and navigable by humans (markdown)

LIMITATIONS
  • Higher ingestion cost (the LLM works on each addition)
  • Scaling becomes difficult beyond hundreds of sources
  • Sensitive to the quality of the model that writes the wiki

Comparative diagram: Vector RAG vs LLM Wiki

On the RAG side, the LLM does nothing at indexing and everything at question time. On the LLM Wiki side, it’s the opposite: the work is done at ingestion, and the query relies on already structured knowledge. This architectural difference explains the strengths and limitations of each approach.

Why both

Neither approach is complete in isolation

It’s their combination that produces the results our users experience. Here’s how each need is served by the right tool.

  • Find a precise passage in 200 hours of transcribed video → VECTOR RAG
  • Know “who is Sarah” and everything that’s been said about her in the corpus → LLM WIKI
  • Quick synthesis on an already well-documented topic → LLM WIKI
  • Specific question about a technical detail or exact figure → VECTOR RAG
  • Compare two positions, two periods, two actors → LLM WIKI
  • Consistency audit across the entire corpus → LLM WIKI
  • Exhaustive coverage of a massive document collection → VECTOR RAG

On narratheque.io

How do the two components work together?

The orchestration unfolds in three stages: an import that feeds both pipelines in parallel, a query step that selects the right engine, and a capitalization loop that makes the system smarter with each use. This is how narratheque.io combines the two approaches to get the most out of your knowledge base.

At import, everything happens in parallel

Each file (PDF, Word, YouTube video, audio, URL) triggers an automated pipeline: OCR on images, transcription of media, text extraction. The extracted text then feeds both pipelines in parallel: vectorization for RAG, and enrichment of the structured wiki.
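As an illustration, here is a simplified sketch of that import stage. The extractor and pipeline functions are stand-ins, not the platform's real API.

```python
# Sketch of the import stage: detect the format, extract text (OCR,
# transcription or parsing), then feed both pipelines in parallel.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def extract_text(path: str) -> str:
    """Pick the right extractor for the file format."""
    suffix = Path(path).suffix.lower()
    if suffix in {".png", ".jpg"}:
        return f"[OCR output of {path}]"
    if suffix in {".mp3", ".mp4"}:
        return f"[transcription of {path}]"
    return f"[parsed text of {path}]"              # PDF, Word, HTML, ...

def vectorize_for_rag(text: str) -> None:
    print("RAG pipeline: chunked, embedded, stored in the vector database")

def feed_llm_wiki(text: str) -> None:
    print("Wiki pipeline: entity and concept pages updated")

def ingest(path: str) -> None:
    text = extract_text(path)
    # Both pipelines receive the same extracted text, in parallel.
    with ThreadPoolExecutor() as pool:
        pool.submit(vectorize_for_rag, text)
        pool.submit(feed_llm_wiki, text)

ingest("interview.mp3")
```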

At the query, the engine chooses

Depending on the nature of the query (a precise factual lookup, a cross-cutting question, a comparison, a chronology), the orchestration queries vector RAG, LLM Wiki, or both, combining their outputs into the final answer.
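A hedged sketch of that routing decision follows. The keyword rules are only a stand-in for the real classification, which would typically be done by an LLM or a trained classifier.

```python
# Sketch of query routing: classify the question, then query vector RAG,
# the LLM Wiki, or both, and merge the evidence into the final prompt.
def route(question: str) -> set[str]:
    q = question.lower()
    engines = set()
    if any(w in q for w in ("who is", "compare", "timeline", "contradiction")):
        engines.add("llm_wiki")          # cross-cutting, comparative, chronological
    if any(w in q for w in ("exact", "figure", "where", "quote")) or not engines:
        engines.add("vector_rag")        # precise factual lookups (and the default)
    return engines

def answer(question: str) -> str:
    evidence = []
    engines = route(question)
    if "vector_rag" in engines:
        evidence.append("[excerpts retrieved by vector search]")
    if "llm_wiki" in engines:
        evidence.append("[relevant wiki pages]")
    # Both kinds of evidence are merged into the final prompt for the LLM.
    return "Answer grounded in: " + ", ".join(evidence)

print(answer("Compare the positions of the two partners over time"))
```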

Over time, the wiki grows

The wiki enriches itself automatically at each ingestion, and the best answers produced by the system can be reinjected as new pages. This is the capitalization effect described by Karpathy: the base becomes smarter, not just bigger.
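That loop can be pictured like this; the dictionary and field names are simplified assumptions, not the actual storage format.

```python
# Sketch of the capitalization loop: a validated answer is written back into
# the wiki as a new page, so later queries start from it instead of redoing
# the synthesis. The dict stands in for the real wiki store.
wiki_pages: dict[str, dict] = {}

def capitalize(question: str, answer_md: str, source_ids: list[str]) -> None:
    wiki_pages[f"Q&A: {question}"] = {
        "body_md": answer_md,           # human-readable markdown
        "sources": source_ids,          # the answer stays traceable to its sources
    }

capitalize(
    "What changed in Project Atlas between Q2 and Q3?",
    "The budget was revised upward and Sarah Martin took over as lead.",
    ["q2_report.pdf", "q3_report.pdf"],
)
```
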
Beyond RAG

Why is narratheque.io technically interesting?

The vector RAG and LLM Wiki architecture is a solid foundation. It is combined with several strategic choices that set the platform apart in the enterprise AI landscape, and each technical decision answers a real need expressed by our users.

Multi-LLM in the same base

Query the same base with OpenAI, Anthropic, Google Gemini, Mistral or a local Ollama model, in the same session, and compare answers. Most solutions lock you into a single provider.
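For example, a comparison session can be sketched as follows; the provider callables are stand-ins for the real OpenAI, Anthropic, Gemini, Mistral or Ollama clients.

```python
# Sketch of multi-LLM querying: one grounded prompt built from your corpus,
# sent to each configured provider so the answers can be compared side by side.
def grounded_prompt(question: str, excerpts: list[str]) -> str:
    return "Answer only from these excerpts:\n" + "\n".join(excerpts) + f"\n\nQ: {question}"

providers = {
    "openai":    lambda p: f"[GPT answer to: {p[:40]}...]",        # stand-in clients
    "anthropic": lambda p: f"[Claude answer to: {p[:40]}...]",
    "ollama":    lambda p: f"[local model answer to: {p[:40]}...]",
}

prompt = grounded_prompt("Who signed the 2021 agreement?", ["[excerpt A]", "[excerpt B]"])
for name, call in providers.items():
    print(f"{name}: {call(prompt)}")
```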

Real sovereignty

Data is hosted on dedicated servers in Europe or Canada, at your choice, and is never used to train public models. The structured wiki remains a controlled, exportable copy of your knowledge.

Universal ingestion

PDF, Word, websites, YouTube, audio, images: the analysis pipeline recognizes the format and automatically applies OCR, transcription and indexing. You upload; the system takes care of the rest.

No technical lock-in

KDBCore by Jolifish Europe can be deployed in a dedicated environment for enterprise needs. The chatbot integrates via an HTML snippet into WordPress, Shopify or Webflow.

Built for dark data

80% of enterprise data is underutilized because it is neither searchable nor queryable. The combination of vector RAG and LLM Wiki is exactly what is needed to transform these silent archives into an active brain.

Traceable answers, no hallucinations

The LLM relies strictly on your base. If it doesn’t know, it says so and alerts you so you can supplement. All answers can be traced back to sources and wiki pages.
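In simplified terms, that behaviour can be sketched like this; the field names and messages are illustrative assumptions.

```python
# Sketch of "grounded or refuse": if retrieval returns nothing usable, the
# system says so instead of guessing, and every answer carries its sources.
def grounded_answer(question: str, hits: list[tuple[str, str]]) -> dict:
    """`hits` are (source_id, excerpt) pairs returned by retrieval."""
    if not hits:
        return {
            "answer": "The knowledge base does not cover this yet.",
            "sources": [],
            "needs_more_documents": True,   # alert the user to supplement the base
        }
    context = "\n".join(excerpt for _, excerpt in hits)
    return {
        "answer": f"[LLM answer constrained to:\n{context}]",
        "sources": [source_id for source_id, _ in hits],
        "needs_more_documents": False,
    }

print(grounded_answer("What is next year's budget?", []))
```
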

Ready to transform your archives into a collaborative brain?

Ten minutes is enough to activate a trial account and see your first corpus transform into a queryable base, powered by both vector RAG and LLM Wiki.