Beginner90-120 minDify or LangChaindocument parserembedding modelvector database

Complete tutorial: build a trustworthy RAG knowledge assistant

A full end-to-end tutorial for planning, building, testing, and improving a RAG system without pretending API keys or production infrastructure are already solved.

Prerequisites

A small set of trusted documents
Basic understanding of LLM prompts
A model provider or local model plan
A decision about cloud vs local deployment

Step-by-step tutorial

Step 1

Define the assistant's job

Start with a narrow use case. A RAG assistant should have a clear domain, allowed sources, target users, and a no-answer policy.

Name the user group
List accepted source types
Write five questions it must answer
Write five questions it must refuse or escalate

Step 2

Prepare the knowledge base

Collect authoritative documents, remove duplicates, record metadata, and decide how updates will be handled.

Keep source URLs or file IDs
Add owner, date, type, language, and access metadata
Remove obsolete copies
Record source authority

Step 3

Parse and chunk documents

Convert documents into text while preserving useful structure. Chunk by section, page, heading, or semantic unit rather than blindly splitting every fixed number of characters.

Inspect parsed text manually
Preserve page numbers
Try 500-1,200 token chunks
Use overlap only when it helps continuity

Step 4

Create embeddings and index chunks

Use an embedding model to represent each chunk and store vectors with metadata in a vector database or platform knowledge base.

Choose embedding model
Store source metadata with each chunk
Index a small corpus first
Keep re-indexing reproducible

Step 5

Retrieve evidence

For each user question, retrieve candidate passages. Start simple, then compare vector search, keyword search, hybrid search, and reranking.

Inspect top retrieved chunks
Test exact names and acronyms
Add metadata filters
Record missed-source failures

Step 6

Assemble the prompt

Pass retrieved context to the model with instructions that separate system rules from source text. Treat retrieved content as data, not as instructions.

Require citations
Tell the model to say when evidence is missing
Limit context to relevant passages
Protect against prompt injection

Step 7

Generate answers with citations

The user interface should expose answer text, source snippets, document titles, and page or section references where possible.

Show source title
Show passage preview
Link to original document when allowed
Avoid uncited factual claims

Step 8

Evaluate before launch

Create a small but representative test set. Score retrieval separately from generation so you know whether failures come from missing evidence or answer synthesis.

Measure retrieval recall
Review answer faithfulness
Check citation accuracy
Test out-of-scope questions

Step 9

Improve and maintain

RAG is a maintained knowledge service. Watch failed queries, stale sources, permission issues, and changes in user behavior.

Add trace logging
Schedule source refresh
Review low-confidence answers
Retest after every retrieval change

Framework-neutral RAG flow

type Chunk = { text: string; source: string; page?: number };

async function answerWithRag(question: string) {
  const candidates = await retriever.search(question, {
    topK: 12,
    filters: { status: "approved" },
  });

  const context = await reranker.keepBest(question, candidates, { topK: 5 });

  const answer = await model.generate({
    system: "Answer only from supplied sources. Say when evidence is missing.",
    user: question,
    context: context.map((chunk: Chunk) => ({
      text: chunk.text,
      citation: [chunk.source, chunk.page].filter(Boolean).join(" p. "),
    })),
  });

  return {
    answer: answer.text,
    citations: answer.citations,
    retrievedSources: context,
  };
}

Next steps

Try the same corpus with hybrid search
Add reranking
Create a regression test set
Compare Dify, LangChain, and LlamaIndex implementations