Skip to main content
Beginner90-120 minDify or LangChaindocument parserembedding modelvector database

Complete tutorial: build a trustworthy RAG knowledge assistant

A full end-to-end tutorial for planning, building, testing, and improving a RAG system without pretending API keys or production infrastructure are already solved.

Prerequisites

  • A small set of trusted documents
  • Basic understanding of LLM prompts
  • A model provider or local model plan
  • A decision about cloud vs local deployment

Step-by-step tutorial

Step 1

Define the assistant's job

Start with a narrow use case. A RAG assistant should have a clear domain, allowed sources, target users, and a no-answer policy.

  • Name the user group
  • List accepted source types
  • Write five questions it must answer
  • Write five questions it must refuse or escalate

Step 2

Prepare the knowledge base

Collect authoritative documents, remove duplicates, record metadata, and decide how updates will be handled.

  • Keep source URLs or file IDs
  • Add owner, date, type, language, and access metadata
  • Remove obsolete copies
  • Record source authority

Step 3

Parse and chunk documents

Convert documents into text while preserving useful structure. Chunk by section, page, heading, or semantic unit rather than blindly splitting every fixed number of characters.

  • Inspect parsed text manually
  • Preserve page numbers
  • Try 500-1,200 token chunks
  • Use overlap only when it helps continuity

Step 4

Create embeddings and index chunks

Use an embedding model to represent each chunk and store vectors with metadata in a vector database or platform knowledge base.

  • Choose embedding model
  • Store source metadata with each chunk
  • Index a small corpus first
  • Keep re-indexing reproducible

Step 5

Retrieve evidence

For each user question, retrieve candidate passages. Start simple, then compare vector search, keyword search, hybrid search, and reranking.

  • Inspect top retrieved chunks
  • Test exact names and acronyms
  • Add metadata filters
  • Record missed-source failures

Step 6

Assemble the prompt

Pass retrieved context to the model with instructions that separate system rules from source text. Treat retrieved content as data, not as instructions.

  • Require citations
  • Tell the model to say when evidence is missing
  • Limit context to relevant passages
  • Protect against prompt injection

Step 7

Generate answers with citations

The user interface should expose answer text, source snippets, document titles, and page or section references where possible.

  • Show source title
  • Show passage preview
  • Link to original document when allowed
  • Avoid uncited factual claims

Step 8

Evaluate before launch

Create a small but representative test set. Score retrieval separately from generation so you know whether failures come from missing evidence or answer synthesis.

  • Measure retrieval recall
  • Review answer faithfulness
  • Check citation accuracy
  • Test out-of-scope questions

Step 9

Improve and maintain

RAG is a maintained knowledge service. Watch failed queries, stale sources, permission issues, and changes in user behavior.

  • Add trace logging
  • Schedule source refresh
  • Review low-confidence answers
  • Retest after every retrieval change

Framework-neutral RAG flow

type Chunk = { text: string; source: string; page?: number };

async function answerWithRag(question: string) {
  const candidates = await retriever.search(question, {
    topK: 12,
    filters: { status: "approved" },
  });

  const context = await reranker.keepBest(question, candidates, { topK: 5 });

  const answer = await model.generate({
    system: "Answer only from supplied sources. Say when evidence is missing.",
    user: question,
    context: context.map((chunk: Chunk) => ({
      text: chunk.text,
      citation: [chunk.source, chunk.page].filter(Boolean).join(" p. "),
    })),
  });

  return {
    answer: answer.text,
    citations: answer.citations,
    retrievedSources: context,
  };
}

Next steps

  • Try the same corpus with hybrid search
  • Add reranking
  • Create a regression test set
  • Compare Dify, LangChain, and LlamaIndex implementations