Complete tutorial: build a trustworthy RAG knowledge assistant
A full end-to-end tutorial for planning, building, testing, and improving a RAG system without pretending API keys or production infrastructure are already solved.
Prerequisites
- A small set of trusted documents
- Basic understanding of LLM prompts
- A model provider or local model plan
- A decision about cloud vs local deployment
Step-by-step tutorial
Step 1
Define the assistant's job
Start with a narrow use case. A RAG assistant should have a clear domain, allowed sources, target users, and a no-answer policy.
- Name the user group
- List accepted source types
- Write five questions it must answer
- Write five questions it must refuse or escalate
Step 2
Prepare the knowledge base
Collect authoritative documents, remove duplicates, record metadata, and decide how updates will be handled.
- Keep source URLs or file IDs
- Add owner, date, type, language, and access metadata
- Remove obsolete copies
- Record source authority
Step 3
Parse and chunk documents
Convert documents into text while preserving useful structure. Chunk by section, page, heading, or semantic unit rather than blindly splitting every fixed number of characters.
- Inspect parsed text manually
- Preserve page numbers
- Try 500-1,200 token chunks
- Use overlap only when it helps continuity
Step 4
Create embeddings and index chunks
Use an embedding model to represent each chunk and store vectors with metadata in a vector database or platform knowledge base.
- Choose embedding model
- Store source metadata with each chunk
- Index a small corpus first
- Keep re-indexing reproducible
Step 5
Retrieve evidence
For each user question, retrieve candidate passages. Start simple, then compare vector search, keyword search, hybrid search, and reranking.
- Inspect top retrieved chunks
- Test exact names and acronyms
- Add metadata filters
- Record missed-source failures
Step 6
Assemble the prompt
Pass retrieved context to the model with instructions that separate system rules from source text. Treat retrieved content as data, not as instructions.
- Require citations
- Tell the model to say when evidence is missing
- Limit context to relevant passages
- Protect against prompt injection
Step 7
Generate answers with citations
The user interface should expose answer text, source snippets, document titles, and page or section references where possible.
- Show source title
- Show passage preview
- Link to original document when allowed
- Avoid uncited factual claims
Step 8
Evaluate before launch
Create a small but representative test set. Score retrieval separately from generation so you know whether failures come from missing evidence or answer synthesis.
- Measure retrieval recall
- Review answer faithfulness
- Check citation accuracy
- Test out-of-scope questions
Step 9
Improve and maintain
RAG is a maintained knowledge service. Watch failed queries, stale sources, permission issues, and changes in user behavior.
- Add trace logging
- Schedule source refresh
- Review low-confidence answers
- Retest after every retrieval change
Framework-neutral RAG flow
type Chunk = { text: string; source: string; page?: number };
async function answerWithRag(question: string) {
const candidates = await retriever.search(question, {
topK: 12,
filters: { status: "approved" },
});
const context = await reranker.keepBest(question, candidates, { topK: 5 });
const answer = await model.generate({
system: "Answer only from supplied sources. Say when evidence is missing.",
user: question,
context: context.map((chunk: Chunk) => ({
text: chunk.text,
citation: [chunk.source, chunk.page].filter(Boolean).join(" p. "),
})),
});
return {
answer: answer.text,
citations: answer.citations,
retrievedSources: context,
};
}Next steps
- Try the same corpus with hybrid search
- Add reranking
- Create a regression test set
- Compare Dify, LangChain, and LlamaIndex implementations