RAG Reference Hub

Architectures

Practical RAG architecture patterns

Use these patterns as starting points, then validate retrieval quality, security, cost, and user outcomes for your domain.

How to choose a pattern

Start with the simplest pattern that can solve the use case.
Add hybrid search when exact terms matter.
Add metadata filters when authority, language, date, or permissions matter.
Add reranking when top results are noisy.
Add observability before production release.

Basic RAG

A straightforward pipeline that retrieves relevant chunks and passes them to an LLM with the user question.

Pipeline: Documents → Ingestion → Chunking → Embeddings → Vector database → Retrieval → Prompt assembly → LLM answer → Citations

Recommended tools

Dify
LlamaIndex
LangChain
Chroma
Qdrant

Advantages

Easy to explain
Good first prototype
Works for many document Q&A tasks

Limitations

May miss exact keyword matches
Can retrieve noisy chunks
Needs evaluation before production

Concrete example

A course assistant that answers only from lecture notes, syllabus documents, and reading lists.

Step-by-step build path

Select trusted course files
Chunk by headings or lessons
Embed chunks
Retrieve top passages
Generate answer with citations
Review common student questions
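
A minimal sketch of the build path above follows. It assumes plain-text course files that use `#` headings. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and a Python list stands in for the vector database. All function names are hypothetical. The output is the assembled prompt that would be sent to the LLM; the model call itself is omitted.

```python
import math
import re
from collections import Counter


def chunk_by_headings(source: str, text: str) -> list[dict]:
    """Split a document into one chunk per '#' heading, keeping the source for citations."""
    chunks, lines, heading = [], [], "Untitled"
    for line in text.splitlines():
        if line.startswith("#"):
            if lines:
                chunks.append({"source": source, "heading": heading, "text": " ".join(lines)})
            heading, lines = line.lstrip("# ").strip(), []
        elif line.strip():
            lines.append(line.strip())
    if lines:
        chunks.append({"source": source, "heading": heading, "text": " ".join(lines)})
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (word counts); replace with a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[dict], k: int = 2) -> list[dict]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)[:k]


def build_prompt(question: str, passages: list[dict]) -> str:
    """Number each passage so the model can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({p['source']}, {p['heading']}) {p['text']}" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer only from the numbered passages below and cite them like [1]. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


syllabus = """# Grading
The final exam counts for 40 percent of the grade.
# Office hours
Office hours are on Tuesday afternoons."""

chunks = chunk_by_headings("syllabus.md", syllabus)
top = retrieve("How much does the final exam count?", chunks, k=1)
prompt = build_prompt("How much does the final exam count?", top)
print(prompt)
```

Numbering the passages inside the prompt gives the model stable labels to cite. Reviewing common student questions then amounts to checking whether the right chunk lands in `top`.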

Use for first prototypes, small knowledge bases, and educational demonstrations.

Further reading: LangChain RAG tutorial, LlamaIndex RAG guide


Advanced RAG with reranking

Adds a reranker after initial retrieval to improve the order and relevance of context passed to the model.

Pipeline: Query → Hybrid or vector retrieval → Candidate passages → Reranking → Context compression → LLM answer → Evaluation

Recommended tools

Cohere Rerank
Jina AI
LangChain
LlamaIndex
Haystack

Advantages

Often improves relevance
Reduces irrelevant context
Useful for production search quality

Limitations

Adds latency and cost
Reranker behavior must be evaluated

Concrete example

A support assistant that retrieves many candidate passages, then reranks them so the most specific troubleshooting step appears first.

Step-by-step build path

Collect failed retrieval examples
Retrieve a larger candidate set
Add a reranker
Compare answer faithfulness
Track latency impact
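
The sketch below illustrates the two-stage idea under stated assumptions. The first stage ranks by shared terms and stands in for vector or hybrid search. `rerank_score` is a hypothetical scorer that stands in for a cross-encoder or a hosted reranker such as those listed above. It is not any vendor's API. Only the reranking step is timed, so its latency impact can be tracked separately.

```python
import re
import time


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def first_stage(query: str, passages: list[str], n: int = 20) -> list[str]:
    """Cheap, recall-oriented retrieval: rank by number of shared terms.
    Stands in for the initial vector or hybrid search."""
    q = set(tokens(query))
    return sorted(passages, key=lambda p: len(q & set(tokens(p))), reverse=True)[:n]


def rerank_score(query: str, passage: str) -> float:
    """Hypothetical precision-oriented scorer standing in for a cross-encoder or
    hosted reranker: query-term coverage plus a bonus for the exact query phrase."""
    q, p = tokens(query), tokens(passage)
    coverage = len(set(q) & set(p)) / max(len(set(q)), 1)
    phrase_bonus = 1.0 if " ".join(q) in " ".join(p) else 0.0
    return coverage + phrase_bonus


def retrieve_and_rerank(query: str, passages: list[str], n: int = 20, k: int = 3):
    candidates = first_stage(query, passages, n)  # retrieve a larger candidate set
    start = time.perf_counter()
    ranked = sorted(candidates, key=lambda p: rerank_score(query, p), reverse=True)[:k]
    rerank_ms = (time.perf_counter() - start) * 1000  # latency added by the reranker
    return ranked, rerank_ms


docs = [
    "Reset the router by unplugging it, then check the printer queue.",
    "To reset the printer, hold the power button for ten seconds.",
    "The printer driver and the printer manual are on the intranet.",
]
query = "reset the printer"
ranked, rerank_ms = retrieve_and_rerank(query, docs, n=3, k=1)
print(first_stage(query, docs)[0])  # router passage: ties on shared terms, listed first
print(ranked[0])                    # printer reset passage after reranking
```

The first stage cannot tell the two top passages apart. The reranker can, because it spends more computation per candidate. This is why the candidate set is retrieved wide and then cut down.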

Use when initial retrieval returns many partially relevant chunks.


Hybrid search RAG

Combines keyword and vector retrieval to handle exact names, legal references, technical terms, and semantic queries.

Pipeline: Query → Keyword search + Vector search → Score fusion → Reranking → Prompt assembly → Answer

Recommended tools

Elasticsearch / OpenSearch
Weaviate
Qdrant
Haystack

Advantages

Balances exact and semantic matching
Strong for enterprise search
Handles acronyms and identifiers better

Limitations

Tuning is more complex
Requires careful weighting and evaluation

Concrete example

A legal monitoring system that must match exact regulation numbers and also understand semantic descriptions of obligations.

Step-by-step build path

Index keywords and embeddings
Tune score fusion
Add jurisdiction metadata
Rerank top results
Evaluate exact-reference queries
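
One common way to fuse the two result lists is reciprocal rank fusion (RRF). RRF is named here as a swapped-in technique; the tools above may implement fusion differently. The keyword side below is a small Okapi BM25 implementation. The vector ranking is an assumed, hard-coded output from a vector store. The `weights` argument is where fusion tuning would happen. The regulation numbers are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Optional


def tokens(text: str) -> list[str]:
    # Keep "/" inside tokens so identifiers such as "12/2020" survive as one term.
    return re.findall(r"[a-z0-9/]+", text.lower())


def bm25_ranking(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Keyword side: rank documents with Okapi BM25; drop documents with no matching term."""
    toks = {doc_id: tokens(text) for doc_id, text in docs.items()}
    n = len(docs)
    avgdl = sum(len(t) for t in toks.values()) / n
    df = Counter(term for t in toks.values() for term in set(t))
    scores = {}
    for doc_id, t in toks.items():
        tf = Counter(t)
        score = 0.0
        for term in set(tokens(query)):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


def rrf(rankings: list[list[str]], weights: Optional[list[float]] = None, k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each ranked list adds weight / (k + rank) to a document's score."""
    weights = weights or [1.0] * len(rankings)
    fused = defaultdict(float)
    for ranking, w in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += w / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


docs = {
    "a": "Regulation 12/2020 requires annual reporting of emissions.",
    "b": "Companies must disclose yearly environmental impacts to the regulator.",
    "c": "Regulation 13/2020 covers workplace safety inspections.",
}
query = "Regulation 12/2020 reporting duties"
keyword = bm25_ranking(query, docs)  # ["a", "c"]
vector = ["b", "a", "c"]             # assumed output of a vector search for the same query
print(rrf([keyword, vector]))        # ["a", "c", "b"]
```

RRF works on ranks rather than raw scores, so it avoids normalizing BM25 scores against cosine similarities. The exact-reference document "a" wins because it ranks high in both lists.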

Use for technical, legal, academic, or enterprise corpora with exact terminology.


Metadata-aware RAG

Uses metadata such as date, source, role, department, jurisdiction, or document type to constrain retrieval.

Pipeline: Query classification → Metadata filters → Filtered retrieval → Reranking → Source-aware answer

Recommended tools

Qdrant
Weaviate
Pinecone
Elasticsearch / OpenSearch

Advantages

Improves precision
Supports governance rules
Helps with freshness and access control

Limitations

Metadata quality becomes critical
Requires ingestion discipline

Concrete example

A public-administration assistant that retrieves only current procedures for the user's department and language.

Step-by-step build path

Define metadata schema
Enforce metadata at ingestion
Apply filters before retrieval
Display source authority
Audit access rules
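
A minimal sketch of enforcing a metadata schema at ingestion and filtering before ranking is shown below. The field names, allowed values, and the convention that `valid_to` is an exclusive end date are all hypothetical. The similarity ranking of the surviving chunks is omitted. Vector stores such as those listed above offer their own filter syntax, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical schema values; a real deployment would define its own.
DOC_TYPES = {"procedure", "policy", "faq"}
LANGUAGES = {"en", "fr", "de"}


@dataclass(frozen=True)
class Chunk:
    text: str
    department: str
    language: str
    doc_type: str
    valid_from: date
    valid_to: Optional[date] = None  # None means still in force; otherwise exclusive end date


def ingest(index: list, chunk: Chunk) -> None:
    """Enforce the metadata schema at ingestion instead of trusting it at query time."""
    if not chunk.department:
        raise ValueError("department is required")
    if chunk.language not in LANGUAGES or chunk.doc_type not in DOC_TYPES:
        raise ValueError(f"unknown language or doc_type: {chunk.language}, {chunk.doc_type}")
    if chunk.valid_to is not None and chunk.valid_to <= chunk.valid_from:
        raise ValueError("valid_to must be after valid_from")
    index.append(chunk)


def filter_chunks(index: list, *, department: str, language: str, on: date) -> list:
    """Apply filters before similarity ranking so out-of-scope chunks never reach the model."""
    return [
        c for c in index
        if c.department == department
        and c.language == language
        and c.valid_from <= on
        and (c.valid_to is None or on < c.valid_to)
    ]


index: list = []
ingest(index, Chunk("Renew permits online via the portal.", "permits", "en", "procedure",
                    date(2024, 1, 1)))
ingest(index, Chunk("Renew permits at the front desk.", "permits", "en", "procedure",
                    date(2019, 1, 1), date(2024, 1, 1)))
ingest(index, Chunk("Renouveler les permis en ligne.", "permits", "fr", "procedure",
                    date(2024, 1, 1)))

try:
    ingest(index, Chunk("Draft notes.", "permits", "en", "memo", date(2024, 1, 1)))
    rejected = False
except ValueError:
    rejected = True

current = filter_chunks(index, department="permits", language="en", on=date(2025, 6, 1))
print([c.text for c in current])  # only the current English procedure
print(rejected)                   # True: "memo" is not in the schema
```

Rejecting bad metadata at ingestion keeps the filters trustworthy. A chunk with a wrong department or date would otherwise pass or fail filtering silently.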

Use when sources differ by authority, time, department, language, or access rights.


Agentic RAG

Allows an agent to plan, retrieve, inspect sources, call tools, and iterate before answering.

Pipeline: User task → Planner → Tool selection → Retrieval → Reasoning loop → Answer with sources

Recommended tools

Dify
LangGraph
LangChain
n8n
Flowise

Advantages

Handles multi-step tasks
Can use tools and workflows
Useful for research assistance

Limitations

Harder to test
Higher latency and less predictable behavior

Concrete example

A research assistant that searches papers, checks definitions, calls a calculator, and synthesizes a sourced answer.

Step-by-step build path

Define allowed tools
Add planning constraints
Log every tool call
Limit retries
Evaluate multi-step failures
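
The sketch below shows the guardrails from the build path: a tool allowlist, a cap on steps, a cap on retries, and a log entry for every call. The tools and the fixed plan are hypothetical. In a real agent, an LLM (for example, orchestrated with one of the frameworks above) would produce the plan, and this code does not use any framework's API.

```python
from typing import Callable


def search_papers(query: str) -> str:
    """Hypothetical retrieval tool over a tiny in-memory corpus."""
    corpus = {"rag": "RAG combines a retriever with a generator (paper A, section 2)."}
    return next((text for key, text in corpus.items() if key in query.lower()), "")


def add_numbers(args: str) -> str:
    """Hypothetical calculator tool: adds comma-separated numbers."""
    return str(sum(float(x) for x in args.split(",")))


ALLOWED_TOOLS: dict = {"search": search_papers, "add": add_numbers}


def run_agent(plan, tools: dict, max_steps: int = 5, max_retries: int = 1):
    """Execute planned tool calls under constraints: allowlist, step cap, retry cap, full log."""
    evidence, log = [], []
    for step, (name, arg) in enumerate(plan[:max_steps]):  # bounded number of steps
        if name not in tools:
            log.append({"step": step, "tool": name, "arg": arg, "status": "rejected"})
            continue
        tool: Callable[[str], str] = tools[name]
        for attempt in range(1 + max_retries):  # limit retries on failing calls
            try:
                result = tool(arg)
            except Exception as exc:
                log.append({"step": step, "tool": name, "arg": arg,
                            "status": f"error: {exc}", "attempt": attempt})
                continue
            log.append({"step": step, "tool": name, "arg": arg,
                        "status": "ok", "attempt": attempt})
            evidence.append(result)
            break
    return evidence, log


# In a real agent an LLM would produce this plan; here it is fixed for illustration.
plan = [("search", "What is RAG?"), ("browse_web", "example.com"), ("add", "2, 3"), ("add", "two, 3")]
evidence, log = run_agent(plan, ALLOWED_TOOLS)
for entry in log:
    print(entry)
```

The log records the rejected tool and both failed attempts on the malformed input. These multi-step failures are what an evaluation set for this pattern should capture.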

Use for tasks that require multiple searches, tool calls, or procedural reasoning.

Further reading: LangChain documentation, Dify documentation


Graph RAG

Uses entities and relationships to retrieve connected evidence and support synthesis across a knowledge graph.

Pipeline: Entity extraction → Graph construction → Graph retrieval + Text retrieval → Synthesis → Citation

Recommended tools

Graph databases
LlamaIndex
LangChain
Custom pipelines

Advantages

Good for connected knowledge
Supports relationship-aware retrieval
Useful for exploratory analysis

Limitations

Graph construction is demanding
Quality depends on extraction and curation

Concrete example

An institutional knowledge explorer that connects people, projects, policies, departments, and documents.

Step-by-step build path

Extract entities
Curate relationships
Link graph nodes to passages
Retrieve graph neighborhoods
Validate relationship quality
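
Here is a minimal sketch of retrieving a graph neighborhood and linking nodes back to passages. It assumes entities and relationships have already been extracted and curated as (head, relation, tail) triples. The entity names and passage IDs are invented, and an in-memory adjacency list stands in for a graph database.

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Adjacency list over curated (head, relation, tail) triples, traversable in both directions."""
    adj = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
        adj[tail].append((f"inverse:{rel}", head))
    return adj


def neighborhood(adj, seeds, hops=2):
    """Breadth-first search: all nodes within `hops` edges of the seed entities."""
    depth = {s: 0 for s in seeds}
    edges, queue = [], deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue
        for rel, nxt in adj.get(node, []):
            edges.append((node, rel, nxt))
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth), edges


def passages_for(nodes, node_to_passages):
    """Link graph nodes back to the text passages they were extracted from."""
    return sorted({pid for n in nodes for pid in node_to_passages.get(n, [])})


triples = [
    ("Alice", "leads", "Project X"),
    ("Project X", "governed_by", "Data Policy"),
    ("Data Policy", "owned_by", "Legal"),
    ("Bob", "works_in", "Finance"),
]
node_to_passages = {"Alice": ["hr-12"], "Project X": ["proj-3"],
                    "Data Policy": ["pol-7"], "Legal": ["leg-2"]}

nodes, edges = neighborhood(build_graph(triples), ["Alice"], hops=2)
print(sorted(nodes))                           # ['Alice', 'Data Policy', 'Project X']
print(passages_for(nodes, node_to_passages))   # passages passed on to synthesis
```

The `edges` list records which relationships connected the evidence. An answer can cite it alongside the passages, and reviewers can use it when validating relationship quality.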

Use for domains with important relationships such as research, policy, legal, or organizational knowledge.

Further reading: LlamaIndex documentation


Multimodal RAG

Retrieves from text, images, tables, scans, diagrams, or media and assembles context for a multimodal or text model.

Pipeline: Files → OCR and parsing → Image or table extraction → Multimodal embeddings → Retrieval → Generation

Recommended tools

Unstructured
Jina AI
LlamaIndex
document AI services

Advantages

Works with real-world document formats
Supports scans and rich media
Useful for archives

Limitations

Parsing quality varies
Evaluation is more complex

Concrete example

An archive assistant that searches scanned PDFs, tables, handwritten forms, and image captions.

Step-by-step build path

Run OCR and layout parsing
Extract tables and figures
Preserve page images
Create text and visual indexes
Evaluate provenance
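
OCR and layout parsing themselves need dedicated tools such as those listed above. The sketch below covers only what happens after parsing: a record that keeps provenance for each extracted region, a split into text and visual indexes, and a simple provenance check. The `Element` fields, file names, and the rule for what goes into the visual index are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One parsed region of a scanned page, with enough provenance to cite and re-check it."""
    source: str      # original file
    page: int        # 1-based page number
    modality: str    # "text", "table", "figure", or "handwriting"
    content: str     # OCR text, table serialized as text, or figure caption
    bbox: tuple      # (x0, y0, x1, y1) region on the preserved page image
    page_image: str  # path of the preserved page render


def build_indexes(elements):
    """Text index covers every element with extractable text; the visual index keeps
    regions whose meaning depends on the image itself."""
    text_index = [e for e in elements if e.content.strip()]
    visual_index = [e for e in elements if e.modality in {"figure", "handwriting"}]
    return text_index, visual_index


def cite(e: Element) -> str:
    return f"{e.source}, page {e.page}, {e.modality}, region {e.bbox}"


def provenance_ok(results, preserved_images) -> bool:
    """Evaluation check: every retrieved element must point to a preserved page image."""
    return all(e.page_image in preserved_images for e in results)


elements = [
    Element("scan_001.pdf", 3, "table", "year,budget\n1998,120",
            (0.1, 0.2, 0.9, 0.5), "pages/scan_001_p3.png"),
    Element("scan_001.pdf", 3, "figure", "Map of the east wing",
            (0.1, 0.55, 0.9, 0.95), "pages/scan_001_p3.png"),
    Element("form_17.pdf", 1, "handwriting", "",  # OCR produced no text
            (0.0, 0.0, 1.0, 1.0), "pages/form_17_p1.png"),
]
text_index, visual_index = build_indexes(elements)
print(len(text_index), len(visual_index))  # 2 2
print(cite(text_index[0]))
print(provenance_ok(text_index, {"pages/scan_001_p3.png"}))  # True
```

The handwritten form produced no OCR text, so only the visual index can reach it. Keeping the page image and region makes both the answer and its source checkable.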

Use for scanned archives, slide decks, forms, diagrams, and mixed media collections.

Further reading: LlamaIndex documentation


Local/private RAG

Runs models, embeddings, and vector storage locally or in a controlled environment for privacy-sensitive workloads.

Pipeline: Private documents → Local parsing → Local embeddings → Local vector database → Local model → Answer

Recommended tools

Ollama
Open WebUI
AnythingLLM
Chroma
Qdrant

Advantages

Improves data control
Useful for education and sensitive prototypes
Can work without external model APIs

Limitations

Hardware constraints
Model quality varies
Security still needs architecture review

Concrete example

A classroom or lab prototype that indexes local documents with local embeddings and answers with a local model on controlled hardware.

Step-by-step build path

Install local model runtime
Choose local vector store
Index non-sensitive documents
Measure hardware limits
Review privacy assumptions
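
When measuring hardware limits, a back-of-the-envelope memory estimate helps before installing anything. The sketch counts only model weights (parameters × bits ÷ 8 bytes) and raw float32 embeddings (chunks × dimensions × 4 bytes). Real usage is higher because of the KV cache, runtime overhead, and index structures, so treat the result as a lower bound. The 0.8 headroom factor and the example sizes are assumptions.

```python
def model_weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for model weights alone: parameters x bits / 8 bytes."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9


def vector_index_gb(n_chunks: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw float32 embeddings: chunks x dimensions x 4 bytes (excludes index structures)."""
    return n_chunks * dim * bytes_per_value / 1e9


def fits(required_gb: float, available_gb: float, headroom: float = 0.8) -> bool:
    """Leave headroom for the OS, runtime, KV cache, and parsing jobs (0.8 is an assumed margin)."""
    return required_gb <= available_gb * headroom


weights = model_weights_gb(7, 4)           # 7B parameters quantized to 4 bits
index = vector_index_gb(1_000_000, 768)    # one million 768-dimensional chunks
print(round(weights, 2), round(index, 2))  # 3.5 3.07
print(fits(weights + index, 16))           # True: about 6.57 GB <= 12.8 GB
```

The same arithmetic shows why model size dominates on classroom hardware. A 70B model at 16 bits needs about 140 GB for weights alone.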

Use for privacy-sensitive experiments, classrooms, and offline prototypes.


Enterprise RAG with observability and evaluation

Adds governance, access control, monitoring, evaluation, and feedback loops around the RAG pipeline.

Pipeline: Ingestion governance → Access control → Hybrid retrieval → Reranking → Generation → Tracing → Evaluation → Feedback

Recommended tools

Langfuse
Phoenix / Arize
Ragas
TruLens
Elasticsearch / OpenSearch

Advantages

Production visibility
Supports quality management
Helps teams improve over time

Limitations

More moving parts
Requires ownership and review processes

Concrete example

A company-wide assistant with access controls, trace logs, evaluation dashboards, source owners, and release gates.

Step-by-step build path

Define governance owners
Connect identity and permissions
Add tracing
Build test sets
Monitor feedback and regressions
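
A test set can be turned into a release gate with very little code. The sketch below computes a simple retrieval hit rate at k (whether any known-relevant document appears in the top k) and records a trace per query. It then blocks a release when the metric drops more than an assumed margin below the baseline. This only illustrates the idea: it does not use the APIs or metric definitions of Langfuse, Phoenix, Ragas, or TruLens, and the retriever output and labels are invented.

```python
import time


def evaluate_hit_rate(test_set, retrieve, k, traces):
    """Share of test questions with at least one known-relevant document in the top k.
    Each query is also recorded as a trace, as a tracing tool would do."""
    hits = 0
    for question, relevant in test_set:
        start = time.perf_counter()
        retrieved = retrieve(question)[:k]
        traces.append({
            "question": question,
            "retrieved": retrieved,
            "latency_ms": (time.perf_counter() - start) * 1000,
        })
        hits += bool(relevant & set(retrieved))
    return hits / len(test_set)


def release_gate(candidate: float, baseline: float, max_drop: float = 0.02) -> bool:
    """Block a release if the metric regresses by more than max_drop versus the baseline."""
    return candidate >= baseline - max_drop


# Hypothetical retriever output and labeled test set.
fake_index = {"q1": ["doc-1", "doc-9"], "q2": ["doc-4", "doc-2"], "q3": ["doc-7", "doc-8"]}
test_set = [("q1", {"doc-1"}), ("q2", {"doc-2"}), ("q3", {"doc-3"})]

traces = []
hit_rate = evaluate_hit_rate(test_set, fake_index.get, k=2, traces=traces)
print(round(hit_rate, 3))                     # 0.667
print(release_gate(hit_rate, baseline=0.70))  # False: regression larger than 0.02
```

Running the gate on every change turns "monitor regressions" into a concrete check. The traces give owners the per-query detail needed to investigate a failure.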

Use when RAG answers affect operations, customers, policy, or regulated decisions.

Further reading: Langfuse documentation, Phoenix documentation, Ragas documentation
