Architectures

Practical RAG architecture patterns

Use these patterns as starting points, then validate retrieval quality, security, cost, and user outcomes for your domain.

Architecture reading method

  • Start with the simplest pattern that can solve the use case.
  • Add hybrid search when exact terms matter.
  • Add metadata filters when authority, language, date, or permissions matter.
  • Add reranking when top results are noisy.
  • Add observability before production release.

Basic RAG

A straightforward pipeline that retrieves relevant chunks and passes them to an LLM with the user question.

Documents → Ingestion → Chunking → Embeddings → Vector database → Retrieval → Prompt assembly → LLM answer → Citations

Recommended tools

  • Dify
  • LlamaIndex
  • LangChain
  • Chroma
  • Qdrant

Advantages

  • Easy to explain
  • Good first prototype
  • Works for many document Q&A tasks

Limitations

  • May miss exact keyword matches
  • Can retrieve noisy chunks
  • Needs evaluation before production

Concrete example

A course assistant that answers only from lecture notes, syllabus documents, and reading lists.

Step-by-step build path

  1. Select trusted course files
  2. Chunk by headings or lessons
  3. Embed chunks
  4. Retrieve top passages
  5. Generate answer with citations
  6. Review common student questions

Use for first prototypes, small knowledge bases, and educational demonstrations.
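
The build path above can be sketched end to end in a few lines. This is a minimal illustration, not a reference implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the file names and chunk texts are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(tokenize(text))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k=2):
    # Rank every chunk by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    # Number each passage so the model can cite it as [1], [2], ...
    context = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer only from the numbered sources and cite them like [1].\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )

# Hypothetical course chunks, one per heading or lesson.
chunks = [
    {"source": "syllabus.pdf", "text": "The final exam covers weeks one through twelve."},
    {"source": "lecture03.md", "text": "Gradient descent updates weights using the learning rate."},
    {"source": "reading_list.md", "text": "Required reading for week two is chapter four."},
]

question = "What does the final exam cover?"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
```

The resulting `prompt` string is what would be sent to the LLM. Swapping `embed` for a real model and the list for a vector store changes the quality of retrieval, not the shape of the pipeline.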

Advanced RAG with reranking

Adds a reranker after initial retrieval to improve the order and relevance of context passed to the model.

Query → Hybrid or vector retrieval → Candidate passages → Reranking → Context compression → LLM answer → Evaluation

Recommended tools

  • Cohere Rerank
  • Jina AI
  • LangChain
  • LlamaIndex
  • Haystack

Advantages

  • Often improves relevance
  • Reduces irrelevant context
  • Useful when production search quality matters

Limitations

  • Adds latency and cost
  • Reranker behavior must be evaluated

Concrete example

A support assistant that retrieves many candidate passages, then reranks them so the most specific troubleshooting step appears first.

Step-by-step build path

  1. Collect failed retrieval examples
  2. Retrieve a larger candidate set
  3. Add a reranker
  4. Compare answer faithfulness
  5. Track latency impact

Use when initial retrieval returns many partially relevant chunks.
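
The two-stage idea can be sketched as follows: a cheap first stage fetches a larger candidate set, and a second scorer reorders it. Both scorers here are assumptions chosen so the example runs without dependencies. In practice the first stage would be vector or hybrid search, and `rerank_score` would be a cross-encoder or a hosted reranking service. The passages are hypothetical.

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def first_stage(query, passages, n):
    # Recall-oriented stage: count shared tokens (stand-in for vector search).
    q = tokens(query)
    return sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)[:n]

def rerank_score(query, passage):
    # Hypothetical precision-oriented scorer: Jaccard overlap, which
    # penalizes long passages that mention the terms only in passing.
    q, p = tokens(query), tokens(passage)
    union = q | p
    return len(q & p) / len(union) if union else 0.0

def retrieve_and_rerank(query, passages, candidates=3, final=2):
    pool = first_stage(query, passages, candidates)
    return sorted(pool, key=lambda p: rerank_score(query, p), reverse=True)[:final]

passages = [
    "If the router does not respond, check the power cable, the wall socket, "
    "the router fuse, and then reset the router settings from the admin page "
    "after you log in with the router password.",
    "Reset the router by holding the reset button for ten seconds.",
    "Billing questions are handled by the accounts team.",
    "Firmware updates are installed automatically.",
]

query = "how do I reset the router"
before = first_stage(query, passages, 3)
after = retrieve_and_rerank(query, passages)
```

In this toy data the long, generic passage ties for first place in the first stage, while the short, specific troubleshooting step moves to the top after reranking. Comparing `before` and `after` on real failed queries is one way to check whether a reranker earns its latency.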

Hybrid search RAG

Combines keyword and vector retrieval to handle exact names, legal references, technical terms, and semantic queries.

Query → Keyword search and vector search → Score fusion → Reranking → Prompt assembly → Answer

Recommended tools

  • Elasticsearch / OpenSearch
  • Weaviate
  • Qdrant
  • Haystack

Advantages

  • Balances exact and semantic matching
  • Strong for enterprise search
  • Handles acronyms and identifiers better

Limitations

  • Tuning is more complex
  • Requires careful weighting and evaluation

Concrete example

A legal monitoring system that must match exact regulation numbers and also understand semantic descriptions of obligations.

Step-by-step build path

  1. Index keywords and embeddings
  2. Tune score fusion
  3. Add jurisdiction metadata
  4. Rerank top results
  5. Evaluate exact-reference queries

Use for technical, legal, academic, or enterprise corpora with exact terminology.
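
One common way to implement the score fusion step is reciprocal rank fusion (RRF), which combines ranked lists without needing comparable scores. The sketch below is one option, not necessarily what any listed tool does by default. The document IDs are hypothetical, and `k=60` is a conventional smoothing constant that should be tuned against your own queries.

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Each document earns 1 / (k + rank) from every list it appears in.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from the two retrievers for one query.
keyword_ranking = ["reg-101", "memo-12", "guide-3"]
vector_ranking = ["guide-3", "reg-101", "faq-9"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)  # ['reg-101', 'guide-3', 'memo-12', 'faq-9']
```

Documents found by both retrievers rise above documents found by only one, which suits corpora where an exact regulation number and a semantic description should both count as evidence.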

Metadata-aware RAG

Uses metadata such as date, source, role, department, jurisdiction, or document type to constrain retrieval.

Query classification → Metadata filters → Filtered retrieval → Reranking → Source-aware answer

Recommended tools

  • Qdrant
  • Weaviate
  • Pinecone
  • Elasticsearch / OpenSearch

Advantages

  • Improves precision
  • Supports governance rules
  • Helps with freshness and access control

Limitations

  • Metadata quality becomes critical
  • Requires ingestion discipline

Concrete example

A public-administration assistant that retrieves only current procedures for the user's department and language.

Step-by-step build path

  1. Define metadata schema
  2. Enforce metadata at ingestion
  3. Apply filters before retrieval
  4. Display source authority
  5. Audit access rules

Use when sources differ by authority, time, department, language, or access rights.
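
Applying filters before retrieval can be sketched like this. The schema (`department`, `language`, `valid_until`) and the sample procedures are assumptions for illustration, and the token-overlap ranking stands in for whatever retriever sits behind the filter.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Doc:
    text: str
    department: str
    language: str
    valid_until: date  # hypothetical freshness field set at ingestion

def allowed(doc, department, language, today):
    # A document is eligible only if every metadata constraint holds.
    return (
        doc.department == department
        and doc.language == language
        and doc.valid_until >= today
    )

def filtered_retrieve(query, docs, department, language, today, k=3):
    # Filter first, so excluded documents can never reach the prompt.
    candidates = [d for d in docs if allowed(d, department, language, today)]
    q = set(query.lower().split())
    ranked = sorted(
        candidates,
        key=lambda d: len(q & set(d.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]

docs = [
    Doc("How to renew a building permit", "permits", "en", date(2030, 1, 1)),
    Doc("Old building permit renewal procedure", "permits", "en", date(2020, 1, 1)),
    Doc("How to renew a tax registration", "tax", "en", date(2030, 1, 1)),
    Doc("Comment renouveler un permis de construire", "permits", "fr", date(2030, 1, 1)),
]

results = filtered_retrieve(
    "renew building permit", docs, "permits", "en", today=date(2025, 1, 1)
)
```

Only the current English procedure for the permits department survives the filter. Filtering before ranking, rather than after, also keeps restricted or expired text out of the candidate pool entirely, which matters when the filter encodes access rules.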

Agentic RAG

Allows an agent to plan, retrieve, inspect sources, call tools, and iterate before answering.

User task → Planner → Tool selection → Retrieval → Reasoning loop → Answer with sources

Recommended tools

  • Dify
  • LangGraph
  • LangChain
  • n8n
  • Flowise

Advantages

  • Handles multi-step tasks
  • Can use tools and workflows
  • Useful for research assistance

Limitations

  • Harder to test
  • Higher latency and a greater risk of unpredictable behavior

Concrete example

A research assistant that searches papers, checks definitions, calls a calculator, and synthesizes a sourced answer.

Step-by-step build path

  1. Define allowed tools
  2. Add planning constraints
  3. Log every tool call
  4. Limit retries
  5. Evaluate multi-step failures

Use for tasks that require multiple searches, tool calls, or procedural reasoning.
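
The guardrails in the build path (an allow-list of tools, a log of every call, and a hard step limit) can be sketched without any agent framework. In this illustration the plan is a fixed list rather than the output of an LLM planner, and both tools are hypothetical stubs.

```python
def search_papers(query):
    # Hypothetical tool: returns a canned result instead of calling a search API.
    return f"2 papers found for '{query}'"

def calculator(expression):
    # Restricted arithmetic: accepts only "a op b" with numeric operands.
    left, op, right = expression.split()
    a, b = float(left), float(right)
    return {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]

ALLOWED_TOOLS = {"search_papers": search_papers, "calculator": calculator}

def run_agent(plan, max_steps=4):
    log = []
    for step, (tool_name, argument) in enumerate(plan):
        if step >= max_steps:
            # Hard stop so a looping planner cannot run indefinitely.
            log.append({"step": step, "tool": None, "status": "stopped"})
            break
        tool = ALLOWED_TOOLS.get(tool_name)
        if tool is None:
            # Anything outside the allow-list is logged and skipped.
            log.append({"step": step, "tool": tool_name, "status": "rejected"})
            continue
        result = tool(argument)
        log.append({"step": step, "tool": tool_name, "status": "ok", "result": result})
    return log

plan = [
    ("search_papers", "retrieval evaluation"),
    ("delete_files", "/tmp"),   # not an allowed tool
    ("calculator", "12 * 3"),
]
log = run_agent(plan)
```

The log is the raw material for evaluating multi-step failures: it shows which step was rejected, which tool produced which result, and where the run stopped.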

Graph RAG

Uses entities and relationships to retrieve connected evidence and support synthesis across a knowledge graph.

Entity extraction → Graph construction → Graph retrieval → Text retrieval → Synthesis → Citation

Recommended tools

  • Graph databases
  • LlamaIndex
  • LangChain
  • custom pipelines

Advantages

  • Good for connected knowledge
  • Supports relationship-aware retrieval
  • Useful for exploratory analysis

Limitations

  • Graph construction is demanding
  • Quality depends on extraction and curation

Concrete example

An institutional knowledge explorer that connects people, projects, policies, departments, and documents.

Step-by-step build path

  1. Extract entities
  2. Curate relationships
  3. Link graph nodes to passages
  4. Retrieve graph neighborhoods
  5. Validate relationship quality

Use for domains with important relationships such as research, policy, legal, or organizational knowledge.
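
Retrieving a graph neighborhood and following its nodes back to passages can be sketched with a plain adjacency dictionary. The entities, edges, and passages below are hypothetical, and a real system would typically use a graph database and an extraction pipeline instead of hand-written dictionaries.

```python
from collections import deque

# Hypothetical curated graph: entity -> directly related entities.
graph = {
    "Alice": ["Project Atlas"],
    "Project Atlas": ["Alice", "Data Policy", "Research Dept"],
    "Data Policy": ["Project Atlas"],
    "Research Dept": ["Project Atlas"],
    "Bob": ["Finance Dept"],
    "Finance Dept": ["Bob"],
}

# Each node links to the passages that mention it.
passages = {
    "Alice": ["Alice leads Project Atlas."],
    "Project Atlas": ["Project Atlas must follow the Data Policy."],
    "Data Policy": ["The Data Policy requires annual review."],
}

def neighborhood(graph, start, max_hops):
    # Breadth-first search that records the hop distance of each node.
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

def graph_retrieve(graph, passages, start, max_hops=1):
    hops = neighborhood(graph, start, max_hops)
    ordered = sorted(hops, key=lambda n: (hops[n], n))  # closest nodes first
    return [(node, text) for node in ordered for text in passages.get(node, [])]

evidence = graph_retrieve(graph, passages, "Alice", max_hops=2)
```

Each returned pair keeps the node name alongside the passage, so the synthesis step can cite both the text and the relationship path that led to it.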

Multimodal RAG

Retrieves from text, images, tables, scans, diagrams, or media and assembles context for a multimodal or text model.

Files → OCR and parsing → Image or table extraction → Multimodal embeddings → Retrieval → Generation

Recommended tools

  • Unstructured
  • Jina AI
  • LlamaIndex
  • document AI services

Advantages

  • Works with real-world document formats
  • Supports scans and rich media
  • Useful for archives

Limitations

  • Parsing quality varies
  • Evaluation is more complex

Concrete example

An archive assistant that searches scanned PDFs, tables, handwritten forms, and image captions.

Step-by-step build path

  1. Run OCR and layout parsing
  2. Extract tables and figures
  3. Preserve page images
  4. Create text and visual indexes
  5. Evaluate provenance

Use for scanned archives, slide decks, forms, diagrams, and mixed media collections.
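
Provenance is easier to evaluate when every index entry records where it came from. The sketch below assumes OCR and table extraction have already produced plain text. The `IndexEntry` fields, file names, and term-overlap search are illustrative assumptions, not the output format of any listed tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    text: str      # OCR text, flattened table text, or an image caption
    modality: str  # "ocr", "table", or "caption"
    file: str
    page: int

def search(entries, query, modalities=None):
    # Term-overlap search that can be restricted to certain modalities.
    terms = set(query.lower().split())
    hits = []
    for entry in entries:
        if modalities and entry.modality not in modalities:
            continue
        overlap = len(terms & set(entry.text.lower().split()))
        if overlap:
            hits.append((overlap, entry))
    hits.sort(key=lambda hit: -hit[0])
    return [entry for _, entry in hits]

def cite(entry):
    # Provenance string pointing back to the preserved page image.
    return f"{entry.file}, page {entry.page} ({entry.modality})"

entries = [
    IndexEntry("harbour tax receipts 1923 total", "table", "ledger_1923.pdf", 4),
    IndexEntry("minutes of the harbour board meeting", "ocr", "minutes_1923.pdf", 1),
    IndexEntry("photograph of the town hall", "caption", "album_07.pdf", 12),
]

results = search(entries, "harbour tax 1923")
citation = cite(results[0])
```

Because file, page, and modality travel with every hit, a reviewer can open the original page image and check whether the extracted text matches the scan.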

Local/private RAG

Runs models, embeddings, and vector storage locally or in a controlled environment for privacy-sensitive workloads.

Private documents → Local parsing → Local embeddings → Local vector database → Local model → Answer

Recommended tools

  • Ollama
  • Open WebUI
  • AnythingLLM
  • Chroma
  • Qdrant

Advantages

  • Improves data control
  • Useful for education and sensitive prototypes
  • Can work without external model APIs

Limitations

  • Limited by available hardware
  • Model quality varies
  • Security still needs architecture review

Concrete example

A classroom or lab prototype running local documents, local embeddings, and a local model on controlled hardware.

Step-by-step build path

  1. Install local model runtime
  2. Choose local vector store
  3. Index non-sensitive documents first
  4. Measure hardware limits
  5. Review privacy assumptions

Use for privacy-sensitive experiments, classrooms, and offline prototypes.

Enterprise RAG with observability and evaluation

Adds governance, access control, monitoring, evaluation, and feedback loops around the RAG pipeline.

Ingestion governance → Access control → Hybrid retrieval → Reranking → Generation → Tracing → Evaluation → Feedback

Recommended tools

  • Langfuse
  • Phoenix / Arize
  • Ragas
  • TruLens
  • Elasticsearch / OpenSearch

Advantages

  • Provides visibility into production behavior
  • Supports quality management
  • Helps teams improve over time

Limitations

  • More moving parts
  • Requires ownership and review processes

Concrete example

A company-wide assistant with access controls, trace logs, evaluation dashboards, source owners, and release gates.

Step-by-step build path

  1. Define governance owners
  2. Connect identity and permissions
  3. Add tracing
  4. Build test sets
  5. Monitor feedback and regressions

Use when RAG answers affect operations, customers, policy, or regulated decisions.
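
A test set and a release gate can start very small. The sketch below measures how often the expected source appears in the top results and blocks a release when that rate regresses beyond a tolerance. The questions, source IDs, `fake_retrieve` stub, and the 0.02 tolerance are all hypothetical. Dedicated evaluation tools offer richer metrics than this single hit rate.

```python
def hit_rate(test_set, retrieve, k=3):
    # Fraction of questions whose expected source appears in the top k results.
    hits = 0
    for case in test_set:
        results = retrieve(case["question"])[:k]
        if case["expected_source"] in results:
            hits += 1
    return hits / len(test_set)

def release_gate(current, baseline, tolerance=0.02):
    # Allow the release only if quality has not dropped beyond the tolerance.
    return current >= baseline - tolerance

def fake_retrieve(question):
    # Stand-in for the production retriever: returns ranked source IDs.
    index = {
        "How do I request leave?": ["hr-leave-policy", "hr-handbook"],
        "Who approves travel?": ["finance-travel", "hr-handbook"],
        "What is the VPN policy?": ["hr-handbook", "it-onboarding"],
        "How do I report an incident?": ["security-incident", "it-onboarding"],
    }
    return index.get(question, [])

test_set = [
    {"question": "How do I request leave?", "expected_source": "hr-leave-policy"},
    {"question": "Who approves travel?", "expected_source": "finance-travel"},
    {"question": "What is the VPN policy?", "expected_source": "it-vpn-policy"},
    {"question": "How do I report an incident?", "expected_source": "security-incident"},
]

score = hit_rate(test_set, fake_retrieve)
print(score)                      # 0.75
print(release_gate(score, 0.80))  # False: regression beyond tolerance
print(release_gate(score, 0.76))  # True: within tolerance
```

Running the same test set on every change turns "monitor regressions" into a concrete check that a governance owner can sign off on.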
