Architectures
Practical RAG architecture patterns
Use these patterns as starting points, then validate retrieval quality, security, cost, and user outcomes for your domain.
Basic RAG
A straightforward pipeline that retrieves relevant chunks and passes them to an LLM with the user question.
Recommended tools
- Dify
- LlamaIndex
- LangChain
- Chroma
- Qdrant
Advantages
- Easy to explain
- Good first prototype
- Works for many document Q&A tasks
Limitations
- Embedding-only retrieval may miss exact keyword matches
- Can retrieve noisy chunks
- Needs evaluation before production
Concrete example
A course assistant that answers only from lecture notes, syllabus documents, and reading lists.
Step-by-step build path
- Select trusted course files
- Chunk by headings or lessons
- Embed chunks
- Retrieve top passages
- Generate answer with citations
- Review common student questions
Use for first prototypes, small knowledge bases, and educational demonstrations.
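The build path above can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not a reference implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the file names and chunk texts are invented, and the final LLM call is left out so the example stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)[:k]

def build_prompt(question, passages):
    """Number each passage so the model can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer only from the numbered sources and cite them like [1].\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )

# Hypothetical course chunks, one per heading or lesson.
chunks = [
    {"source": "syllabus.md", "text": "The final exam takes place in week 12 and covers all lectures."},
    {"source": "lecture-03.md", "text": "Gradient descent moves parameters along the negative gradient."},
    {"source": "reading-list.md", "text": "Required reading for week 2 is chapter 4 of the textbook."},
]

question = "When is the final exam?"
top = retrieve(question, chunks)
prompt = build_prompt(question, top)
print(prompt)
```

Keeping the source name next to each numbered passage is what makes citations checkable when reviewing common student questions.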
Advanced RAG with reranking
Adds a reranker after initial retrieval to improve the order and relevance of context passed to the model.
[Diagram: hybrid or vector retrieval, evaluation]
Recommended tools
- Cohere Rerank
- Jina AI
- LangChain
- LlamaIndex
- Haystack
Advantages
- Often improves relevance
- Reduces irrelevant context
- Useful for production search quality
Limitations
- Adds latency and cost
- Reranker behavior must be evaluated
Concrete example
A support assistant that retrieves many candidate passages, then reranks them so the most specific troubleshooting step appears first.
Step-by-step build path
- Collect failed retrieval examples
- Retrieve a larger candidate set
- Add a reranker
- Compare answer faithfulness
- Track latency impact
Use when initial retrieval returns many partially relevant chunks.
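A two-stage flow of this kind might look like the sketch below. Both scoring functions are assumptions made for illustration: `first_stage` counts shared terms as a cheap candidate retriever, and `toy_rerank_score` is a stand-in for a real reranker model that here simply weights identifiers containing digits more heavily. The documents are invented support snippets.

```python
import re
import time

def tokens(text):
    """Lowercased set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def first_stage(query, docs, n):
    """Cheap candidate retrieval: rank by number of shared terms."""
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:n]

def toy_rerank_score(query, doc):
    """Stand-in for a reranker model: shared identifiers with digits count triple."""
    shared = tokens(query) & tokens(doc)
    return sum(3 if any(ch.isdigit() for ch in t) else 1 for t in shared)

def rerank(query, candidates, k, score_fn=toy_rerank_score):
    """Reorder the candidate set with the more specific scorer and keep k."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)[:k]

docs = [
    "Printer error guide: if the printer shows an error after startup, restart the printer.",
    "E42 means the firmware update failed; reinstall the firmware from the support page.",
    "Firmware update schedule for office devices.",
    "How to order replacement toner.",
]
query = "printer shows error E42 after firmware update"

start = time.perf_counter()
candidates = first_stage(query, docs, n=3)   # larger candidate set
final = rerank(query, candidates, k=2)       # reranked context for the model
elapsed_ms = (time.perf_counter() - start) * 1000

print(final[0])
print(f"retrieval + rerank took {elapsed_ms:.2f} ms")
```

In this toy data the generic guide wins the first stage, and the reranker moves the E42-specific step to the top. Timing both stages together gives a baseline for tracking the latency the reranker adds.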
Hybrid search RAG
Combines keyword and vector retrieval to handle exact names, legal references, technical terms, and semantic queries.
Recommended tools
- Elasticsearch / OpenSearch
- Weaviate
- Qdrant
- Haystack
Advantages
- Balances exact and semantic matching
- Strong for enterprise search
- Handles acronyms and identifiers better
Limitations
- Tuning is more complex
- Requires careful weighting and evaluation
Concrete example
A legal monitoring system that must match exact regulation numbers and also understand semantic descriptions of obligations.
Step-by-step build path
- Index keywords and embeddings
- Tune score fusion
- Add jurisdiction metadata
- Rerank top results
- Evaluate exact-reference queries
Use for technical, legal, academic, or enterprise corpora with exact terminology.
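One common way to fuse keyword and vector results is reciprocal rank fusion (RRF), shown here with a per-retriever weight as the tuning knob. This is one option among several, not necessarily what any listed tool does by default. The two input rankings and the document ids are hypothetical outputs of a keyword index and a vector index, and `k=60` is a conventional smoothing constant rather than a tuned value.

```python
def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of doc ids: each list adds weight / (k + rank) per doc."""
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for a query that names an exact regulation number.
keyword_ranking = ["regulation-12b", "guidance-note", "faq"]
vector_ranking = ["guidance-note", "case-summary", "regulation-12b"]

balanced = weighted_rrf([keyword_ranking, vector_ranking], weights=[1.0, 1.0])
keyword_heavy = weighted_rrf([keyword_ranking, vector_ranking], weights=[2.0, 1.0])

print(balanced)
print(keyword_heavy)
```

With equal weights the semantically close guidance note ranks first; doubling the keyword weight puts the exact-reference match on top. Evaluating exact-reference queries separately shows which weighting the corpus needs.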
Metadata-aware RAG
Uses metadata such as date, source, role, department, jurisdiction, or document type to constrain retrieval.
Recommended tools
- Qdrant
- Weaviate
- Pinecone
- Elasticsearch / OpenSearch
Advantages
- Improves precision
- Supports governance rules
- Helps with freshness and access control
Limitations
- Metadata quality becomes critical
- Requires ingestion discipline
Concrete example
A public-administration assistant that retrieves only current procedures for the user's department and language.
Step-by-step build path
- Define metadata schema
- Enforce metadata at ingestion
- Apply filters before retrieval
- Display source authority
- Audit access rules
Use when sources differ by authority, time, department, language, or access rights.
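The first three steps of the build path can be sketched as follows. The schema fields (`department`, `language`, `valid_from`, `valid_to`) and the sample documents are assumptions chosen to match the public-administration example. Similarity search is omitted, since the point is that it would run only on the documents that pass the filter.

```python
from datetime import date

# Assumed metadata schema for this sketch.
REQUIRED_FIELDS = {"department", "language", "valid_from", "valid_to"}

def validate(doc):
    """Ingestion gate: reject documents that lack any required metadata field."""
    missing = REQUIRED_FIELDS.difference(doc["meta"])
    if missing:
        raise ValueError(f"missing metadata: {sorted(missing)}")
    return doc

def matches(doc, department, language, today):
    """True when the document belongs to the user's scope and is currently valid."""
    meta = doc["meta"]
    return (
        meta["department"] == department
        and meta["language"] == language
        and meta["valid_from"] <= today <= meta["valid_to"]
    )

def filtered_candidates(index, department, language, today):
    """Apply metadata filters first; retrieval would then search only these."""
    return [doc for doc in index if matches(doc, department, language, today)]

def make(doc_id, department, language, start, end):
    return {"id": doc_id, "meta": {"department": department, "language": language,
                                   "valid_from": start, "valid_to": end}}

index = [validate(d) for d in [
    make("permits-2024-en", "permits", "en", date(2024, 1, 1), date(2024, 12, 31)),
    make("permits-2022-en", "permits", "en", date(2022, 1, 1), date(2022, 12, 31)),
    make("permits-2024-fr", "permits", "fr", date(2024, 1, 1), date(2024, 12, 31)),
    make("taxes-2024-en", "taxes", "en", date(2024, 1, 1), date(2024, 12, 31)),
]]

current = filtered_candidates(index, "permits", "en", today=date(2024, 6, 1))
print([doc["id"] for doc in current])
```

Failing loudly at ingestion is a deliberate choice here: a document with missing metadata would otherwise be silently excluded or, worse, silently included.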
Agentic RAG
Allows an agent to plan, retrieve, inspect sources, call tools, and iterate before answering.
Recommended tools
- Dify
- LangGraph
- LangChain
- n8n
- Flowise
Advantages
- Handles multi-step tasks
- Can use tools and workflows
- Useful for research assistance
Limitations
- Harder to test
- Higher latency and greater risk of unpredictable behavior
Concrete example
A research assistant that searches papers, checks definitions, calls a calculator, and synthesizes a sourced answer.
Step-by-step build path
- Define allowed tools
- Add planning constraints
- Log every tool call
- Limit retries
- Evaluate multi-step failures
Use for tasks that require multiple searches, tool calls, or procedural reasoning.
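The guardrails in the build path (an allow-list of tools, a log entry for every call, and a hard limit on iterations) can be sketched without any agent framework. Everything here is hypothetical: the planner is replaced by a fixed list of planned calls, `search_papers` returns a canned string, and the limit is a simple step budget rather than per-tool retry logic.

```python
def search_papers(query):
    """Hypothetical retrieval tool; returns a canned passage for the demo."""
    return f"stub passage about {query}"

def calculator(expression):
    """Evaluate 'a op b' for + - * / without using eval."""
    left, op, right = expression.split()
    a, b = float(left), float(right)
    results = {"+": a + b, "-": a - b, "*": a * b, "/": a / b if b else float("nan")}
    return results[op]

# Only tools listed here may be called by the agent.
ALLOWED_TOOLS = {"search_papers": search_papers, "calculator": calculator}

def run_agent(planned_calls, max_steps=3):
    """Execute planned tool calls, logging each one and enforcing a step budget."""
    log = []
    for step, (name, arg) in enumerate(planned_calls):
        if step >= max_steps:
            log.append({"tool": name, "arg": arg, "status": "skipped", "result": None})
            break
        if name not in ALLOWED_TOOLS:
            log.append({"tool": name, "arg": arg, "status": "rejected", "result": None})
            continue
        result = ALLOWED_TOOLS[name](arg)
        log.append({"tool": name, "arg": arg, "status": "ok", "result": result})
    return log

# Stand-in for what an LLM planner might propose.
plan = [
    ("search_papers", "retrieval evaluation"),
    ("send_email", "results to everyone"),   # not on the allow-list
    ("calculator", "12 * 4"),
    ("search_papers", "one more query"),     # exceeds the step budget
]

log = run_agent(plan)
for entry in log:
    print(entry)
```

Because rejected and skipped calls are logged too, the trace shows what the agent attempted as well as what it ran, which helps when evaluating multi-step failures.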
Graph RAG
Uses entities and relationships to retrieve connected evidence and support synthesis across a knowledge graph.
Recommended tools
- Graph databases
- LlamaIndex
- LangChain
- custom pipelines
Advantages
- Good for connected knowledge
- Supports relationship-aware retrieval
- Useful for exploratory analysis
Limitations
- Graph construction is demanding
- Quality depends on extraction and curation
Concrete example
An institutional knowledge explorer that connects people, projects, policies, departments, and documents.
Step-by-step build path
- Extract entities
- Curate relationships
- Link graph nodes to passages
- Retrieve graph neighborhoods
- Validate relationship quality
Use for domains with important relationships such as research, policy, legal, or organizational knowledge.
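Neighborhood retrieval over a small graph can be illustrated with a breadth-first search. The triples and passages below are invented, entity extraction is assumed to have already happened, and the graph is treated as undirected for simplicity. A real system would typically use a graph database and curated relationship types.

```python
from collections import deque

# Hypothetical curated (subject, relation, object) triples.
TRIPLES = [
    ("Alice", "leads", "Project Atlas"),
    ("Project Atlas", "governed_by", "Data Policy 7"),
    ("Data Policy 7", "owned_by", "Legal Department"),
    ("Bob", "member_of", "Finance Department"),
]

# Graph nodes linked to the source passages that mention them.
PASSAGES = {
    "Project Atlas": ["Project Atlas migrates archives to the new records system."],
    "Data Policy 7": ["Data Policy 7 sets retention periods for archived records."],
    "Legal Department": ["The Legal Department reviews policy changes quarterly."],
}

def build_adjacency(triples):
    """Undirected adjacency lists; relation labels are ignored in this sketch."""
    adjacency = {}
    for subject, _relation, obj in triples:
        adjacency.setdefault(subject, []).append(obj)
        adjacency.setdefault(obj, []).append(subject)
    return adjacency

def neighborhood(adjacency, start, max_hops):
    """Breadth-first search: nodes within max_hops of start, in visit order."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return list(depth)

def retrieve(adjacency, start, max_hops):
    """Collect passages attached to every node in the neighborhood."""
    nodes = neighborhood(adjacency, start, max_hops)
    return [text for node in nodes for text in PASSAGES.get(node, [])]

adjacency = build_adjacency(TRIPLES)
nodes = neighborhood(adjacency, "Alice", max_hops=2)
evidence = retrieve(adjacency, "Alice", max_hops=2)
print(nodes)
print(evidence)
```

The hop limit controls how much connected evidence reaches the model; widening it pulls in more context at the cost of more loosely related passages.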
Multimodal RAG
Retrieves from text, images, tables, scans, diagrams, or media and assembles context for a multimodal or text model.
[Diagram: image or table extraction, generation]
Recommended tools
- Unstructured
- Jina AI
- LlamaIndex
- document AI services
Advantages
- Works with real-world document formats
- Supports scans and rich media
- Useful for archives
Limitations
- Parsing quality varies
- Evaluation is more complex
Concrete example
An archive assistant that searches scanned PDFs, tables, handwritten forms, and image captions.
Step-by-step build path
- Run OCR and layout parsing
- Extract tables and figures
- Preserve page images
- Create text and visual indexes
- Evaluate provenance
Use for scanned archives, slide decks, forms, diagrams, and mixed media collections.
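The provenance side of this pattern can be sketched with plain data structures. The `ELEMENTS` list is a hypothetical output of OCR and layout parsing, with invented file names and contents. As a simplification, each modality is indexed by its extracted text; a real visual index would usually store image embeddings alongside the preserved page images.

```python
import re

# Hypothetical parser output: one record per extracted element, with provenance.
ELEMENTS = [
    {"kind": "text", "file": "ledger-1952.pdf", "page": 3,
     "content": "Harbor dues collected in March"},
    {"kind": "table", "file": "ledger-1952.pdf", "page": 4,
     "content": "Month | Dues\nMarch | 120\nApril | 95"},
    {"kind": "image_caption", "file": "album-07.pdf", "page": 1,
     "content": "Photograph of the harbor office, 1952"},
]

def tokens(text):
    """Lowercased set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_indexes(elements):
    """One inverted index per modality: indexes[kind][token] -> element positions."""
    indexes = {}
    for pos, element in enumerate(elements):
        for tok in tokens(element["content"]):
            indexes.setdefault(element["kind"], {}).setdefault(tok, []).append(pos)
    return indexes

def search(query, elements, indexes):
    """Look up the query in every modality index and return hits with citations."""
    hits = set()
    for index in indexes.values():
        for tok in tokens(query):
            hits.update(index.get(tok, []))
    return [
        {
            "kind": elements[pos]["kind"],
            "citation": f"{elements[pos]['file']} p.{elements[pos]['page']}",
            "content": elements[pos]["content"],
        }
        for pos in sorted(hits)
    ]

indexes = build_indexes(ELEMENTS)
results = search("march dues", ELEMENTS, indexes)
for hit in results:
    print(hit["kind"], "->", hit["citation"])
```

Carrying file and page through to every hit is what lets a reviewer open the original scan and check that the extracted text matches it.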
Local/private RAG
Runs models, embeddings, and vector storage locally or in a controlled environment for privacy-sensitive workloads.
Recommended tools
- Ollama
- Open WebUI
- AnythingLLM
- Chroma
- Qdrant
Advantages
- Improves data control
- Useful for education and sensitive prototypes
- Can work without external model APIs
Limitations
- Hardware constraints
- Model quality varies
- Security still needs architecture review
Concrete example
A classroom or lab prototype running local documents, local embeddings, and a local model on controlled hardware.
Step-by-step build path
- Install local model runtime
- Choose local vector store
- Index non-sensitive documents
- Measure hardware limits
- Review privacy assumptions
Use for privacy-sensitive experiments, classrooms, and offline prototypes.
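One small check that supports the "review privacy assumptions" step is to verify that every configured endpoint actually points at the local machine or a private network. The service names and URLs below are assumptions for illustration. Treating private-range addresses as "controlled" is itself an assumption that an architecture review should confirm, and hostnames other than `localhost` are flagged without a DNS lookup.

```python
import ipaddress
from urllib.parse import urlparse

def is_local(url):
    """True for localhost, loopback, or private-range IP literals."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A named host that is not an IP literal: flag it rather than resolve it.
        return False
    return ip.is_loopback or ip.is_private

def non_local_services(services):
    """Names of configured services whose URL is not local or private."""
    return sorted(name for name, url in services.items() if not is_local(url))

# Hypothetical configuration for a classroom prototype.
SERVICES = {
    "model_runtime": "http://localhost:11434",
    "vector_store": "http://127.0.0.1:6333",
    "embedding_api": "https://api.example.com/v1/embeddings",
}

leaks = non_local_services(SERVICES)
print("non-local services:", leaks)
```

A check like this catches the common case where a prototype is described as local but still sends text to a hosted embedding or model API.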
Enterprise RAG with observability and evaluation
Adds governance, access control, monitoring, evaluation, and feedback loops around the RAG pipeline.
Recommended tools
- Langfuse
- Phoenix / Arize
- Ragas
- TruLens
- Elasticsearch / OpenSearch
Advantages
- Production visibility
- Supports quality management
- Helps teams improve over time
Limitations
- More moving parts
- Requires ownership and review processes
Concrete example
A company-wide assistant with access controls, trace logs, evaluation dashboards, source owners, and release gates.
Step-by-step build path
- Define governance owners
- Connect identity and permissions
- Add tracing
- Build test sets
- Monitor feedback and regressions
Use when RAG answers affect operations, customers, policy, or regulated decisions.
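Tracing, a test set, and a release gate can be combined in a small harness like the one below. It is a sketch under stated assumptions: `stub_pipeline` stands in for the real RAG pipeline, the test cases and source paths are invented, the pass criterion (expected source appears among the retrieved sources) is one simple choice, and the 0.9 threshold is arbitrary. Tools such as those listed above provide richer versions of each piece.

```python
import time

def stub_pipeline(question, user):
    """Hypothetical RAG pipeline; a real one would check permissions, retrieve, generate."""
    if "vacation" in question:
        return {"answer": "See the leave policy.", "sources": ["hr/leave-policy.md"]}
    return {"answer": "No source found.", "sources": []}

def traced(pipeline, question, user):
    """Run the pipeline once and record a trace with sources and latency."""
    start = time.perf_counter()
    result = pipeline(question, user)
    return {
        "question": question,
        "user": user,
        "answer": result["answer"],
        "sources": result["sources"],
        "latency_s": time.perf_counter() - start,
    }

def evaluate(pipeline, test_set):
    """Return the pass rate and the traces for every test case."""
    traces = []
    for case in test_set:
        trace = traced(pipeline, case["question"], case["user"])
        trace["passed"] = case["expected_source"] in trace["sources"]
        traces.append(trace)
    pass_rate = sum(t["passed"] for t in traces) / len(traces)
    return pass_rate, traces

def release_gate(pass_rate, threshold=0.9):
    """Block a release when the pass rate falls below the threshold."""
    return pass_rate >= threshold

TEST_SET = [
    {"question": "How many vacation days do I get?", "user": "u1",
     "expected_source": "hr/leave-policy.md"},
    {"question": "How do I file expenses?", "user": "u2",
     "expected_source": "finance/expenses.md"},
]

pass_rate, traces = evaluate(stub_pipeline, TEST_SET)
print(f"pass rate: {pass_rate:.2f}, release allowed: {release_gate(pass_rate)}")
```

Running the same test set before every release turns regressions into a failed gate rather than a user complaint, and the stored traces show which question failed and which sources were retrieved.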