Intermediate | 45-90 min | Ragas | TruLens | Phoenix / Arize | Langfuse
Evaluate a RAG system
Build a practical evaluation loop for retrieval quality, answer faithfulness, and citations.
Prerequisites
- At least 20 representative questions
- Known authoritative sources
- Access to logs or traces
Step-by-step tutorial
Step 1
Create a question set
Include easy, hard, ambiguous, out-of-scope, and adversarial questions.
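- Label the expected sources for each question
- Include no-answer cases, where the correct behavior is to decline
- Cover the personas that matter most
- Version the set so results stay comparable across runs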
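One way to keep the question set honest is to store it as structured records and check coverage automatically. The sketch below is a minimal illustration, not a prescribed schema: the `EvalQuestion` fields, the category names, the `validate` helper, and the version label are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical version label; bump it whenever questions or labels change.
QUESTION_SET_VERSION = "v1"

# Category names mirror the question types listed above (naming is an assumption).
CATEGORIES = {"easy", "hard", "ambiguous", "out_of_scope", "adversarial"}


@dataclass(frozen=True)
class EvalQuestion:
    qid: str
    text: str
    category: str
    persona: str
    expected_sources: tuple = ()  # left empty for no-answer cases

    @property
    def is_no_answer(self) -> bool:
        return len(self.expected_sources) == 0


def validate(questions):
    """Return a list of coverage problems; an empty list means the set passes."""
    problems = []
    present = {q.category for q in questions}
    for missing in sorted(CATEGORIES - present):
        problems.append(f"no questions in category: {missing}")
    for q in questions:
        if q.category not in CATEGORIES:
            problems.append(f"{q.qid}: unknown category {q.category}")
    if not any(q.is_no_answer for q in questions):
        problems.append("no no-answer cases")
    return problems


# Illustrative records only; the IDs and texts are made up.
draft_set = [
    EvalQuestion("q1", "How do I reset my password?", "easy", "new_user", ("kb-12",)),
    EvalQuestion("q2", "What is the weather tomorrow?", "out_of_scope", "new_user"),
]
problems = validate(draft_set)
```

Running `validate` on a draft set like this one reports the categories that still need questions, which makes gaps visible before any scoring starts.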
Step 2
Test retrieval first
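Score whether the right evidence appears in the retrieved results before you evaluate answer style.
- Measure recall against the labeled expected sources
- Inspect the top-k results for irrelevant passages (noise)
- Check that filters do not exclude relevant documents
- Compare results with and without reranking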
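The retrieval checks above can be sketched with two small metrics. This is a minimal example that assumes each retrieval run returns a ranked list of document IDs and that each question carries a set of expected source IDs; the function names and sample IDs are hypothetical.

```python
def recall_at_k(retrieved, expected, k):
    """Fraction of expected sources found in the top k results.

    Returns None for no-answer questions, where recall is undefined.
    """
    expected = set(expected)
    if not expected:
        return None
    top = set(retrieved[:k])
    return len(top & expected) / len(expected)


def noise_at_k(retrieved, expected, k):
    """Fraction of the top k results that are not expected sources."""
    expected = set(expected)
    top = retrieved[:k]
    if not top:
        return 0.0
    irrelevant = [doc_id for doc_id in top if doc_id not in expected]
    return len(irrelevant) / len(top)


# Illustrative comparison of one question before and after reranking.
expected = {"d1", "d2"}
baseline = ["d7", "d2", "d9", "d1", "d4"]
reranked = ["d1", "d2", "d7", "d9", "d4"]

baseline_recall = recall_at_k(baseline, expected, k=3)
reranked_recall = recall_at_k(reranked, expected, k=3)
baseline_noise = noise_at_k(baseline, expected, k=3)
reranked_noise = noise_at_k(reranked, expected, k=3)
```

Averaging these values over the whole question set, once per retrieval configuration, gives a like-for-like comparison of filters and rerankers.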
Step 3
Test generation
Check whether answers are faithful, complete, well cited, and appropriately uncertain.
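- Review factual claims against the retrieved evidence
- Check that each citation points to a source that supports the claim
- Flag synthesis that goes beyond what the sources support
- Track refusal quality on no-answer and out-of-scope questions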
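Part of the citation check is mechanical and can be scripted. The sketch below assumes answers cite sources inline as bracketed IDs such as `[kb-12]`; that format, the regular expressions, and the `check_citations` helper are assumptions for illustration. It only finds sentences with no citation and citations that point outside the retrieved set. Whether a cited source actually supports the claim still needs a human reviewer or a separate judge.

```python
import re

# Assumed citation format: a bracketed source ID such as [kb-12].
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")
# Naive sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def check_citations(answer, retrieved_ids):
    """Report uncited sentences and citations to sources that were not retrieved."""
    retrieved = set(retrieved_ids)
    sentences = [s.strip() for s in SENTENCE_BREAK.split(answer) if s.strip()]
    uncited = []
    unknown = set()
    for sentence in sentences:
        cited = CITATION.findall(sentence)
        if not cited:
            uncited.append(sentence)
        unknown.update(c for c in cited if c not in retrieved)
    return {
        "uncited_sentences": uncited,
        "unknown_citations": sorted(unknown),
    }


# Illustrative answer; the text and IDs are made up.
report = check_citations(
    "Refunds take 5 days [kb-12]. Shipping is free [kb-99]. Contact support for details.",
    retrieved_ids=["kb-12", "kb-3"],
)
```

Here `report` flags the last sentence as uncited and `kb-99` as a citation to a source that was never retrieved, both of which are worth a manual look.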
Next steps
- Automate regression checks
- Add human review
- Track retrieval failures over time
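An automated regression check can be as simple as comparing the latest metrics with a stored baseline. The following is a minimal sketch; the metric names, the tolerance value, and the `find_regressions` helper are hypothetical and assume that higher scores are better.

```python
def find_regressions(baseline, current, tolerance=0.02):
    """Return metrics that dropped by more than `tolerance` or went missing.

    Maps each regressed metric name to a (baseline_value, current_value) pair.
    """
    regressions = {}
    for name, base_value in baseline.items():
        new_value = current.get(name)
        if new_value is None or new_value < base_value - tolerance:
            regressions[name] = (base_value, new_value)
    return regressions


# Illustrative numbers only, not measured results.
baseline_metrics = {"recall_at_5": 0.80, "citation_validity": 0.90}
current_metrics = {"recall_at_5": 0.70, "citation_validity": 0.895}

regressions = find_regressions(baseline_metrics, current_metrics)
```

In a CI job, a non-empty result would fail the build and hand the affected questions to human review.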