02 RAG System Design
Sam
RAG system analogy
- Pipelines (assembly line)
- 2. System-level components (quality inspector)
- 3. RAGOps / Layers (electricity in factory)
2. System-level components
- Caching: Query asked ⟶ LLM responds ⟶ response is stored in a semantic cache. Later, a similar query ⟶ cached response is returned without calling the LLM.
- Guardrails: Enforce compliance with policies and regulations
- Security: Protect against prompt injection, data poisoning, etc.
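The caching flow above can be sketched in a few lines. This is a minimal illustration, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, and `SemanticCache`, `answer`, the `llm` callable, and the 0.8 similarity threshold are all hypothetical names and values chosen for the example.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def embed(text: str) -> Counter:
    """Toy embedding (assumption): bag-of-words counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Stores (query embedding, response) pairs and matches by similarity."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # hypothetical cutoff; tune per application
        self.entries = []           # list of (embedding, response) pairs

    def get(self, query: str) -> Optional[str]:
        """Return the response of the most similar stored query, if close enough."""
        q = embed(query)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))


def answer(query: str, cache: SemanticCache, llm: Callable[[str], str]) -> str:
    """Serve from the cache when a similar query was seen; otherwise call the LLM."""
    cached = cache.get(query)
    if cached is not None:
        return cached
    response = llm(query)
    cache.put(query, response)
    return response
```

Note that the application layer, not the LLM itself, writes to the cache. A linear scan over entries is fine for a sketch; a real system would more likely use a vector index and an expiry policy.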
3. RAGOps / Layers
See 07-RAG-Ops (LLM-RAG/07-RAG-Ops)
Foundation layers
- Data ⟶ Process & store data as embeddings
- Model ⟶ Provider LLMs (base, hosted, fine-tuned)
Intelligence layers
- Prompt ⟶ Improve prompts (templates, context injection)
- App orchestration ⟶ Connect components together (retrievers, agents, routing)
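As an illustration of the Prompt layer's "templates, context injection", here is a minimal sketch using Python's standard `string.Template`. The template wording, the `build_prompt` helper, and the `max_chunks` parameter are assumptions made for this example, not a prescribed format.

```python
from string import Template

# Hypothetical prompt template; the wording is illustrative only.
PROMPT = Template(
    "Answer the question using only the context below.\n"
    "If the context is insufficient, say you do not know.\n\n"
    "Context:\n$context\n\n"
    "Question: $question\n"
    "Answer:"
)


def build_prompt(question, chunks, max_chunks=3):
    """Inject the first `max_chunks` retrieved chunks as numbered context."""
    selected = chunks[:max_chunks]
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(selected, start=1)
    )
    return PROMPT.substitute(context=context, question=question)
```

For example, `build_prompt("What is RAG?", retrieved_chunks)` returns a single string that an orchestration layer could pass to whichever model the Model layer provides. Numbering the chunks makes it easier to ask the model to cite its sources.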
Runtime layers
- Deployment ⟶ Cloud providers for deploying apps (CI/CD, infra-as-code)
- Application ⟶ Hosting services for apps (APIs, edge/CDN, web hosting)
- Evaluation ⟶ Provide evaluation metrics (offline tests, retrieval quality)
- Monitoring ⟶ Monitor RAG apps (latency, drift, costs, alerts)
Other layers: logging and tracing, model versioning, feedback
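To make the Evaluation layer's "retrieval quality" concrete, the sketch below computes two commonly used offline metrics, recall@k and mean reciprocal rank (MRR). The function names and the `(retrieved_ids, relevant_ids)` input shape are assumptions for illustration; the notes do not specify which metrics to use.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant document IDs found in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k=3):
    """Average both metrics over a list of (retrieved_ids, relevant_ids) pairs."""
    if not runs:
        raise ValueError("runs must contain at least one query")
    n = len(runs)
    return {
        "recall@k": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }
```

Running `evaluate` over a fixed set of labeled queries after each change to the retriever gives a simple offline regression test; the same numbers could also be fed to the Monitoring layer to watch for drift.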