02 RAG System Design

Sam

RAG system analogy

2. System-level components


  • Caching: Query asked ⟶ LLM responds ⟶ response stored in a semantic cache. In the future, a similar query ⟶ served from the cached response.

  • Guardrails: Compliance with policies and regulations

  • Security: Protect against prompt injection, data poisoning, etc.
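The caching flow above can be sketched as a small semantic cache: look up incoming queries by embedding similarity, return the stored response on a hit, and fall through to the LLM on a miss. This is a minimal sketch with a toy bag-of-words "embedding" and a hypothetical similarity threshold; a real system would use a sentence-embedding model and a vector index.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" for illustration only; a real
    # semantic cache would use a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # similarity cutoff (tunable assumption)
        self.entries = []           # list of (embedding, response) pairs

    def get(self, query: str):
        q = embed(query)
        for emb, response in self.entries:
            if cosine(q, emb) >= self.threshold:
                return response     # cache hit on a similar past query
        return None                 # cache miss -> caller invokes the LLM

    def put(self, query: str, response: str):
        self.entries.append((embed(query), response))
```

Usage: on a miss, the app calls the LLM, then `put()`s the query/response pair so near-duplicate queries are answered from the cache.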

3. RAGOps / Layers

See [[LLM-RAG/07-RAG-Ops|07-RAG-Ops]]


Foundation layers

  • Data ⟶ Process & store data as embeddings

  • Model ⟶ Provide LLMs (base, hosted, fine-tuned)

Intelligence layers

  • Prompt ⟶ Improve prompts (templates, context injection)

  • App orchestration ⟶ Connect components together (retrievers, agents, routing)

Runtime layers

  • Deployment ⟶ Cloud providers for deploying apps (CI/CD, infra-as-code)

  • Application ⟶ Hosting services for apps (APIs, edge/CDN, web hosting)

  • Evaluation ⟶ Provide evaluation metrics (offline tests, retrieval quality)

  • Monitoring ⟶ Monitor RAG apps (latency, drift, costs, alerts)

Other layers: logging and tracing, model versioning, feedback
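How the data, model, prompt, and orchestration layers compose at request time can be sketched as a tiny pipeline. Every function here is a hypothetical stand-in (naive keyword retrieval, a placeholder LLM call), not a real provider API; the point is which layer owns which step.

```python
def retrieve(query: str, store: list[str]) -> list[str]:
    # Data layer: stand-in retriever using naive keyword overlap;
    # a real system would query a vector store of embeddings.
    words = query.lower().split()
    return [doc for doc in store if any(w in doc.lower() for w in words)]

def build_prompt(query: str, docs: list[str]) -> str:
    # Prompt layer: template with retrieved context injected.
    context = "\n".join(docs)
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

def call_llm(prompt: str) -> str:
    # Model layer: placeholder for a call to a provider LLM.
    return f"[LLM answer grounded in provided context]"

def rag_pipeline(query: str, store: list[str]) -> str:
    # App-orchestration layer: wire retriever -> prompt -> model.
    docs = retrieve(query, store)
    prompt = build_prompt(query, docs)
    answer = call_llm(prompt)
    # Monitoring/logging layers would record latency, cost, and the
    # trace of this request here (omitted in this sketch).
    return answer
```

The runtime layers (deployment, application, evaluation, monitoring) wrap around this pipeline rather than appearing inside it.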