02 RAG System Design

Sam

RAG system analogy

2. System-level components


  • Caching: Query asked ⟶ LLM responds ⟶ response stored in a semantic cache. In the future, a similar query ⟶ served from the cached response.

  • Guardrails: Compliance with policies and regulations

  • Security: Protect against prompt injection, data poisoning, etc.
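The caching flow above can be sketched as a small semantic cache: look up incoming queries by embedding similarity, return the stored response on a hit, and fall through to the LLM on a miss. This is a minimal sketch with a toy bag-of-words "embedding" and a hypothetical similarity threshold; a real system would use a sentence-embedding model and a vector index.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" for illustration only; a real
    # semantic cache would use a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # similarity cutoff (tunable assumption)
        self.entries = []           # list of (embedding, response) pairs

    def get(self, query: str):
        q = embed(query)
        for emb, response in self.entries:
            if cosine(q, emb) >= self.threshold:
                return response     # cache hit on a similar past query
        return None                 # cache miss -> caller invokes the LLM

    def put(self, query: str, response: str):
        self.entries.append((embed(query), response))
```

Usage: on a miss, the app calls the LLM, then `put()`s the query/response pair so near-duplicate queries are answered from the cache.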

3. RAGOps / Layers

See [[LLM-RAG/07-RAG-Ops|07-RAG-Ops]]


Foundation layers

  • Data ⟶ Process & store data as embeddings

  • Model ⟶ Provide LLMs (base, hosted, fine-tuned)

Intelligence layers

  • Prompt ⟶ Improve prompts (templates, context injection)

  • App orchestration ⟶ Connect components together (retrievers, agents, routing)

Runtime layers

  • Deployment ⟶ Cloud providers for deploying apps (CI/CD, infra-as-code)

  • Application ⟶ Hosting services for apps (APIs, edge/CDN, web hosting)

  • Evaluation ⟶ Provide evaluation metrics (offline tests, retrieval quality)

  • Monitoring ⟶ Monitor RAG apps (latency, drift, costs, alerts)

Other layers: logging and tracing, model versioning, feedback
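How the data, model, prompt, and orchestration layers compose at request time can be sketched as a tiny pipeline. Every function here is a hypothetical stand-in (naive keyword retrieval, a placeholder LLM call), not a real provider API; the point is which layer owns which step.

```python
def retrieve(query: str, store: list[str]) -> list[str]:
    # Data layer: stand-in retriever using naive keyword overlap;
    # a real system would query a vector store of embeddings.
    words = query.lower().split()
    return [doc for doc in store if any(w in doc.lower() for w in words)]

def build_prompt(query: str, docs: list[str]) -> str:
    # Prompt layer: template with retrieved context injected.
    context = "\n".join(docs)
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

def call_llm(prompt: str) -> str:
    # Model layer: placeholder for a call to a provider LLM.
    return f"[LLM answer grounded in provided context]"

def rag_pipeline(query: str, store: list[str]) -> str:
    # App-orchestration layer: wire retriever -> prompt -> model.
    docs = retrieve(query, store)
    prompt = build_prompt(query, docs)
    answer = call_llm(prompt)
    # Monitoring/logging layers would record latency, cost, and the
    # trace of this request here (omitted in this sketch).
    return answer
```

The runtime layers (deployment, application, evaluation, monitoring) wrap around this pipeline rather than appearing inside it.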