08 RAG Variants
Overview
RAG variants:
- are specializations of basic RAG that solve limits of naïve, text-only retrieval.
- purpose: make the system both flexible and domain-aware.
- Handle non-text data (images, audio, video, tables).
- Preserve relationships across documents (multi-hop reasoning).
- Add autonomy through LLM agents (routing, tool use, planning).
- Improve accuracy, speed, or reasoning in specific contexts.
3 most popular RAG variants:
- Multimodal RAG ⟶ performs retrieval + generation across multiple data types.
- Knowledge-Graph RAG ⟶ adds relational reasoning & multi-hop context.
- Agentic RAG ⟶ adds LLM agents for routing, tool use, and adaptive retrieval.
1. Multimodal RAG
does: enables retrieval + generation from multiple data types.
downsides/challenges:
- Slower (latency)
- More expensive (embedding models + multimodal LLMs)
- Possible information loss if you convert images ⟶ text
- Alignment errors between embedding spaces
Mechanics: Pipeline Changes
Indexing Pipeline
- Loading: use modality-specific loaders (PIL for images, Unstructured, Whisper for audio, CSVLoader).
- Chunking:
  - Text ⟶ normal chunking
  - Audio ⟶ VAD (split on silence)
  - Video ⟶ scene detection
  - Tables ⟶ row/column chunks
- Embeddings (3 strategies):
  - Shared multimodal embeddings ⟶ one vector space for all modalities.
  - Modality-specific embeddings ⟶ CLIP (image-text), CLAP (audio-text).
  - Convert everything to text ⟶ a multimodal LLM describes files ⟶ embed as text.
- Storage:
  - Embeddings: no change
  - Raw files: need a document store (Redis, etc.)
Generation Pipeline
- Retrieval depends on embedding strategy:
  - Shared ⟶ one similarity search.
  - Modality-specific ⟶ multi-vector retrieval across spaces.
  - Converted-to-text ⟶ normal text retrieval + raw files for generation.
- Augmentation includes raw images/audio/video as LLM inputs.
- Generation uses a multimodal LLM.
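In the shared-embedding case, retrieval really does collapse to one similarity search over every modality. A minimal sketch, assuming a deterministic stub encoder in place of a real shared-space model like CLIP (the class, helper, and file paths are all hypothetical):

```python
import hashlib

import numpy as np


def fake_embed(key: str, dim: int = 16) -> np.ndarray:
    """Deterministic stand-in for a shared multimodal encoder (e.g. CLIP).

    For images/audio, `key` would be the raw media; here it is just a string.
    """
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")
    vec = np.random.default_rng(seed).normal(size=dim)
    return vec / np.linalg.norm(vec)  # unit vectors -> dot product == cosine


class SharedSpaceIndex:
    """One vector space for all modalities; raw files live in a doc store."""

    def __init__(self) -> None:
        self.vectors: list[np.ndarray] = []
        self.items: list[dict] = []

    def add(self, item_id: str, modality: str, embed_key: str, raw_ref: str) -> None:
        # Embedding goes to the vector store; raw_ref points into the doc store.
        self.vectors.append(fake_embed(embed_key))
        self.items.append({"id": item_id, "modality": modality, "raw": raw_ref})

    def search(self, query: str, k: int = 2) -> list[dict]:
        q = fake_embed(query)
        scores = np.stack(self.vectors) @ q   # one similarity search, all modalities
        top = np.argsort(scores)[::-1][:k]    # highest cosine first
        return [self.items[i] for i in top]


idx = SharedSpaceIndex()
idx.add("t1", "text", "cats sleep sixteen hours a day", "chunks/t1.txt")
idx.add("i1", "image", "photo of a sleeping cat", "files/cat.jpg")
idx.add("a1", "audio", "podcast about dog training", "files/dogs.mp3")

# The query matches t1's key exactly, so t1 comes back as the top hit,
# along with its doc-store reference for the generation step.
hits = idx.search("cats sleep sixteen hours a day", k=2)
```

The returned `raw` references are what get passed to the multimodal LLM at generation time, since the vectors themselves carry no renderable content.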
2. Knowledge-Graph RAG
purpose: vector search alone can't answer questions requiring relationships across chunks.
does: adds relational reasoning and context stitching that vectors alone can't provide.
solves:
- Multi-hop questions: "Which products are endorsed by the same celebrity?"
- Theme synthesis: "What are the main themes across all these reports?"
- Entity-level reasoning: "Link symptoms ⟶ drugs ⟶ dosage interactions."
challenges:
- Hard to build/maintain a clean ontology
- Expensive to generate/maintain (repeated LLM passes)
- Requires domain constraints (entity types)
Mechanics: 3 Approaches
Structure-Aware Retrieval (Simple)
- Build a hierarchy (parent summaries ⟶ child chunks).
- Retrieve at the leaf level ⟶ add parents automatically.
- Implementable with or without a graph DB.
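A tiny sketch of the parent/child idea, with the hierarchy held in plain dicts (all names and data are hypothetical; a real system would find the leaf hits via vector search):

```python
# Parent summaries -> child chunks, as a two-level hierarchy in plain dicts.
parents = {"p1": "Summary of chapter 1", "p2": "Summary of chapter 2"}
leaves = {
    "l1": {"text": "Detail A", "parent": "p1"},
    "l2": {"text": "Detail B", "parent": "p1"},
    "l3": {"text": "Detail C", "parent": "p2"},
}


def retrieve_with_parents(leaf_hits: list[str]) -> list[str]:
    """Given leaf-level hits (e.g. from vector search), add parent summaries."""
    context = [leaves[l]["text"] for l in leaf_hits]
    # Deduplicate parents (siblings share one), keep a stable order.
    context += sorted({parents[leaves[l]["parent"]] for l in leaf_hits})
    return context


# Leaf retrieval found l1 and l3; both parent summaries come along for free.
context = retrieve_with_parents(["l1", "l3"])
```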
Graph-Enhanced Vector Search (Hybrid)
- Do a normal vector search.
- Look at the entities inside retrieved chunks.
- Traverse the graph to get related entities/chunks.
- Add those to the retrieved set.

This adds multi-hop relevance without replacing vectors.
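The four steps above can be sketched with an entity index built at indexing time. Chunk contents, entity names, and the one-hit "vector search" result are all illustrative:

```python
# Chunks with the entities an LLM extracted from them at indexing time.
chunks = {
    "c1": {"text": "Celebrity X endorses SodaCo.", "entities": {"X", "SodaCo"}},
    "c2": {"text": "Celebrity X endorses ShoeCorp.", "entities": {"X", "ShoeCorp"}},
    "c3": {"text": "ShoeCorp Q3 revenue grew 12%.", "entities": {"ShoeCorp"}},
}

# Inverted index: entity -> chunks mentioning it (the "graph" edges).
entity_index: dict[str, set[str]] = {}
for cid, chunk in chunks.items():
    for entity in chunk["entities"]:
        entity_index.setdefault(entity, set()).add(cid)


def graph_expand(hit_ids: set[str], hops: int = 1) -> set[str]:
    """Expand vector-search hits by traversing shared-entity links."""
    seen = set(hit_ids)
    frontier = set(hit_ids)
    for _ in range(hops):
        neighbors: set[str] = set()
        for cid in frontier:
            for entity in chunks[cid]["entities"]:
                neighbors |= entity_index[entity]
        frontier = neighbors - seen   # only newly reached chunks
        seen |= frontier
    return seen


# Vector search (stubbed) returned only c1; one hop pulls in c2 via entity X,
# answering "which products does X endorse?" without c2 matching the query text.
expanded = graph_expand({"c1"})
```

Note the hop count is the control knob: one hop adds directly linked chunks, two hops would also reach `c3` through ShoeCorp.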
Community Summaries (Theme-Level Answers)
- Use graph algorithms (Louvain/Leiden) to find densely connected areas.
- LLM summarizes each community.
- Retrieval can fetch summaries directly.

This solves broad queries without enumerating chunks.
Mechanics: Pipeline Changes
Indexing
- Load + chunk (same as basic RAG).
- Extract entities + relationships using an LLM.
- Store in a graph DB (e.g. Neo4j).
- Detect communities, generate summaries.
- Optionally store summaries as vectors for hybrid retrieval.
Generation
- Convert the natural-language query ⟶ a graph query (e.g. Cypher).
- Traverse the graph to find relevant nodes/relationships.
- Augment the prompt with retrieved graph data.
- Generate normally.
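A sketch of that generation flow, with the Cypher shown as a string and the query results stubbed. In a real pipeline an LLM would write the Cypher and a graph-DB driver (e.g. the Neo4j Python driver) would execute it; here everything after the query text is a made-up stand-in:

```python
# Hypothetical Cypher an LLM might produce for "Which products does X endorse?"
cypher = (
    "MATCH (c:Celebrity)-[:ENDORSES]->(p:Product) "
    "WHERE c.name = 'X' RETURN p.name AS product"
)

# Stubbed traversal results; a driver would return rows shaped like this.
rows = [{"product": "SodaCo"}, {"product": "ShoeCorp"}]


def augment(question: str, rows: list[dict]) -> str:
    """Serialize graph rows into prompt context for normal generation."""
    facts = "\n".join(
        "- " + ", ".join(f"{k}: {v}" for k, v in row.items()) for row in rows
    )
    return f"Graph facts:\n{facts}\n\nQuestion: {question}"


prompt = augment("Which products does X endorse?", rows)
```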
3. Agentic RAG
purpose:
- makes sure each question searches the right database(s)
- enables multi-step reasoning
does: adds an LLM "brain" that makes decisions at every RAG stage.
adds:
- Query routing ⟶ "This is a code question; search the docs DB only."
- Tool use ⟶ search the web, call SQL, call APIs.
- Adaptive retrieval ⟶ the agent decides: retrieve more? refine the query? switch DB?
- Iterative generation (ReAct/IterRetGen) ⟶ revise ⟶ retrieve ⟶ revise again.
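Query routing in miniature: `classify` is a keyword stub standing in for an LLM routing prompt ("which category fits this question?"), and the route labels and backend names are invented for illustration:

```python
# Hypothetical route table: question category -> retrieval backend.
ROUTES = {"code": "docs_db", "product": "catalog_db", "general": "web_search"}


def classify(question: str) -> str:
    """Keyword stub; a real router would ask an LLM to pick the category."""
    q = question.lower()
    if "function" in q or "error" in q:
        return "code"
    if "product" in q or "price" in q:
        return "product"
    return "general"


def route(question: str) -> str:
    """Pick which knowledge base to search for this question."""
    return ROUTES[classify(question)]


# A code question gets routed to the docs DB only.
target = route("Why does this function raise a TypeError?")
```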
challenges:
- Hard to control (agents can misfire)
- Needs strict safety checks
- Multi-agent systems multiply errors
- More compute
- Workflow complexity increases
Mechanics: Pipeline Changes
Indexing | agents can:
- loading: improve parsing, generate metadata
- chunking: task-specific chunking via sentiment/entity grouping
- embedding: dynamically choose embedding models
- storage: route doc chunks ⟶ different collections
Generation | agents can:
- retrieval: determine which KB to use, whether tools are needed, whether the query needs refinement
- augmentation: dynamically build prompts
- generation: use iterative patterns: generate ⟶ critique ⟶ retrieve ⟶ regenerate
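That generate ⟶ critique ⟶ retrieve ⟶ regenerate loop can be sketched end to end. Every function here is a hypothetical stand-in: `llm` and `critique` would be model calls, `retriever` a vector-DB lookup; the stubs just make the control flow visible:

```python
def llm(prompt: str) -> str:
    """Stub LLM: 'answers' improve once retrieved context is in the prompt."""
    return "grounded answer" if "CONTEXT" in prompt else "uncertain answer"


def critique(answer: str) -> bool:
    """Stub self-check; a real agent would ask the LLM to verify its answer."""
    return "grounded" in answer


def retriever(query: str) -> str:
    """Stub retrieval step standing in for a vector-DB or tool call."""
    return "CONTEXT: retrieved facts about " + query


def agentic_answer(query: str, max_rounds: int = 3) -> str:
    """Generate, self-critique, and retrieve more context until satisfied."""
    prompt = query
    answer = ""
    for _ in range(max_rounds):
        answer = llm(prompt)
        if critique(answer):                       # good enough -> stop early
            return answer
        prompt = retriever(query) + "\n" + query   # augment and try again
    return answer                                  # best effort after the cap


# Round 1 fails the critique, retrieval kicks in, round 2 passes.
result = agentic_answer("What changed in v2?")
```

The `max_rounds` cap is the safety valve the challenges above call for: without it, a misfiring agent can loop on retrieval indefinitely.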