08 RAG Variants
Overview
RAG variants:
- are specializations of basic RAG that solve limits of naïve, text-only retrieval.
- purpose: make the system both flexible and domain-aware.
- Handle non-text data (images, audio, video, tables).
- Preserve relationships across documents (multi-hop reasoning).
- Add autonomy through LLM agents (routing, tool use, planning).
- Improve accuracy, speed, or reasoning in specific contexts.
3 most popular RAG variants:
- Multimodal RAG ⟶ performs retrieval + generation across multiple data types.
- Knowledge-Graph RAG ⟶ adds relational reasoning & multi-hop context.
- Agentic RAG ⟶ adds LLM agents for routing, tool use, and adaptive retrieval.
1. Multimodal RAG
does: enables retrieval + generation from multiple data types.
downsides/challenges:
- Slower (latency)
- More expensive (embedding models + multimodal LLMs)
- Possible information loss if you convert images ⟶ text
- Alignment errors between embedding spaces
Mechanics: Pipeline Changes
Indexing Pipeline
- Loading: use modality-specific loaders (PIL for images, Unstructured, Whisper for audio, CSVLoader).
- Chunking:
  - Text ⟶ normal chunking
  - Audio ⟶ VAD (split on silence)
  - Video ⟶ scene detection
  - Tables ⟶ row/column chunks
- Embeddings (3 strategies):
  - Shared multimodal embeddings ⟶ one vector space for all modalities.
  - Modality-specific embeddings ⟶ CLIP (image-text), CLAP (audio-text).
  - Convert everything to text ⟶ a multimodal LLM describes files ⟶ embed as text.
- Storage:
  - Embeddings: no change
  - Raw files: need a document store (Redis, etc.)
Generation Pipeline
- Retrieval depends on embedding strategy:
  - Shared ⟶ one similarity search.
  - Modality-specific ⟶ multi-vector retrieval across spaces.
  - Converted-to-text ⟶ normal text retrieval + raw files for generation.
- Augmentation includes raw images/audio/video as LLM inputs.
- Generation uses a multimodal LLM.
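In the shared-embedding case, retrieval really does collapse to one similarity search over every modality. A minimal sketch, assuming a deterministic stub encoder in place of a real shared-space model like CLIP (the class, helper, and file paths are all hypothetical):

```python
import hashlib

import numpy as np


def fake_embed(key: str, dim: int = 16) -> np.ndarray:
    """Deterministic stand-in for a shared multimodal encoder (e.g. CLIP).

    For images/audio, `key` would be the raw media; here it is just a string.
    """
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")
    vec = np.random.default_rng(seed).normal(size=dim)
    return vec / np.linalg.norm(vec)  # unit vectors -> dot product == cosine


class SharedSpaceIndex:
    """One vector space for all modalities; raw files live in a doc store."""

    def __init__(self) -> None:
        self.vectors: list[np.ndarray] = []
        self.items: list[dict] = []

    def add(self, item_id: str, modality: str, embed_key: str, raw_ref: str) -> None:
        # Embedding goes to the vector store; raw_ref points into the doc store.
        self.vectors.append(fake_embed(embed_key))
        self.items.append({"id": item_id, "modality": modality, "raw": raw_ref})

    def search(self, query: str, k: int = 2) -> list[dict]:
        q = fake_embed(query)
        scores = np.stack(self.vectors) @ q   # one similarity search, all modalities
        top = np.argsort(scores)[::-1][:k]    # highest cosine first
        return [self.items[i] for i in top]


idx = SharedSpaceIndex()
idx.add("t1", "text", "cats sleep sixteen hours a day", "chunks/t1.txt")
idx.add("i1", "image", "photo of a sleeping cat", "files/cat.jpg")
idx.add("a1", "audio", "podcast about dog training", "files/dogs.mp3")

# The query matches t1's key exactly, so t1 comes back as the top hit,
# along with its doc-store reference for the generation step.
hits = idx.search("cats sleep sixteen hours a day", k=2)
```

The returned `raw` references are what get passed to the multimodal LLM at generation time, since the vectors themselves carry no renderable content.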
2. Knowledge-Graph RAG
purpose: vector search alone can't answer questions requiring relationships across chunks.
does: adds relational reasoning and context stitching that vectors alone can't provide.
solves:
- Multi-hop questions: "Which products are endorsed by the same celebrity?"
- Theme synthesis: "What are the main themes across all these reports?"
- Entity-level reasoning: "Link symptoms ⟶ drugs ⟶ dosage interactions."
challenges:
- Hard to build/maintain a clean ontology
- Expensive to generate/maintain (repeated LLM passes)
- Requires domain constraints (entity types)
Mechanics: 3 Approaches
Structure-Aware Retrieval (Simple)
- Build a hierarchy (parent summaries ⟶ child chunks).
- Retrieve at the leaf level ⟶ add parents automatically.
- Implementable with or without a graph DB.
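A tiny sketch of the parent/child idea, with the hierarchy held in plain dicts (all names and data are hypothetical; a real system would find the leaf hits via vector search):

```python
# Parent summaries -> child chunks, as a two-level hierarchy in plain dicts.
parents = {"p1": "Summary of chapter 1", "p2": "Summary of chapter 2"}
leaves = {
    "l1": {"text": "Detail A", "parent": "p1"},
    "l2": {"text": "Detail B", "parent": "p1"},
    "l3": {"text": "Detail C", "parent": "p2"},
}


def retrieve_with_parents(leaf_hits: list[str]) -> list[str]:
    """Given leaf-level hits (e.g. from vector search), add parent summaries."""
    context = [leaves[l]["text"] for l in leaf_hits]
    # Deduplicate parents (siblings share one), keep a stable order.
    context += sorted({parents[leaves[l]["parent"]] for l in leaf_hits})
    return context


# Leaf retrieval found l1 and l3; both parent summaries come along for free.
context = retrieve_with_parents(["l1", "l3"])
```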
Graph-Enhanced Vector Search (Hybrid)
- Do a normal vector search.
- Look at the entities inside retrieved chunks.
- Traverse the graph to get related entities/chunks.
- Add those to the retrieved set.

This adds multi-hop relevance without replacing vectors.
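The four steps above can be sketched with an entity index built at indexing time. Chunk contents, entity names, and the one-hit "vector search" result are all illustrative:

```python
# Chunks with the entities an LLM extracted from them at indexing time.
chunks = {
    "c1": {"text": "Celebrity X endorses SodaCo.", "entities": {"X", "SodaCo"}},
    "c2": {"text": "Celebrity X endorses ShoeCorp.", "entities": {"X", "ShoeCorp"}},
    "c3": {"text": "ShoeCorp Q3 revenue grew 12%.", "entities": {"ShoeCorp"}},
}

# Inverted index: entity -> chunks mentioning it (the "graph" edges).
entity_index: dict[str, set[str]] = {}
for cid, chunk in chunks.items():
    for entity in chunk["entities"]:
        entity_index.setdefault(entity, set()).add(cid)


def graph_expand(hit_ids: set[str], hops: int = 1) -> set[str]:
    """Expand vector-search hits by traversing shared-entity links."""
    seen = set(hit_ids)
    frontier = set(hit_ids)
    for _ in range(hops):
        neighbors: set[str] = set()
        for cid in frontier:
            for entity in chunks[cid]["entities"]:
                neighbors |= entity_index[entity]
        frontier = neighbors - seen   # only newly reached chunks
        seen |= frontier
    return seen


# Vector search (stubbed) returned only c1; one hop pulls in c2 via entity X,
# answering "which products does X endorse?" without c2 matching the query text.
expanded = graph_expand({"c1"})
```

Note the hop count is the control knob: one hop adds directly linked chunks, two hops would also reach `c3` through ShoeCorp.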
Community Summaries (Theme-Level Answers)
- Use graph algorithms (Louvain/Leiden) to find densely connected areas.
- LLM summarizes each community.
- Retrieval can fetch summaries directly.

This solves broad queries without enumerating chunks.
Mechanics: Pipeline Changes
Indexing
- Load + chunk (same as basic RAG).
- Extract entities + relationships using an LLM.
- Store in a graph DB (e.g. Neo4j).
- Detect communities, generate summaries.
- Optionally store summaries as vectors for hybrid retrieval.
Generation
- Convert the natural-language query ⟶ a graph query (e.g. Cypher).
- Traverse the graph to find relevant nodes/relationships.
- Augment the prompt with retrieved graph data.
- Generate normally.
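A sketch of that generation flow, with the Cypher shown as a string and the query results stubbed. In a real pipeline an LLM would write the Cypher and a graph-DB driver (e.g. the Neo4j Python driver) would execute it; here everything after the query text is a made-up stand-in:

```python
# Hypothetical Cypher an LLM might produce for "Which products does X endorse?"
cypher = (
    "MATCH (c:Celebrity)-[:ENDORSES]->(p:Product) "
    "WHERE c.name = 'X' RETURN p.name AS product"
)

# Stubbed traversal results; a driver would return rows shaped like this.
rows = [{"product": "SodaCo"}, {"product": "ShoeCorp"}]


def augment(question: str, rows: list[dict]) -> str:
    """Serialize graph rows into prompt context for normal generation."""
    facts = "\n".join(
        "- " + ", ".join(f"{k}: {v}" for k, v in row.items()) for row in rows
    )
    return f"Graph facts:\n{facts}\n\nQuestion: {question}"


prompt = augment("Which products does X endorse?", rows)
```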
3. Agentic RAG
purpose:
- makes sure each question searches the right database(s)
- enables multi-step reasoning
does: adds an LLM "brain" that makes decisions at every RAG stage.
adds:
- Query routing ⟶ "This is a code question; search the docs DB only."
- Tool use ⟶ search the web, call SQL, call APIs.
- Adaptive retrieval ⟶ the agent decides: retrieve more? refine the query? switch DB?
- Iterative generation (ReAct/IterRetGen) ⟶ revise ⟶ retrieve ⟶ revise again.
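Query routing in miniature: `classify` is a keyword stub standing in for an LLM routing prompt ("which category fits this question?"), and the route labels and backend names are invented for illustration:

```python
# Hypothetical route table: question category -> retrieval backend.
ROUTES = {"code": "docs_db", "product": "catalog_db", "general": "web_search"}


def classify(question: str) -> str:
    """Keyword stub; a real router would ask an LLM to pick the category."""
    q = question.lower()
    if "function" in q or "error" in q:
        return "code"
    if "product" in q or "price" in q:
        return "product"
    return "general"


def route(question: str) -> str:
    """Pick which knowledge base to search for this question."""
    return ROUTES[classify(question)]


# A code question gets routed to the docs DB only.
target = route("Why does this function raise a TypeError?")
```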
challenges:
- Hard to control (agents can misfire)
- Needs strict safety checks
- Multi-agent systems multiply errors
- More compute
- Workflow complexity increases
Mechanics: Pipeline Changes
Indexing | agents can:
- loading: improve parsing, generate metadata
- chunking: task-specific chunking via sentiment/entity grouping
- embedding: dynamically choose embedding models
- storage: route doc chunks ⟶ different collections
Generation | agents can:
- retrieval: determine which KB to use, whether tools are needed, whether the query needs refinement
- augmentation: dynamically build prompts
- generation: use iterative patterns: generate ⟶ critique ⟶ retrieve ⟶ regenerate
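That generate ⟶ critique ⟶ retrieve ⟶ regenerate loop can be sketched end to end. Every function here is a hypothetical stand-in: `llm` and `critique` would be model calls, `retriever` a vector-DB lookup; the stubs just make the control flow visible:

```python
def llm(prompt: str) -> str:
    """Stub LLM: 'answers' improve once retrieved context is in the prompt."""
    return "grounded answer" if "CONTEXT" in prompt else "uncertain answer"


def critique(answer: str) -> bool:
    """Stub self-check; a real agent would ask the LLM to verify its answer."""
    return "grounded" in answer


def retriever(query: str) -> str:
    """Stub retrieval step standing in for a vector-DB or tool call."""
    return "CONTEXT: retrieved facts about " + query


def agentic_answer(query: str, max_rounds: int = 3) -> str:
    """Generate, self-critique, and retrieve more context until satisfied."""
    prompt = query
    answer = ""
    for _ in range(max_rounds):
        answer = llm(prompt)
        if critique(answer):                       # good enough -> stop early
            return answer
        prompt = retriever(query) + "\n" + query   # augment and try again
    return answer                                  # best effort after the cap


# Round 1 fails the critique, retrieval kicks in, round 2 passes.
result = agentic_answer("What changed in v2?")
```

The `max_rounds` cap is the safety valve the challenges above call for: without it, a misfiring agent can loop on retrieval indefinitely.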