Vector Similarity in RAG: Finding Semantically Related Content

Understand vector similarity and how RAG systems find semantically related content. Learn similarity metrics and retrieval mechanisms.

Vector Similarity: The Engine of Semantic Retrieval

Once your documents are embedded as vectors, the core retrieval task becomes: given a query vector, find the document vectors most similar to it. This seemingly simple task powers semantic search in RAG systems.

The Geometry of Meaning

Embeddings place texts in a high-dimensional vector space where geometry encodes meaning:

  • Similar texts have vectors pointing in similar directions
  • Dissimilar texts point in different directions
  • The distance or angle between vectors measures semantic similarity

Intuitive example (simplified to 2D):

[Figure: 2D plot with axes "Dimension 1" and "Dimension 2". The vectors for "cat" and "dog" point in similar directions; the vector for "table" points in a different direction.]

“Cat” and “dog” are nearby (both animals), while “table” is distant.

Similarity Metrics: The Math

1. Cosine Similarity

Measures the cosine of the angle between vectors. It is the most popular metric for embeddings.

Formula:

similarity = (A · B) / (||A|| × ||B||)

Characteristics:

  • Range: -1 to +1 (in practice, scores between text embeddings usually fall between 0 and 1)
  • Interpretation: 1.0 = identical direction, 0.0 = orthogonal, -1.0 = opposite
  • Why it’s popular: Invariant to vector magnitude, computationally efficient

Intuition: Only direction counts. Multiplying a vector by any positive constant changes its magnitude but not its direction, so the cosine similarity stays the same.

Example:

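from sklearn.metrics.pairwise import cosine_similarity

# Toy 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions
query_embedding = [0.5, 0.8, -0.2]
document_embedding = [0.51, 0.79, -0.19]

# cosine_similarity expects 2D inputs (one row per vector) and returns a 2D array
similarity = cosine_similarity([query_embedding], [document_embedding])
# similarity[0][0] is approximately 0.9999 (very similar)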
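
To make the formula concrete, here is a minimal sketch that computes cosine similarity by hand in plain Python and checks that scaling a vector does not change the score. The function name `cosine` and the toy vectors are illustrative assumptions, not part of any library.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors (illustrative values only)
a = [0.5, 0.8, -0.2]
b = [0.51, 0.79, -0.19]
a_scaled = [10 * x for x in a]  # same direction, 10x the magnitude

sim = cosine(a, b)
sim_scaled = cosine(a_scaled, b)
print(round(sim, 4), round(sim_scaled, 4))  # the two values agree
```

Because the norms appear in the denominator, any positive scaling factor cancels out, which is the magnitude invariance described above.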

2. Euclidean Distance

Measures straight-line distance between vectors.

Formula:

distance = sqrt((A₁-B₁)² + (A₂-B₂)² + ... + (Aₙ-Bₙ)²)

Characteristics:

  • Range: 0 to infinity
  • Interpretation: 0 = identical, larger = more different
  • Sensitive to vector magnitude (important distinction)

When to use: When absolute differences in magnitude matter. It is less common than cosine similarity for comparing embeddings.

Example:

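from scipy.spatial.distance import euclidean
# Toy 3-dimensional vectors for illustration
query_embedding = [0.5, 0.8, -0.2]
document_embedding = [0.51, 0.79, -0.19]
distance = euclidean(query_embedding, document_embedding)
# Returns approximately 0.017 (small distance = similar)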

3. Dot Product Similarity

Multiplies corresponding dimensions and sums the results.

Formula:

similarity = A · B = Σ(Aᵢ × Bᵢ)

Characteristics:

  • Range: unbounded (depends on vector magnitude)
  • Fast computation
  • Equals cosine similarity when both vectors are unit-normalized; otherwise vector magnitude influences the score
  • Used in some production systems for speed

When to use: When vectors are normalized and speed is critical.
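
The relationship between the three metrics so far can be checked numerically. The sketch below, using hypothetical helper names `normalize` and `dot` and made-up toy vectors, shows that on unit-length vectors the dot product equals the cosine similarity, and the squared Euclidean distance equals 2 - 2 × cosine. Under that assumption, all three metrics produce the same ranking.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    """Sum of products of corresponding dimensions."""
    return sum(x * y for x, y in zip(a, b))

# Toy vectors with different magnitudes (illustrative values only)
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

cos_ab = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
a_hat, b_hat = normalize(a), normalize(b)

dot_normalized = dot(a_hat, b_hat)  # matches cos_ab
sq_dist = sum((x - y) ** 2 for x, y in zip(a_hat, b_hat))  # matches 2 - 2 * cos_ab

print(dot(a, b), cos_ab, dot_normalized, sq_dist)
```

The raw dot product of the unnormalized vectors is 11.0, which is far outside the [-1, 1] range and reflects their magnitudes rather than only their direction.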

4. Manhattan Distance (L1 Distance)

Sum of absolute differences.

Formula:

distance = |A₁-B₁| + |A₂-B₂| + ... + |Aₙ-Bₙ|

Characteristics:

  • Used less often than Euclidean or cosine
  • Can be useful for sparse vectors
  • Faster than Euclidean in some implementations
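
As a small illustration of the formula, here is a plain-Python sketch. The function name `manhattan` and the toy vectors are assumptions chosen for this example.

```python
def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Toy 3-dimensional vectors (illustrative values only)
query_embedding = [0.5, 0.8, -0.2]
document_embedding = [0.51, 0.79, -0.19]

distance = manhattan(query_embedding, document_embedding)
print(round(distance, 4))  # 0.03
```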

Why Cosine Similarity Dominates

Cosine similarity is the standard for embeddings. Why?

1. Magnitude invariance: A document vector scaled to 10x its magnitude still has cosine similarity 1.0 with the original. Magnitude doesn’t distract from meaning.

2. Computational efficiency: Simple vector operations, scales well.

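3. Intuitive interpretation: Scores map directly to the angle between vectors: 1.0 means the same direction and 0.0 means orthogonal. The mapping is not linear; a similarity of 0.9 corresponds to an angle of about 26 degrees.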

4. Empirically proven: Decades of information retrieval research show that it works.

5. Compatible with indexing: Efficient indexing structures such as HNSW and IVF support cosine similarity, often as an inner product over normalized vectors.
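
To get a feel for what a given score means geometrically, you can convert cosine values to angles. This is a minimal sketch; the helper name `cosine_to_degrees` is an assumption for illustration.

```python
import math

def cosine_to_degrees(similarity):
    """Convert a cosine similarity in [-1, 1] to the angle between vectors."""
    clamped = max(-1.0, min(1.0, similarity))  # guard against rounding drift
    return math.degrees(math.acos(clamped))

for score in (1.0, 0.9, 0.5, 0.0):
    print(score, round(cosine_to_degrees(score), 1))
```

A score of 0.9 corresponds to roughly 26 degrees and 0.5 to 60 degrees, so equal steps in score do not correspond to equal steps in angle.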

The Retrieval Process

When you query your RAG system:

Step 1: Encode query to embedding (same model as documents)

Step 2: Compute similarity to the document embeddings (all of them, or a candidate subset selected by an index)

Step 3: Return top K documents by similarity score

Step 4: Rerank (optional) using cross-encoders or other methods
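
Steps 2 and 3 can be sketched as an exhaustive (brute-force) search with NumPy. This is an illustrative sketch under stated assumptions: the function name `top_k` is hypothetical, the embeddings are random stand-ins rather than output from a real model, and the query is constructed as a noisy copy of document 42 so that the expected best match is known.

```python
import numpy as np

def top_k(query_vec, doc_matrix, k=3):
    """Return (indices, scores) of the k documents most similar to the query."""
    # Normalize so that the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q                  # Step 2: similarity to every document
    order = np.argsort(-scores)[:k]    # Step 3: highest scores first
    return order, scores[order]

# Random stand-ins for real embeddings (Step 1 would call an embedding model)
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(1000, 768))
# Build the query as a slightly perturbed copy of document 42
query_embedding = doc_embeddings[42] + 0.1 * rng.normal(size=768)

indices, scores = top_k(query_embedding, doc_embeddings, k=3)
print(indices, scores)
```

In a real system you would normalize the document matrix once at indexing time rather than on every query.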

For a 1M document database with 768-dimensional embeddings:

Query embedding: 768 dimensions
Document embeddings: 1M × 768 = 768M values in memory/index
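Similarity computation: ~768M multiply-add operations for an exhaustive scan
Time: ~100-500ms on modern hardware (with indexing optimization)

Without indexing, every query is compared against all 1M document vectors, which can take seconds depending on hardware. An approximate nearest neighbor index compares the query against only a small fraction of the vectors, which brings latency down to milliseconds.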

Semantic Similarity vs. Lexical Similarity

Lexical similarity: Match keywords

Query: "Does RAG reduce hallucinations?"
Lexical match: Documents containing "RAG", "hallucination", etc.
Problem: Misses "Does retrieval-augmented generation improve accuracy?" (same meaning, different words)

Semantic similarity: Match meaning

Query: "Does RAG reduce hallucinations?"
Semantic match: Documents about avoiding false claims, using sources, factual grounding
Benefit: Captures paraphrases and synonyms

RAG systems use semantic similarity because it better captures meaning.
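
The weakness of keyword matching can be shown with a simple token-overlap score. The sketch below uses Jaccard overlap as one possible stand-in for lexical similarity (an assumption for illustration; production keyword search typically uses weighting schemes such as BM25). The helper names `tokens` and `jaccard` are hypothetical.

```python
import re

def tokens(text):
    """Lowercase word tokens; hyphenated terms stay together."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))

def jaccard(a, b):
    """Share of distinct tokens that two texts have in common."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

query = "Does RAG reduce hallucinations?"
paraphrase = "Does retrieval-augmented generation improve accuracy?"

print(jaccard(query, paraphrase))  # 0.125: only "does" is shared
```

The two questions ask nearly the same thing, yet they share a single uninformative token. An embedding-based comparison is intended to close exactly this gap.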

Common Challenges

1. Curse of Dimensionality

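In very high dimensions (thousands), distances between points tend to concentrate: the nearest and farthest neighbors of a query end up at nearly the same distance. This makes similarity rankings less discriminative.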

Mitigation:

  • Use dimension reduction (but rarely needed with good embeddings)
  • Use indexing structures optimized for high dimensions
  • Validate that your embedding model works for your use case
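
The effect can be observed in a small simulation. The sketch below draws random Gaussian points, which is a worst case with no structure (real embeddings usually concentrate far less), and measures how much farther the farthest point is than the nearest one. The function name `relative_contrast` is an assumption for illustration.

```python
import numpy as np

def relative_contrast(dim, n_points=500, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = np.random.default_rng(seed)
    points = rng.normal(size=(n_points, dim))
    query = rng.normal(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 32, 768, 4096):
    print(dim, round(relative_contrast(dim), 3))
```

The printed contrast shrinks as the dimension grows, which is why validating retrieval quality on your own data matters more than the raw dimensionality of the model.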

2. Anisotropy

Some embedding models have anisotropic distributions: vectors cluster in a narrow region of the space, so even unrelated texts can receive high similarity scores.

Mitigation:

  • Use modern embedding models (text-embedding-3-large, BGE), which tend to suffer less from this problem
  • Apply whitening or normalization if needed
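
One way to check for anisotropy is to measure the average cosine similarity between unrelated vectors; a value far above zero suggests a shared dominant direction. The sketch below builds synthetic anisotropic vectors (random values plus a constant offset, an assumption made purely for illustration) and applies mean-centering, the first step of whitening. The function name `mean_pairwise_cosine` is hypothetical.

```python
import numpy as np

def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of rows."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(vectors)
    return (sims.sum() - n) / (n * (n - 1))  # exclude the diagonal of ones

# Synthetic anisotropic embeddings: random vectors plus a shared offset
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 64)) + 3.0

before = mean_pairwise_cosine(embeddings)
centered = embeddings - embeddings.mean(axis=0)  # remove the common direction
after = mean_pairwise_cosine(centered)
print(round(before, 3), round(after, 3))
```

Before centering, random pairs look highly similar; after centering, the average similarity drops to near zero. If you apply this to real embeddings, the same mean vector must be subtracted from queries as well.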

3. Query-Document Asymmetry

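A query might need a different representation than document chunks: queries are often short questions, while chunks are longer passages that contain the answer.

Approaches:

  • Use query-specific embeddings (a separate encoder for queries)
  • Use hypothetical document embeddings (HyDE): an LLM drafts a hypothetical answer to the query, and that draft is embedded and used for retrieval
  • Use dense passage retrieval (DPR) models, which train separate query and passage encoders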

4. Cross-Lingual Similarity

Finding similar documents across languages is harder than within a single language.

Solution: Multilingual embeddings (Cohere multilingual, multilingual Sentence-Transformers) trained on parallel texts.

Measuring Retrieval Quality

Test your similarity function empirically:

Metric 1: Hit Rate@K

For each test query with known relevant documents:
Did any known relevant document appear in top K results?
Average across all queries.
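
A minimal sketch of this metric, assuming retrieval results are stored as ranked lists of document IDs per query. The function name `hit_rate_at_k` and the small test set are hypothetical.

```python
def hit_rate_at_k(results, relevant, k):
    """Fraction of queries with at least one relevant document in the top k."""
    hits = 0
    for query_id, ranked_docs in results.items():
        if any(doc in relevant[query_id] for doc in ranked_docs[:k]):
            hits += 1
    return hits / len(results)

# Hypothetical test set: ranked retrieval output and known relevant documents
results = {"q1": ["d3", "d7", "d1"], "q2": ["d2", "d9", "d4"], "q3": ["d5", "d6", "d8"]}
relevant = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d0"}}

print(hit_rate_at_k(results, relevant, k=1))  # 1 of 3 queries hits at k=1
print(hit_rate_at_k(results, relevant, k=3))  # 2 of 3 queries hit at k=3
```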

Metric 2: Mean Reciprocal Rank

If first relevant document is at position 3: score = 1/3
Average across all queries
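
The same idea in code, as a sketch with a hypothetical function name `mean_reciprocal_rank` and a made-up test set. Queries with no relevant document in the results contribute a score of 0, which is one common convention.

```python
def mean_reciprocal_rank(results, relevant):
    """Average of 1/rank of the first relevant document (0 if none is found)."""
    total = 0.0
    for query_id, ranked_docs in results.items():
        for rank, doc in enumerate(ranked_docs, start=1):
            if doc in relevant[query_id]:
                total += 1.0 / rank
                break
    return total / len(results)

# Hypothetical test set: ranked retrieval output and known relevant documents
results = {"q1": ["d3", "d7", "d1"], "q2": ["d2", "d9", "d4"], "q3": ["d5", "d6", "d8"]}
relevant = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d0"}}

mrr = mean_reciprocal_rank(results, relevant)
print(round(mrr, 4))  # (1/3 + 1 + 0) / 3 = 0.4444
```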

Metric 3: NDCG (Normalized Discounted Cumulative Gain)

Ranks matter: position 1 result is more valuable than position 5
Normalized against perfect ranking
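
A compact sketch of NDCG follows. It makes two simplifying assumptions that you should adapt to your evaluation setup: it uses the linear-gain variant (the relevance grade itself, rather than 2^grade - 1), and it builds the ideal ranking only from the documents that were retrieved. The function names and the relevance grades are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))

def ndcg_at_k(ranked_gains, k):
    """DCG of the actual ranking divided by DCG of the ideal ranking."""
    ideal = sorted(ranked_gains, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    return dcg(ranked_gains[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical graded relevance (0-3) of the retrieved documents, in rank order
ranked_gains = [1, 3, 0, 2]

print(round(ndcg_at_k(ranked_gains, k=4), 4))
```

The score falls below 1.0 here because the most relevant document (grade 3) sits at position 2 instead of position 1.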

Similarity Beyond Documents

Modern RAG extends similarity to:

  • Multi-modal similarity: Text to images
  • Structured similarity: Matching structured database records
  • Temporal similarity: Incorporating recency
  • Causal similarity: Understanding relationships, not just surface similarity

Production Deployment

In production systems, similarity computation is optimized:

Memory optimization:

  • Store embeddings in compact formats (float16, int8 quantization)
  • Reduce dimensions if needed
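
To illustrate the memory trade-off, here is a sketch of symmetric int8 quantization with NumPy. It uses a single scale factor for the whole matrix, which is the simplest possible scheme and an assumption of this example; vector databases typically use more refined methods. The embeddings are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768)).astype(np.float32)

# Symmetric int8 quantization with one scale for the whole matrix
scale = np.abs(embeddings).max() / 127.0
quantized = np.round(embeddings / scale).astype(np.int8)
restored = quantized.astype(np.float32) * scale

def cosine(a, b):
    """Cosine similarity between two 1D arrays."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

exact = cosine(embeddings[0], embeddings[1])
approx = cosine(restored[0], restored[1])

print(embeddings.nbytes, quantized.nbytes)  # 3072000 768000
print(abs(exact - approx))                  # small quantization error
```

Scaled up to 1M documents at 768 dimensions, float32 storage needs about 3.07 GB, while int8 needs about 0.77 GB, at the cost of a small error in the similarity scores.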

Speed optimization:

  • Use FAISS, Annoy, or other approximate nearest neighbor indexes
  • Cache popular queries
  • Batch similarity computations
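
Caching popular queries can be as simple as memoizing the query embedding step. The sketch below uses `functools.lru_cache` from the standard library; the `embed` function is a hypothetical stand-in for a real (and much slower) embedding model call, and the call counter exists only to make the cache effect visible.

```python
from functools import lru_cache

calls = {"embed": 0}

def embed(text):
    """Hypothetical stand-in for an embedding model call (slow in practice)."""
    calls["embed"] += 1
    return tuple(float(ord(c)) for c in text[:4])

@lru_cache(maxsize=10_000)
def cached_embed(normalized_query):
    """Memoized wrapper; results must be hashable-keyed and immutable."""
    return embed(normalized_query)

def embed_query(query):
    # Normalize case and whitespace so trivially different queries share a cache entry
    return cached_embed(" ".join(query.lower().split()))

embed_query("What is RAG?")
embed_query("  what is   rag? ")
print(calls["embed"])  # 1: the second query was served from the cache
```

Whether to normalize queries before caching is a design choice: it raises the hit rate, but it assumes that case and spacing do not matter for your embedding model.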

Freshness:

  • Re-embed documents when their content or the embedding model changes
  • Update indexes incrementally

The similarity function is invisible to users but fundamental to RAG quality. Understanding it helps you debug retrieval problems and make informed system design choices.