Euclidean Distance: Understanding Alternative Vector Metrics
While cosine similarity dominates RAG systems, understanding alternative distance metrics provides insight into vector search trade-offs and helps when cosine similarity isn’t ideal.
What Is Euclidean Distance?
Euclidean distance measures straight-line distance between points in vector space.
Mathematical definition:
d(A, B) = √((A₁-B₁)² + (A₂-B₂)² + ... + (Aₙ-Bₙ)²)
Simplified: d = √(Σ(Aᵢ - Bᵢ)²)
Intuitive example (2D):
Point A: (3, 4)
Point B: (0, 0)
Distance = √((3-0)² + (4-0)²) = √(9 + 16) = √25 = 5
Euclidean Distance vs. Cosine Similarity
Key difference:
Cosine similarity:
- Measures the angle between vectors
- Formula: (A · B) / (||A|| × ||B||)
- Range: -1 to 1
- Interpretation: Direction matters, magnitude ignored
Euclidean distance:
- Measures straight-line distance
- Formula: √(Σ(Aᵢ - Bᵢ)²)
- Range: 0 to ∞
- Interpretation: Absolute position matters, both direction and magnitude
Example showing the difference:
Vector A: [1, 1, 1, 1, 1] (magnitude = √5 ≈ 2.24)
Vector B: [2, 2, 2, 2, 2] (magnitude = 2√5 ≈ 4.47)
Cosine similarity: 1.0 (perfectly aligned, same direction)
Euclidean distance: √((1)² × 5) = √5 ≈ 2.24 (they're different points in space)
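Both results are easy to reproduce. A minimal NumPy sketch (the variable names are illustrative):

```python
import numpy as np

a = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
b = np.array([2.0, 2.0, 2.0, 2.0, 2.0])

# Cosine similarity: dot product divided by the product of the magnitudes
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Euclidean distance: length of the difference vector
euclidean = float(np.linalg.norm(a - b))

print(round(cosine, 4))     # 1.0
print(round(euclidean, 4))  # 2.2361
```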
Interpretation:
- Cosine: "Same direction, so very similar"
- Euclidean: "Different magnitudes, so distant"
For embedding similarity, cosine's interpretation is usually better.
When Euclidean Distance Makes Sense
1. Magnitude-Aware Similarity
When the magnitude of the vector carries meaning.
Example: Embedding of review ratings
Document A: [very_positive: 0.9, positive: 0.2, neutral: 0.1]
Document B: [very_positive: 0.3, positive: 0.2, neutral: 0.1]
Cosine: Similar (both weighted toward very_positive)
Euclidean: Different (A is much stronger positive)
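Plugging the two rating vectors into both metrics makes the contrast concrete. The sketch treats the three labelled scores as a plain 3-dimensional vector, which is an assumption made for illustration:

```python
import numpy as np

doc_a = np.array([0.9, 0.2, 0.1])  # very_positive, positive, neutral
doc_b = np.array([0.3, 0.2, 0.1])

cosine = float(doc_a @ doc_b / (np.linalg.norm(doc_a) * np.linalg.norm(doc_b)))
euclidean = float(np.linalg.norm(doc_a - doc_b))

print(f"cosine={cosine:.3f}, euclidean={euclidean:.3f}")
# Cosine comes out around 0.92 (high), while the Euclidean distance is 0.6,
# driven entirely by the gap in the very_positive component.
```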
Use Euclidean if the magnitude difference is meaningful.
2. Geometric Clustering
In clustering algorithms (k-means, etc.), Euclidean distance works well.
- Vectors naturally cluster in geometric space
- Euclidean distance aligns with cluster membership
- Cosine similarity is less natural for geometric clustering
3. Multi-Modal Data
When combining different types of information:
Vector has: [text_embedding: 768d, image_feature: 256d, metadata: 10d]
Total: 1034 dimensions
- Euclidean distance treats all dimensions equally
- Cosine similarity might not reflect true similarity
Computational Considerations
Speed Comparison
Cosine similarity (normalized vectors):
- Computation: Just a dot product
- Operations: 768 multiplications + 767 additions
- Time: ~microseconds per pair
Euclidean distance:
- Computation: Subtract, square, sum, sqrt
- Operations: 768 subtractions + 768 squares + 767 additions + 1 sqrt
- Time: Slightly slower than cosine
In practice: Negligible difference for most applications
Scale Sensitivity
Euclidean distance is scale-sensitive:
Feature scaling matters dramatically
Raw embeddings:
A = [10.0, 20.0, 30.0]
B = [10.1, 20.1, 30.1]
Distance = √((0.1)² + (0.1)² + (0.1)²) ≈ 0.173
Scaled embeddings (× 1000):
A' = [10000, 20000, 30000]
B' = [10100, 20100, 30100]
Distance = √((100)² + (100)² + (100)²) ≈ 173.2
Same semantic relationship, 1000× different distance!
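The effect can be reproduced with the vectors above. The sketch also shows that cosine similarity is unchanged by the rescaling, and that scaling each vector to unit length first makes the Euclidean distance independent of the original scale. The helper names are illustrative.

```python
import numpy as np

def euclidean(u, v):
    return float(np.linalg.norm(u - v))

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def unit(v):
    # Scale a vector to length 1
    return v / np.linalg.norm(v)

a = np.array([10.0, 20.0, 30.0])
b = np.array([10.1, 20.1, 30.1])
a_scaled, b_scaled = a * 1000, b * 1000

raw_dist = euclidean(a, b)                   # about 0.173
scaled_dist = euclidean(a_scaled, b_scaled)  # about 173.2

# Cosine ignores the common scale factor
cos_raw, cos_scaled = cosine(a, b), cosine(a_scaled, b_scaled)

# After unit-normalization the Euclidean distance no longer depends on scale
norm_dist_raw = euclidean(unit(a), unit(b))
norm_dist_scaled = euclidean(unit(a_scaled), unit(b_scaled))

print(raw_dist, scaled_dist)
print(cos_raw, cos_scaled)
print(norm_dist_raw, norm_dist_scaled)
```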
Solution: Normalize embeddings to unit length before comparison.
If normalized, Euclidean = √(2 - 2×cosine)
Euclidean Distance in Vector Databases
Most vector databases support Euclidean distance as an option:
    # Pinecone: the metric is fixed when the index is created
    # (e.g. metric="euclidean"), not passed per query
    index.query(vector=query_embedding, top_k=5)

    # Weaviate: "distance" is a threshold in the metric the class is configured with
    client.query.get("Document").with_near_vector({
        "vector": query_embedding,
        "distance": 0.2
    }).do()

    # Elasticsearch: "similarity" acts as an L2 distance threshold when the field uses l2_norm
    {
        "knn": {
            "field": "embedding",
            "query_vector": query_embedding,
            "k": 5,
            "similarity": 1.5
        }
    }

    # Milvus
    res = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param={"metric_type": "L2"},  # Euclidean distance
        limit=5
    )

Manhattan Distance (L1): Another Alternative
Manhattan distance (taxicab distance) sums absolute differences:
d(A, B) = |A₁-B₁| + |A₂-B₂| + ... + |Aₙ-Bₙ|
Manhattan: d = Σ|Aᵢ - Bᵢ|
Euclidean: d = √(Σ(Aᵢ - Bᵢ)²)
Comparison:
A = [3, 4], B = [0, 0]
Manhattan: |3| + |4| = 7
Euclidean: √(9 + 16) = 5
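Both values can be obtained from the same NumPy function, since `np.linalg.norm` accepts an `ord` argument (1 for Manhattan, 2 for Euclidean). A minimal sketch:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([0.0, 0.0])

manhattan = float(np.linalg.norm(a - b, ord=1))  # sum of absolute differences
euclidean = float(np.linalg.norm(a - b, ord=2))  # square root of summed squares

print(manhattan, euclidean)  # 7.0 5.0
```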
- Euclidean squares each difference, so a few large deviations dominate the distance
- Manhattan weights every deviation linearly, so no single dimension dominates as strongly
When to use Manhattan:
- Sparse vectors (faster)
- When a city-block (axis-by-axis) distance is the natural interpretation
- Less common for embeddings
For embeddings:
Performance comparison on text retrieval:
- Cosine: nDCG@10 = 0.65
- Euclidean: nDCG@10 = 0.62
- Manhattan: nDCG@10 = 0.58
Cosine consistently outperforms alternatives.
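For readers who want to run this kind of comparison on their own data, nDCG@10 can be computed for any metric's ranking in a few lines. This is a minimal sketch using the common DCG definition (relevance discounted by log2 of rank + 1). The helper name `ndcg_at_k` and the relevance labels are made up for illustration and are unrelated to the figures above.

```python
import numpy as np

def ndcg_at_k(relevances, k=10):
    """nDCG@k for graded relevance labels listed in ranked order (hypothetical helper)."""
    rel = np.asarray(relevances, dtype=float)[:k]
    if rel.size == 0:
        return 0.0
    # Discount for ranks 1..n is 1 / log2(rank + 1)
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float(np.sum(rel * discounts))
    # Ideal ordering: the same labels sorted from most to least relevant
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float(np.sum(ideal * discounts))
    return dcg / idcg if idcg > 0 else 0.0

# Made-up relevance labels, in the order each metric ranked the documents
ranked_by_cosine = [3, 2, 1, 0]
ranked_by_euclidean = [2, 3, 0, 1]

ndcg_cosine = ndcg_at_k(ranked_by_cosine)
ndcg_euclidean = ndcg_at_k(ranked_by_euclidean)
print(ndcg_cosine, ndcg_euclidean)
```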
Practical Comparison: Cosine vs. Euclidean for RAG
    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

    # Sample embeddings (simplified)
    query = np.array([0.1, 0.8, -0.2, 0.15])
    doc1 = np.array([0.1, 0.8, -0.2, 0.15])  # Identical
    doc2 = np.array([0.2, 0.7, -0.1, 0.25])  # Similar
    doc3 = np.array([0.5, 0.1, 0.5, -0.3])   # Dissimilar

    # Cosine similarity
    cos_scores = [
        cosine_similarity([query], [doc])[0][0]
        for doc in [doc1, doc2, doc3]
    ]
    print("Cosine:", cos_scores)
    # Output (approx.): [1.0, 0.973, -0.023]

    # Euclidean distance (smaller = more similar)
    # Wrapped in an array so the conversion below works element-wise
    euc_distances = np.array([
        euclidean_distances([query], [doc])[0][0]
        for doc in [doc1, doc2, doc3]
    ])
    euc_scores = 1 / (1 + euc_distances)  # Convert to similarity-like score
    print("Euclidean similarity:", euc_scores)
    # Output (approx.): [1.0, 0.833, 0.463]

    # Rankings
    cos_ranking = np.argsort(-np.array(cos_scores))
    euc_ranking = np.argsort(-euc_scores)
    print("Cosine ranking:", cos_ranking)
    print("Euclidean ranking:", euc_ranking)
    # Both: [0, 1, 2] - same ranking for this example

Trade-offs Summary
| Aspect | Cosine | Euclidean |
|---|---|---|
| Embeddings | ✓ Preferred | ✗ Less common |
| Magnitude | Ignored | Considered |
| Paraphrases | Excellent | Good |
| Scale-sensitive | No | Yes |
| Interpretability | Clear | Less clear |
| Computational cost | Minimal | Minimal |
| Theory support | Excellent | Good |
| Industry standard | Nearly universal | Occasional |
Recommendation for RAG Systems
Use cosine similarity by default:
✓ Works naturally with normalized embeddings
✓ Proven empirically
✓ Industry standard
✓ Magnitude-invariant (vector length doesn't affect the score)
✓ Efficient and simple
Consider Euclidean only if:
- Magnitude carries meaning (rare in text embeddings)
- You're clustering rather than ranking
- Specific domain research suggests it
- Empirical testing shows better results
Advanced: Mixing Metrics
Hybrid approaches using multiple metrics:
    from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

    def advanced_similarity(query_embedding, doc_embedding):
        cosine = cosine_similarity([query_embedding], [doc_embedding])[0][0]
        # Map distance into (0, 1] so that larger means more similar
        euclidean = 1 / (1 + euclidean_distances([query_embedding], [doc_embedding])[0][0])
        # Weighted combination
        combined = 0.7 * cosine + 0.3 * euclidean
        return combined

Some systems use multiple metrics for redundancy and robustness, but the added complexity usually isn’t worth it.
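One reason the extra complexity rarely pays off: for unit-length embeddings, Euclidean distance is a decreasing function of cosine similarity (d = √(2 - 2×cosine)), so a blended score cannot reorder the results. The sketch below checks this on random vectors. The data is synthetic, and the 0.7/0.3 weights simply mirror the function above.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(v):
    # Scale vectors (or rows of a matrix) to length 1
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

query = unit(rng.normal(size=8))
docs = unit(rng.normal(size=(20, 8)))

cosine = docs @ query                          # dot product of unit vectors
euclid = np.linalg.norm(docs - query, axis=1)  # L2 distance per document

# For unit vectors the two metrics are tied together by this identity
identity_holds = np.allclose(euclid, np.sqrt(2 - 2 * cosine))

# Same weighting as the hybrid score, applied to all documents at once
combined = 0.7 * cosine + 0.3 / (1 + euclid)
same_ranking = np.array_equal(np.argsort(-cosine), np.argsort(-combined))

print(identity_holds, same_ranking)
```

A hybrid score can only change the ordering when the embeddings are not normalized, in which case it is worth asking first whether the magnitude carries information at all.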
Conclusion
Cosine similarity is the right choice for RAG systems with text embeddings. Euclidean distance and other metrics have niche uses but don’t offer advantages for typical semantic search scenarios. Understand the alternatives, but implement with cosine similarity unless you have specific reasons otherwise.