Vector Databases
You’ve created your embeddings. Now you need to store millions of them and find the most similar ones to a query vector in milliseconds. That’s what vector databases do. They’re a specialized data infrastructure layer that makes semantic search practical at scale.
Why Regular Databases Don’t Work
Standard databases optimize for exact lookups (give me the row with id=42) or range queries (give me all rows where age > 30). They’re terrible at “give me the 10 most similar vectors to this query.”
A brute-force similarity search over 10 million vectors would compute 10 million cosine similarities per query — that’s roughly 1–2 seconds on modern hardware, far too slow for interactive applications. Vector databases use specialized index structures to make this fast.
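To make the cost concrete, here is a minimal brute-force search in plain Python. It is only a sketch: the function names `cosine_similarity` and `brute_force_top_k` and the toy vectors are illustrative assumptions, and real systems would use vectorized or SIMD code. The point is that every query has to score every stored vector.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, vectors, k):
    """Score every stored vector against the query: O(N * d) work per query."""
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    # nlargest keeps only the k best (similarity, index) pairs
    return heapq.nlargest(k, scored)


# Toy data: four 2-dimensional vectors (hypothetical values)
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
top = brute_force_top_k([1.0, 0.1], vectors, k=2)
top_ids = [index for _, index in top]
```

With N stored vectors of dimension d, the loop performs N similarity computations of d multiplications each, which is why the cost grows linearly with the size of the collection.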
Approximate Nearest Neighbor Search
The key insight: for most applications, finding the exact top-10 most similar vectors is unnecessary. Returning vectors that are approximately the most similar (for example, recall@10 above 90%) is good enough, and the quality difference is usually negligible.
This is Approximate Nearest Neighbor (ANN) search, and it’s what every vector database uses internally.
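Recall@k is the fraction of the true top-k neighbors that the approximate search also returned. A small sketch of how it could be computed, assuming you have the exact result IDs from a brute-force pass to compare against (the function name `recall_at_k` and the ID lists are hypothetical):

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the true top-k neighbors that the ANN search also returned."""
    exact_top = set(exact_ids[:k])
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / k


# Hypothetical result lists: the ANN search missed one true neighbor (3)
exact = [4, 8, 15, 16, 23, 42, 7, 9, 1, 3]
approx = [4, 8, 15, 16, 23, 42, 7, 9, 1, 99]
recall = recall_at_k(exact, approx, k=10)
```

Here nine of the ten true neighbors were found, so recall@10 is 0.9.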
- Exact NN: 100% recall, O(N) per query → 1000ms for 10M vectors
- ANN (HNSW): 99% recall, O(log N) per query → 5ms for 10M vectors

HNSW: The Dominant Index Structure
Hierarchical Navigable Small World (HNSW) graphs are the standard index in most production vector databases. The intuition:
```
Layer 2 (long hops):  ●──────────────────●
                      │
Layer 1 (medium):     ●────●────●────●───●
                      │
Layer 0 (fine):       ●──●──●──●──●──●──●──●   (all vectors)
```

Search:
- Start at the top layer (few nodes, long-range links)
- Navigate toward the query
- Drop to the next layer down once no closer node is reachable
- Repeat until layer 0, then return the top-K

HNSW achieves near-logarithmic query time with excellent recall (~99% in practice). It is the default or a first-class index type in Weaviate, Qdrant, Milvus, and Chroma.
Key tradeoffs in HNSW:
- `ef_construction`: higher → better recall, slower indexing, more memory
- `m` (connections per node): higher → better recall, more memory
- `ef` at query time: higher → better recall, slower queries
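The layered search can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a real HNSW implementation: the layers are built by hand instead of by random level assignment, and the walk keeps a single candidate (roughly what `ef=1` would mean) rather than a candidate list. The names `greedy_walk` and `layered_search` are hypothetical.

```python
import math


def greedy_walk(graph, points, query, entry):
    """Hop to the closest neighbor until no neighbor is closer to the query."""
    current = entry
    while True:
        best = min(
            graph[current],
            key=lambda node: math.dist(points[node], query),
            default=current,
        )
        if math.dist(points[best], query) >= math.dist(points[current], query):
            return current
        current = best


def layered_search(layers, points, query, entry):
    """layers[0] is the sparsest top layer; layers[-1] contains every point."""
    current = entry
    for graph in layers:
        # The result of one layer becomes the entry point of the next
        current = greedy_walk(graph, points, query, current)
    return current


# Eight 1-dimensional points at positions 0..7 (toy data)
points = {i: (float(i),) for i in range(8)}
layer2 = {0: [6], 6: [0]}                             # long hops
layer1 = {0: [2], 2: [0, 4], 4: [2, 6], 6: [4]}       # medium hops
layer0 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}

nearest = layered_search([layer2, layer1, layer0], points, (5.2,), entry=0)
```

For the query at 5.2, the top layer jumps from node 0 straight to node 6, and layer 0 then refines the answer to node 5. A real index would track `ef` candidates at the bottom layer to avoid getting stuck in a local minimum.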
IVF: For Very Large Scale
Inverted File Index (IVF) partitions the vector space into clusters (using k-means). At query time, only the closest clusters are searched.
IVF workflow:
- Train: cluster all vectors into K centroids (K=1000 is typical)
- Index: assign each vector to its nearest centroid
- Query: find the top nprobe centroids nearest the query (e.g., nprobe=10), then search only the vectors in those nprobe clusters
- Result: ~100× faster than brute force on large datasets, with ~95-98% recall at nprobe=10

IVF is used extensively in Faiss (Facebook AI Similarity Search) and is available as an option in most managed databases. It is preferred for datasets larger than 100M vectors.
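The IVF workflow can be illustrated with a small pure-Python sketch. To keep it short, the centroids are hard-coded rather than trained with k-means, and Euclidean distance is used; both are simplifying assumptions, as are the names `build_ivf` and `ivf_search`.

```python
import math
from collections import defaultdict


def build_ivf(vectors, centroids):
    """Index step: put each vector ID in the list of its nearest centroid."""
    lists = defaultdict(list)
    for vec_id, vec in enumerate(vectors):
        nearest = min(
            range(len(centroids)),
            key=lambda c: math.dist(vec, centroids[c]),
        )
        lists[nearest].append(vec_id)
    return lists


def ivf_search(query, vectors, centroids, lists, nprobe, k):
    """Query step: scan only the nprobe clusters whose centroids are closest."""
    by_distance = sorted(
        range(len(centroids)),
        key=lambda c: math.dist(query, centroids[c]),
    )
    candidates = [vec_id for c in by_distance[:nprobe] for vec_id in lists[c]]
    candidates.sort(key=lambda i: math.dist(query, vectors[i]))
    return candidates[:k]


# Toy data: three fixed centroids (assumed, not trained) and seven vectors
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
vectors = [(0, 1), (1, 0), (9, 0), (10, 1), (0, 9), (1, 10), (4, 0)]
lists = build_ivf(vectors, centroids)

query = (6.0, 0.0)
narrow = ivf_search(query, vectors, centroids, lists, nprobe=1, k=1)
wide = ivf_search(query, vectors, centroids, lists, nprobe=2, k=1)
```

The example also shows where the recall loss comes from: the true nearest neighbor of the query is vector 6 at (4, 0), but it sits in a neighboring cluster, so `nprobe=1` returns vector 2 instead. Raising `nprobe` to 2 finds it at the cost of scanning more vectors.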
Major Vector Database Options (2026)
Managed / Cloud-Native
Pinecone
- Fully managed, no infrastructure to run
- Excellent performance, simple API
- Serverless tier (pay per query) good for variable workloads
- No local deployment option (data leaves your infrastructure)
Weaviate Cloud
- Open-source core, managed cloud option
- Strong metadata filtering + vector search combination
- Built-in text and multimodal vectorization
- GraphQL query interface
Qdrant Cloud
- Fast, Rust-based, excellent performance benchmarks
- Open-source + managed cloud
- Strong filtering capabilities, sparse vector support (for hybrid)
- Native support for named vectors (multiple embedding spaces)
Self-Hosted / Open-Source
Qdrant (self-hosted)
- Docker deployment, excellent docs
- Best choice for teams wanting full control + high performance
Weaviate (self-hosted)
- Richer feature set (modules, GraphQL)
- More complex to operate
Chroma
- Developer-friendly, embeds in Python processes
- Great for prototyping and small-scale production
- Not designed for multi-billion vector scale
Milvus / Zilliz
- Extremely scalable (designed for 1B+ vectors)
- More complex to operate; Zilliz offers managed version
Postgres Extensions
pgvector
- Vector search directly in PostgreSQL
- Dramatically simpler stack if you’re already on Postgres
- Performance lags purpose-built vector DBs at large scale
- pgvector 0.5.0 added HNSW indexes, a major improvement over the earlier IVFFlat-only indexing
```sql
-- pgvector example
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    metadata JSONB,
    embedding vector(1536)
);

-- HNSW index using cosine distance
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Semantic search: <=> is cosine distance, so similarity = 1 - distance
SELECT content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
```

Choosing the Right Vector Database
Start with Postgres + pgvector if:
- You're already on Postgres
- You have under ~5M vectors
- Your team prefers minimal new infrastructure

Use Qdrant (self-hosted) if:
- You need performance + control + open source
- You have compliance/data locality requirements
- Hybrid search (dense + sparse) is needed

Use Pinecone if:
- Your team wants zero infrastructure management
- Traffic is variable (serverless pricing)
- Getting started fast matters more than cost at scale

Use Milvus/Zilliz if:
- You have 100M+ vectors
- You operate at enterprise scale with a dedicated ops team

Metadata Filtering
Pure vector search often isn’t enough. You also need to filter by metadata:
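```python
# Qdrant: metadata filter + vector search
from datetime import datetime

from qdrant_client import models

# Assumes `client` is a QdrantClient and `query_embedding` is a list of floats
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="department",
                match=models.MatchValue(value="engineering"),
            ),
            # Dates need a datetime range rather than a numeric range
            models.FieldCondition(
                key="created_after",
                range=models.DatetimeRange(gte=datetime(2024, 1, 1)),
            ),
        ]
    ),
    limit=10,
)
```

This is called filtered vector search: the ANN search is restricted to the subset of vectors matching the filter. Most databases handle this efficiently without a full scan.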
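Conceptually, the simplest form of filtered search is pre-filtering: apply the metadata predicate first, then rank only the survivors. The sketch below shows that idea in plain Python. It is a naive illustration under assumed record shapes and names (`filtered_search`, `is_recent_engineering`), not how any particular database implements it; production engines typically integrate the filter into the index traversal.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def filtered_search(query, records, predicate, k):
    """Pre-filter on metadata, then rank only the surviving records."""
    candidates = [r for r in records if predicate(r["metadata"])]
    candidates.sort(
        key=lambda r: cosine_similarity(query, r["embedding"]),
        reverse=True,
    )
    return [r["id"] for r in candidates[:k]]


# Hypothetical records: an ID, a 2-d embedding, and a metadata payload
records = [
    {"id": "doc-1", "embedding": [1.0, 0.0],
     "metadata": {"department": "engineering", "created": "2024-03-01"}},
    {"id": "doc-2", "embedding": [0.9, 0.1],
     "metadata": {"department": "sales", "created": "2024-05-10"}},
    {"id": "doc-3", "embedding": [0.0, 1.0],
     "metadata": {"department": "engineering", "created": "2024-02-15"}},
    {"id": "doc-4", "embedding": [1.0, 0.1],
     "metadata": {"department": "engineering", "created": "2023-11-30"}},
]


def is_recent_engineering(meta):
    # ISO-8601 date strings compare correctly as plain strings
    return meta["department"] == "engineering" and meta["created"] >= "2024-01-01"


hits = filtered_search([1.0, 0.0], records, is_recent_engineering, k=2)
```

Note that `doc-2` and `doc-4` are both very similar to the query but are excluded by the filter, so the results are `doc-1` followed by `doc-3`.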
Hybrid Search: Combining Dense and Sparse Vectors
Many modern vector databases support storing both dense embeddings and sparse BM25/SPLADE vectors, enabling native hybrid search:
```python
# Qdrant hybrid search
from qdrant_client.models import Fusion, FusionQuery, Prefetch, SparseVector

results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Dense semantic search
        Prefetch(query=dense_embedding, using="dense", limit=20),
        # Sparse keyword search
        Prefetch(
            query=SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=20,
        ),
    ],
    # Fuse results with RRF
    query=FusionQuery(fusion=Fusion.RRF),
    limit=10,
)
```

This single-database hybrid search eliminates the need to manage separate Elasticsearch and vector DB stacks.
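Reciprocal Rank Fusion (RRF) itself is simple enough to sketch in a few lines. Each document earns 1 / (k + rank) from every result list it appears in, and the sums decide the final order. The version below is a generic illustration, not Qdrant's internal implementation; the constant `k=60` is the value commonly used in the RRF literature and is an assumption here, as are the function name and the toy document IDs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from a dense and a sparse retriever
dense_hits = ["a", "b", "c"]
sparse_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because RRF uses only ranks, it needs no score normalization between the dense and sparse retrievers. Documents that appear in both lists ("a" and "c") rise above those that appear in only one.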
Performance Benchmarks (Approximate, 2025)
For 1M vectors, 1536 dimensions, cosine similarity, ~99% recall:
| Database | QPS (single node) | Latency p95 | Memory |
|---|---|---|---|
| Qdrant | ~3,000 | ~5ms | ~8GB |
| Weaviate | ~2,500 | ~7ms | ~10GB |
| Pinecone | ~2,000 | ~10ms | Managed |
| pgvector (HNSW) | ~800 | ~15ms | ~8GB |
| Chroma | ~500 | ~25ms | ~8GB |
Benchmarks vary significantly by hardware, ef values, and workload patterns. Always benchmark on your specific use case.
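A small harness along these lines can be a starting point for your own measurements. It is a single-threaded sketch that times a search callable over a list of queries and reports QPS and p95 latency; the names `benchmark` and `percentile` and the stand-in workload are hypothetical, and a serious benchmark would also add warm-up runs, concurrency, and recall measurement.

```python
import time


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted list (pct is an integer 1-100)."""
    rank = max(1, -(-len(sorted_values) * pct // 100))  # ceiling division
    return sorted_values[rank - 1]


def benchmark(search_fn, queries):
    """Run search_fn once per query and report throughput and tail latency."""
    latencies = []
    start = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        search_fn(query)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "qps": len(latencies) / elapsed,
        "p95_ms": percentile(latencies, 95) * 1000,
    }


# Stand-in workload; replace the lambda with a call to your vector database
stats = benchmark(lambda q: sum(range(1000)), list(range(50)))
```

Run it against your own data, with your own `ef` or `nprobe` settings, to see how far your numbers differ from the table above.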