Vector Databases

You’ve created your embeddings. Now you need to store millions of them and find the most similar ones to a query vector in milliseconds. That’s what vector databases do. They’re a specialized data infrastructure layer that makes semantic search practical at scale.

Why Regular Databases Don’t Work

Standard databases optimize for exact lookups (give me the row with id=42) or range queries (give me all rows where age > 30). They’re terrible at “give me the 10 most similar vectors to this query.”

A brute-force similarity search over 10 million vectors would compute 10 million cosine similarities per query — that’s roughly 1–2 seconds on modern hardware, far too slow for interactive applications. Vector databases use specialized index structures to make this fast.
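To make the cost concrete, here is a minimal sketch of brute-force search in plain Python. The function names and the toy vectors are illustrative assumptions, not part of any database API; the point is that every query touches every stored vector.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_top_k(query, vectors, k=10):
    """Score every stored vector against the query: O(N) work per query."""
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)  # list of (similarity, index), best first

# Toy data (hypothetical): four 2-dimensional vectors.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
top = brute_force_top_k([1.0, 0.1], vectors, k=2)
```

With millions of high-dimensional vectors, the loop inside `brute_force_top_k` is exactly the work an index structure tries to avoid.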

Approximate Nearest Neighbor Search

The key insight: for most applications, finding the exact top-10 most similar vectors is unnecessary. Finding vectors that are approximately the most similar (recall@10 > 90%) is good enough: it is dramatically faster, and the quality difference is negligible.

This is Approximate Nearest Neighbor (ANN) search, and it’s what every vector database uses internally.

Exact NN:     100% recall, O(N) per query → 1000ms for 10M vectors
ANN (HNSW):   99% recall, O(log N) per query → 5ms for 10M vectors
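The recall figures above compare an approximate result list against the exact one. A minimal sketch of that measurement follows; the function name and the id lists are made up for illustration.

```python
def recall_at_k(exact_ids, approx_ids, k=10):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    truth = set(exact_ids[:k])
    found = set(approx_ids[:k])
    return len(truth & found) / len(truth)

# Hypothetical result lists for one query.
exact = [3, 17, 42, 8, 91, 5, 64, 23, 77, 12]
approx = [3, 17, 8, 42, 91, 5, 64, 23, 77, 50]  # one true neighbor missed
score = recall_at_k(exact, approx)  # 9 of 10 found -> 0.9
```

Note that order within the top-k does not matter here; only membership does. In practice you would average this over many queries.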

HNSW: The Dominant Index Structure

Hierarchical Navigable Small World (HNSW) graphs are the standard index in most production vector databases. The intuition:

Layer 2 (long hops): ●──────────────────●
                      │
Layer 1 (medium):    ●────●────●────●───●
                      │
Layer 0 (fine):      ●──●──●──●──●──●──●──● (all vectors)

Search: Start at top layer (few nodes, long range)
        Navigate towards query
        Drop to lower layer when closer
        Repeat until layer 0 → return top-K

HNSW achieves near-logarithmic query time with excellent recall (~99% in practice). It’s used by Weaviate, Qdrant, Milvus, and Chroma by default.
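The search procedure can be sketched in a few dozen lines. This is a simplified illustration, not how any particular database implements it: the graph is hand-built rather than constructed by the HNSW insertion algorithm, distances are Euclidean, and all names (`greedy_step`, `search_layer0`, `hnsw_search`) are assumptions. The `ef` argument is the size of the candidate list kept on the bottom layer.

```python
import heapq
import math

def greedy_step(query, entry, graph, vectors):
    """Walk to whichever neighbor is closer to the query until none is."""
    current = entry
    current_dist = math.dist(query, vectors[current])
    improved = True
    while improved:
        improved = False
        for neighbor in graph.get(current, []):
            d = math.dist(query, vectors[neighbor])
            if d < current_dist:
                current, current_dist, improved = neighbor, d, True
    return current

def search_layer0(query, entry, graph, vectors, k, ef):
    """Best-first search on the bottom layer, keeping up to ef results."""
    start = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(start, entry)]   # min-heap: closest unexplored node first
    best = [(-start, entry)]        # max-heap via negation: current best ef nodes
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(best) >= ef and dist > -best[0][0]:
            break                   # every remaining candidate is farther away
        for neighbor in graph.get(node, []):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            d = math.dist(query, vectors[neighbor])
            if len(best) < ef or d < -best[0][0]:
                heapq.heappush(candidates, (d, neighbor))
                heapq.heappush(best, (-d, neighbor))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the farthest result
    ranked = sorted((-neg_d, n) for neg_d, n in best)
    return [n for _, n in ranked[:k]]

def hnsw_search(query, layers, vectors, entry, k=2, ef=4):
    """layers[0] holds every vector; higher indexes are sparser layers."""
    for graph in reversed(layers[1:]):
        entry = greedy_step(query, entry, graph, vectors)
    return search_layer0(query, entry, layers[0], vectors, k, ef)

# Toy index (hypothetical): eight points on a line, three hand-built layers.
vectors = [[float(i)] for i in range(8)]
layer0 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
layer1 = {0: [2], 2: [0, 4], 4: [2, 6], 6: [4]}
layer2 = {0: [6], 6: [0]}
result = hnsw_search([5.2], [layer0, layer1, layer2], vectors, entry=0)
```

In this toy run the top layer jumps from node 0 straight to node 6, and the bottom layer refines from there, which is the "long hops first, fine steps last" behavior in the diagram.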

Key tradeoffs in HNSW:

ef_construction (candidate list size while building the graph; higher → better recall, slower indexing)
m (connections per node, higher → better recall, more memory)
ef at query time (higher → better recall, slower queries)
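To see why m drives memory, a back-of-the-envelope estimate helps. The sketch below assumes float32 vectors, 4-byte neighbor ids, and roughly 2*m links per vector on the bottom layer (the convention used by the hnswlib reference implementation). It ignores upper layers and any database-specific overhead, so treat the numbers as a lower bound.

```python
def hnsw_memory_bytes(num_vectors, dim, m, bytes_per_float=4, bytes_per_link=4):
    """Rough estimate: raw vectors plus about 2*m bottom-layer links per vector."""
    vector_bytes = num_vectors * dim * bytes_per_float
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return vector_bytes + link_bytes

# 1M vectors at 1536 dimensions, for a few values of m.
for m in (16, 32, 64):
    gb = hnsw_memory_bytes(1_000_000, 1536, m) / 1e9
    print(f"m={m}: ~{gb:.2f} GB")
```

At 1536 dimensions the raw vectors dominate (about 6.1 GB for 1M vectors), so doubling m costs relatively little. For low-dimensional vectors the links become a much larger share of the total.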

IVF: For Very Large Scale

Inverted File Index (IVF) partitions the vector space into clusters (using k-means). At query time, only the closest clusters are searched.

IVF workflow:
  Train: Cluster all vectors into K centroids (K=1000 typical)
  Index: Assign each vector to its nearest centroid

  Query: Find top nprobe centroids near query (e.g., nprobe=10)
         Search only vectors in those nprobe clusters

Result: scans only ~1% of the vectors (nprobe/K = 10/1000), so roughly
        100× faster than brute force on large datasets,
        with ~95-98% recall at nprobe=10
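The workflow above can be sketched in plain Python. This is a toy illustration under stated assumptions: the k-means is a bare-bones Lloyd's loop with evenly spaced initial centroids (real libraries use better initialization), the data is a synthetic set of four tight clusters, and all function names are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def train_centroids(vectors, k, iterations=10):
    """Train: plain k-means, seeded with evenly spaced vectors (a simplification)."""
    step = len(vectors) // k
    centroids = [list(vectors[i * step]) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_ivf(vectors, centroids):
    """Index: inverted lists mapping centroid index -> ids of assigned vectors."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        lists[nearest(v, centroids)].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, k=3, nprobe=2):
    """Query: scan only the nprobe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))[:k]

# Synthetic data: four tight clusters of 25 points each.
blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0), (5.0, 0.0)]
vectors = [[cx + 0.02 * (j % 5), cy + 0.02 * (j // 5)]
           for cx, cy in blobs for j in range(25)]
centroids = train_centroids(vectors, k=4)
lists = build_ivf(vectors, centroids)
hits = ivf_search([5.0, 5.0], vectors, centroids, lists, k=3, nprobe=1)
```

With nprobe=1 the query scans 25 of the 100 vectors. Raising nprobe trades speed for recall, since a true neighbor sitting just across a cluster boundary is only found if its cluster is probed.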

Used extensively in Faiss (Facebook AI Similarity Search) and available as an option in most managed databases. Often preferred for datasets above 100M vectors, where the memory footprint of an HNSW graph becomes expensive.

Major Vector Database Options (2026)

Managed / Cloud-Native

Pinecone

Fully managed, no infrastructure to run
Excellent performance, simple API
Serverless tier (pay per query) good for variable workloads
No local deployment option (data leaves your infrastructure)

Weaviate Cloud

Open-source core, managed cloud option
Strong metadata filtering + vector search combination
Built-in text and multimodal vectorization
GraphQL query interface

Qdrant Cloud

Fast, Rust-based, excellent performance benchmarks
Open-source + managed cloud
Strong filtering capabilities, sparse vector support (for hybrid)
Native support for named vectors (multiple embedding spaces)

Self-Hosted / Open-Source

Qdrant (self-hosted)

Docker deployment, excellent docs
Best choice for teams wanting full control + high performance

Weaviate (self-hosted)

Richer feature set (modules, GraphQL)
More complex to operate

Chroma

Developer-friendly, embeds in Python processes
Great for prototyping and small-scale production
Not designed for multi-billion vector scale

Milvus / Zilliz

Extremely scalable (designed for 1B+ vectors)
More complex to operate; Zilliz offers managed version

Postgres Extensions

pgvector

Vector search directly in PostgreSQL
Dramatically simpler stack if you’re already on Postgres
Performance lags purpose-built vector DBs at large scale
pgvector 0.5.0+ supports HNSW indexes, a major improvement over the earlier IVFFlat-only indexing

-- pgvector example
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    metadata JSONB,
    embedding vector(1536)
);

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Semantic search
SELECT content, 1 - (embedding <=> $1::vector) as similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
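Two details of the query are worth spelling out from the application side. The `<=>` operator returns cosine distance, which is why the query computes `1 - distance` to get a similarity, and pgvector's text format for a vector is a bracketed, comma-separated list. The helpers below are a minimal sketch with hypothetical names; they assume your driver sends the `$1` parameter as text, which the `::vector` cast then parses.

```python
import math

def to_pgvector_literal(embedding):
    """Format a Python list in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

def cosine_distance(a, b):
    """The quantity the <=> operator returns: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

param = to_pgvector_literal([0.25, -1, 3.5])  # pass this string as $1
```

Some drivers have pgvector adapters that accept lists or arrays directly, in which case the literal helper is unnecessary.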

Choosing the Right Vector Database

Start with Postgres + pgvector if:
  ✓ You're already on Postgres
  ✓ Under ~5M vectors
  ✓ Team prefers minimal new infrastructure

Use Qdrant (self-hosted) if:
  ✓ You need performance + control + open source
  ✓ Compliance/data locality requirements
  ✓ Hybrid search (dense + sparse) is needed

Use Pinecone if:
  ✓ Team wants zero infrastructure management
  ✓ Variable traffic (serverless pricing)
  ✓ Getting started fast matters more than cost at scale

Use Milvus/Zilliz if:
  ✓ 100M+ vectors
  ✓ Enterprise scale with dedicated ops team

Metadata Filtering

Pure vector search often isn’t enough. You also need to filter by metadata:

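# Qdrant: vector search restricted by a metadata filter
# (assumes `client` is a QdrantClient and `query_embedding` is a list of floats)
from datetime import datetime, timezone

from qdrant_client import models

results = client.query_points(
    collection_name="documents",
    query=query_embedding,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="department", match=models.MatchValue(value="engineering")
            ),
            # Range on a datetime payload field: created on or after 2024-01-01
            models.FieldCondition(
                key="created_at",
                range=models.DatetimeRange(gte=datetime(2024, 1, 1, tzinfo=timezone.utc)),
            ),
        ]
    ),
    limit=10,
)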

This is called filtered vector search: the ANN search is restricted to the subset of vectors that match the filter. Most databases handle this efficiently, without a full scan of the collection.
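The semantics of a filtered search can be shown without any database. The sketch below pre-filters on metadata and then ranks the survivors exhaustively. Real engines typically fold the filter into the index traversal instead, so this illustrates what is computed, not how. The point structure, field names, and `filtered_search` function are assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(query, points, must, k=10):
    """Keep points whose payload passes every predicate in must, then rank them."""
    allowed = [p for p in points if all(check(p["payload"]) for check in must)]
    allowed.sort(key=lambda p: cosine(query, p["vector"]), reverse=True)
    return allowed[:k]

# Hypothetical collection of four points with metadata payloads.
points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"department": "engineering", "created_at": "2024-03-01"}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"department": "sales", "created_at": "2024-05-01"}},
    {"id": 3, "vector": [0.8, 0.6], "payload": {"department": "engineering", "created_at": "2023-11-15"}},
    {"id": 4, "vector": [0.6, 0.8], "payload": {"department": "engineering", "created_at": "2024-06-10"}},
]
must = [
    lambda payload: payload["department"] == "engineering",
    lambda payload: payload["created_at"] >= "2024-01-01",  # ISO dates sort correctly as strings
]
hits = filtered_search([1.0, 0.0], points, must, k=2)
```

Point 2 is the second-closest vector overall but is excluded by the department filter, and point 3 fails the date condition, so the result contains points 1 and 4.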

Hybrid Search: Combining Dense and Sparse Vectors

Many modern vector databases support storing both dense embeddings and sparse BM25/SPLADE vectors, enabling native hybrid search:

# Qdrant hybrid search
from qdrant_client.models import Fusion, FusionQuery, Prefetch, SparseVector

results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Dense semantic search
        Prefetch(query=dense_embedding, using="dense", limit=20),
        # Sparse keyword search
        Prefetch(query=SparseVector(indices=sparse_indices, values=sparse_values),
                using="sparse", limit=20)
    ],
    # Fuse results with RRF
    query=FusionQuery(fusion=Fusion.RRF),
    limit=10
)

This single-database hybrid search eliminates the need to manage separate Elasticsearch + vector DB stacks.
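Reciprocal Rank Fusion (RRF) itself is simple enough to sketch by hand. Each ranked list contributes 1 / (k + rank) to a document's score, and documents are re-sorted by the total. The constant k=60 is the commonly cited default from the original RRF formulation; the value a given database uses internally may differ, and the function name and id lists here are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60, limit=10):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:limit]

# Hypothetical result lists from a dense and a sparse search.
dense_hits = ["a", "b", "c", "d"]
sparse_hits = ["c", "a", "e"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits], limit=3)  # ["a", "c", "b"]
```

Because RRF uses only ranks, it needs no score normalization between the dense and sparse retrievers, which is why it is a popular default for hybrid search.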

Performance Benchmarks (Approximate, 2025)

For 1M vectors, 1536 dimensions, cosine similarity, ~99% recall:

Database	QPS (single node)	Latency p95	Memory
Qdrant	~3,000	~5ms	~8GB
Weaviate	~2,500	~7ms	~10GB
Pinecone	~2,000	~10ms	Managed
pgvector (HNSW)	~800	~15ms	~8GB
Chroma	~500	~25ms	~8GB

Benchmarks vary significantly by hardware, ef values, and workload patterns. Always benchmark on your specific use case.
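A minimal harness for running your own measurement might look like the sketch below. It times sequential queries from a single client, so the QPS it reports is a single-threaded figure, and `fake_search` is a stand-in to be replaced with a call to your database client. All names here are assumptions.

```python
import math
import time

def benchmark(search_fn, queries, warmup=5):
    """Run search_fn over queries; report throughput and p95 latency."""
    for q in queries[:warmup]:
        search_fn(q)  # warm-up calls, not timed
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)  # nearest-rank percentile
    return {"qps": len(queries) / elapsed, "p95_ms": latencies[p95_index] * 1000}

def fake_search(query):
    """Stand-in workload; replace with a real vector search call."""
    return sorted(range(1000), key=lambda i: abs(i - query))[:10]

stats = benchmark(fake_search, queries=list(range(200)))
print(f"QPS: {stats['qps']:.0f}, p95: {stats['p95_ms']:.2f} ms")
```

For a fair comparison, measure recall on the same run (against exact results) and use queries drawn from your real traffic, since speed figures at different recall levels are not comparable.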