
Generative AI 26 guides · updated 2026

From transformer foundations to production RAG, tool-using agents, and the Model Context Protocol — the GenAI stack as it's actually being built in 2026.

Vector Databases

You’ve created your embeddings. Now you need to store millions of them and find the most similar ones to a query vector in milliseconds. That’s what vector databases do. They’re a specialized data infrastructure layer that makes semantic search practical at scale.


Why Regular Databases Don’t Work

Standard databases optimize for exact lookups (give me the row with id=42) or range queries (give me all rows where age > 30). They’re terrible at “give me the 10 most similar vectors to this query.”

A brute-force similarity search over 10 million vectors computes 10 million cosine similarities per query. On a single machine that takes on the order of a second, far too slow for interactive applications. Vector databases use specialized index structures to avoid scanning every vector.
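To make the cost concrete, here is a minimal pure-Python sketch of the brute-force approach. The function names and the tiny two-dimensional dataset are illustrative assumptions; the point is only that every stored vector is scored on every query.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, vectors, k=10):
    """Score every stored vector against the query: O(N * d) work per query."""
    scored = ((cosine_similarity(query, vec), idx) for idx, vec in enumerate(vectors))
    return heapq.nlargest(k, scored)  # list of (similarity, index), best first


# Toy data: four 2-dimensional vectors (illustrative only)
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
top = brute_force_top_k([1.0, 0.1], vectors, k=2)
print(top)
```

The loop inside `brute_force_top_k` is the part an index replaces: its cost grows linearly with both the number of vectors and their dimensionality.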


The key insight: for most applications, finding the exact top-10 most similar vectors is unnecessary. Finding vectors that are approximately the most similar (recall@10 > 90%, meaning on average at least 9 of the true top 10 are returned) is far faster, and the quality difference is usually negligible.

This is Approximate Nearest Neighbor (ANN) search, and it’s what every vector database uses internally.

Exact NN: 100% recall, O(N) per query → ~1000ms for 10M vectors
ANN (HNSW): ~99% recall, roughly O(log N) per query → ~5ms for 10M vectors
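Recall@k is what the comparison above measures: the share of the true top-k neighbours that the approximate search also returned. A small sketch, with hypothetical id lists standing in for real search results:

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbours that the approximate result also returned."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    found = exact_top.intersection(approx_ids[:k])
    return len(found) / len(exact_top)


# Hypothetical results for one query
exact = [4, 8, 15, 16, 23, 42, 7, 9, 1, 3]     # ground truth from brute force
approx = [4, 8, 15, 16, 23, 42, 7, 9, 1, 99]   # ANN result missing one true neighbour
score = recall_at_k(approx, exact, k=10)
print(score)
```

In practice the score is averaged over many queries, with the ground truth computed once by brute force on a sample.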

HNSW: The Dominant Index Structure

Hierarchical Navigable Small World (HNSW) graphs are the standard index in most production vector databases. The intuition:

Layer 2 (long hops): ●──────────────────●
Layer 1 (medium): ●────●────●────●───●
Layer 0 (fine): ●──●──●──●──●──●──●──● (all vectors)
Search: Start at top layer (few nodes, long range)
Navigate towards query
Drop to lower layer when closer
Repeat until layer 0 → return top-K
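The search steps above can be sketched as a greedy walk repeated on each layer. This is a simplified illustration, not a full HNSW implementation: it returns only the single nearest node it finds, whereas real HNSW keeps a candidate list (sized by an `ef` parameter) and returns top-K. The hand-built layers and all names are assumptions made for the example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_search_layer(graph, vectors, query, entry):
    """Walk to whichever neighbour is closer to the query until none improves."""
    current = entry
    current_dist = squared_distance(vectors[current], query)
    improved = True
    while improved:
        improved = False
        for neighbour in graph.get(current, []):
            dist = squared_distance(vectors[neighbour], query)
            if dist < current_dist:
                current, current_dist = neighbour, dist
                improved = True
    return current


def layered_search(layers, vectors, query, entry):
    """Descend from the sparsest layer to layer 0, reusing each result as the next entry point."""
    current = entry
    for graph in layers:  # ordered top layer first, layer 0 last
        current = greedy_search_layer(graph, vectors, query, current)
    return current


# Toy index: eight points on a line, with hand-built layers (illustrative only)
vectors = [[float(i)] for i in range(8)]
layer2 = {0: [4], 4: [0]}                                  # long hops
layer1 = {0: [2], 2: [0, 4], 4: [2, 6], 6: [4]}            # medium hops
layer0 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}  # all vectors

nearest = layered_search([layer2, layer1, layer0], vectors, query=[5.2], entry=0)
print(nearest)
```

The upper layers cover most of the distance in a couple of hops, so layer 0 only has to refine the result locally.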

HNSW achieves near-logarithmic query time with excellent recall (~99% in practice when tuned). It is the default index in Weaviate, Qdrant, and Chroma, and one of the supported index types in Milvus.

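Key tradeoffs in HNSW (parameter names vary slightly between databases):

M (links per node): higher values improve recall but increase memory use and build time.
ef_construction (candidate list size while building): higher values produce a better graph at the cost of slower indexing.
ef_search (candidate list size while querying): higher values raise recall and latency together, and can be changed without rebuilding the index.
Memory: the graph is stored in addition to the vectors, and the index is usually held in RAM.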


IVF: For Very Large Scale

Inverted File Index (IVF) partitions the vector space into clusters (using k-means). At query time, only the closest clusters are searched.

IVF workflow:
Train: Cluster all vectors into K centroids (K=1000 typical)
Index: Assign each vector to its nearest centroid
Query: Find the top nprobe centroids near the query (e.g., nprobe=10)
Search only the vectors in those nprobe clusters
Result: ~100× faster than brute force for large datasets, with ~95-98% recall at nprobe=10
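The workflow above can be sketched in pure Python. This is a toy illustration under stated assumptions: plain Lloyd's k-means with a fixed iteration count, squared Euclidean distance, and random 2-dimensional data. Production libraries such as Faiss implement the same idea with far more optimization; none of the names below come from a real library.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroids(point, centroids, n=1):
    """Indices of the n centroids closest to point, nearest first."""
    order = sorted(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))
    return order[:n]


def train_centroids(vectors, k, iterations=10, seed=0):
    """Plain Lloyd's k-means; an empty cluster keeps its previous centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for vec in vectors:
            members[nearest_centroids(vec, centroids)[0]].append(vec)
        for c, group in enumerate(members):
            if group:
                centroids[c] = [sum(dim) / len(group) for dim in zip(*group)]
    return centroids


def build_inverted_lists(vectors, centroids):
    """Map each centroid index to the ids of the vectors assigned to it."""
    lists = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        lists[nearest_centroids(vec, centroids)[0]].append(idx)
    return lists


def ivf_search(query, vectors, centroids, lists, nprobe=2, top_k=3):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    candidates = [idx for c in nearest_centroids(query, centroids, nprobe) for idx in lists[c]]
    candidates.sort(key=lambda idx: squared_distance(query, vectors[idx]))
    return candidates[:top_k]


rng = random.Random(42)
vectors = [[rng.random(), rng.random()] for _ in range(500)]
centroids = train_centroids(vectors, k=16)
lists = build_inverted_lists(vectors, centroids)
query = [0.5, 0.5]
approx = ivf_search(query, vectors, centroids, lists, nprobe=2, top_k=3)
print(approx)
```

Raising `nprobe` trades speed for recall: probing all 16 clusters here is equivalent to brute force, while probing 2 scans only a fraction of the vectors and may miss a neighbour that sits just across a cluster boundary.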

Used extensively in Faiss (Facebook AI Similarity Search) and as an option in most managed databases. Preferred for datasets >100M vectors.


Major Vector Database Options (2026)

Managed / Cloud-Native

Pinecone

Weaviate Cloud

Qdrant Cloud

Self-Hosted / Open-Source

Qdrant (self-hosted)

Weaviate (self-hosted)

Chroma

Milvus / Zilliz

Postgres Extensions

pgvector

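-- pgvector example
-- Enable the extension once per database
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    metadata JSONB,
    embedding vector(1536)
);

-- HNSW index using cosine distance
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Semantic search: <=> is cosine distance, so 1 - distance is cosine similarity
SELECT content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;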
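From application code, the query parameter cast with `::vector` can be sent as text in pgvector's bracketed form, for example `[0.1,0.2,0.3]`. The helper below is a hypothetical convenience function, and the `%s` placeholders assume a DB-API style driver such as psycopg; the driver call itself is left out so the sketch runs without a database.

```python
def to_pgvector_literal(embedding):
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Same query as the SQL above, with driver-style placeholders (assumption: %s paramstyle)
SEARCH_SQL = """
SELECT content, 1 - (embedding <=> %s::vector) AS similarity
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT 10;
"""

literal = to_pgvector_literal([0.25, -1, 3.5])
params = (literal, literal)  # one value per placeholder
print(literal)
```

Passing the vector as a bound parameter rather than interpolating it into the SQL string keeps the query safe and lets the server reuse the plan.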

Choosing the Right Vector Database

Start with Postgres + pgvector if:
  ✓ You're already on Postgres
  ✓ Under ~5M vectors
  ✓ Team prefers minimal new infrastructure

Use Qdrant (self-hosted) if:
  ✓ You need performance + control + open source
  ✓ Compliance/data locality requirements
  ✓ Hybrid search (dense + sparse) is needed

Use Pinecone if:
  ✓ Team wants zero infrastructure management
  ✓ Variable traffic (serverless pricing)
  ✓ Getting started fast matters more than cost at scale

Use Milvus/Zilliz if:
  ✓ 100M+ vectors
  ✓ Enterprise scale with dedicated ops team

Metadata Filtering

Pure vector search often isn’t enough. You also need to filter by metadata:

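# Qdrant: filtered vector search
from qdrant_client import models

# `client` is a QdrantClient and `query_embedding` a list of floats (both assumed to exist)
results = client.query_points(
    collection_name="documents",
    query=query_embedding,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="department",
                match=models.MatchValue(value="engineering"),
            ),
            # Date bounds go in DatetimeRange (Range is for numbers);
            # "created_at" is an assumed payload field holding the creation timestamp
            models.FieldCondition(
                key="created_at",
                range=models.DatetimeRange(gte="2024-01-01T00:00:00Z"),
            ),
        ]
    ),
    limit=10,
)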

This is called filtered vector search: the ANN search is restricted to the subset of vectors matching the filter. Most databases handle this efficiently without a full scan.
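The semantics of a filtered search can be shown with a naive pre-filter: drop points whose metadata fails the condition, then rank what is left. This is only a sketch of the behaviour, not of how any database implements it; real engines typically apply the filter during index traversal. The payload fields and function names are hypothetical.

```python
import heapq
import math
from datetime import date


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query, points, predicate, k=10):
    """Keep only points whose payload passes predicate, then rank the survivors."""
    scored = (
        (cosine_similarity(query, p["vector"]), p["id"])
        for p in points
        if predicate(p["payload"])
    )
    return heapq.nlargest(k, scored)  # list of (similarity, id), best first


# Toy collection (illustrative only)
points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"department": "engineering", "created_at": date(2024, 3, 1)}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"department": "sales", "created_at": date(2024, 5, 1)}},
    {"id": 3, "vector": [0.0, 1.0], "payload": {"department": "engineering", "created_at": date(2023, 6, 1)}},
    {"id": 4, "vector": [0.7, 0.7], "payload": {"department": "engineering", "created_at": date(2024, 9, 1)}},
]


def is_recent_engineering(payload):
    return payload["department"] == "engineering" and payload["created_at"] >= date(2024, 1, 1)


hits = filtered_search([1.0, 0.0], points, is_recent_engineering, k=2)
print(hits)
```

Point 2 is the second-closest vector overall but never appears in the results, because the filter removes it before ranking.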


Hybrid Search: Combining Dense and Sparse Vectors

Many modern vector databases support storing both dense embeddings and sparse BM25/SPLADE vectors, enabling native hybrid search:

# Qdrant hybrid search
from qdrant_client.models import Fusion, FusionQuery, Prefetch, SparseVector

results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Dense semantic search
        Prefetch(query=dense_embedding, using="dense", limit=20),
        # Sparse keyword search
        Prefetch(
            query=SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=20,
        ),
    ],
    # Fuse results with RRF
    query=FusionQuery(fusion=Fusion.RRF),
    limit=10,
)

This single-database hybrid search eliminates the need to manage separate Elasticsearch + vector DB stacks.
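Reciprocal Rank Fusion (RRF), the fusion step named above, is simple enough to sketch directly. Each result list contributes 1 / (k + rank) to a document's score, so documents ranked well by both searches rise to the top. The constant k=60 is a commonly used default; whether a particular database uses the same value is not assumed here, and the document ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list adds 1 / (k + rank) per id, with rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from the two prefetch branches
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Because RRF uses only ranks, it needs no calibration between cosine similarities and BM25-style scores, which live on different scales.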


Performance Benchmarks (Approximate, 2025)

For 1M vectors, 1536 dimensions, cosine similarity, ~99% recall:

Database          QPS (single node)   Latency p95   Memory
Qdrant            ~3,000              ~5ms          ~8GB
Weaviate          ~2,500              ~7ms          ~10GB
Pinecone          ~2,000              ~10ms         Managed
pgvector (HNSW)   ~800                ~15ms         ~8GB
Chroma            ~500                ~25ms         ~8GB

Benchmarks vary significantly by hardware, ef values, and workload patterns. Always benchmark on your specific use case.
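A minimal harness for that kind of measurement might look like the sketch below. It times queries one at a time from a single client, so the QPS it reports is a lower bound compared with a concurrent load test. `fake_search` is a hypothetical stand-in to be replaced with a real client call.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def benchmark(search_fn, queries):
    """Run queries sequentially and report QPS plus p50/p95 latency in milliseconds."""
    latencies_ms = []
    start = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        search_fn(query)
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    return {
        "qps": len(queries) / elapsed,
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
    }


def fake_search(query):
    """Hypothetical stand-in for a real vector database query."""
    return sorted(range(1000), key=lambda i: abs(i - query))[:10]


report = benchmark(fake_search, list(range(200)))
print(report)
```

For meaningful numbers, run it against your own data and query distribution, warm the index first, and record recall alongside latency so that a faster configuration is not simply a less accurate one.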