Multi-Tenant Vector Stores: One Database, Many Customers
When you’re building a B2B AI product — a RAG-powered assistant that serves multiple companies, teams, or users — you immediately hit the question of tenant isolation. Do you give each customer their own dedicated vector database? Or do you share infrastructure?
Dedicated infrastructure is the simplest isolation model but doesn’t scale economically. A shared vector store is more cost-efficient but requires careful design to prevent data leakage between tenants. This guide covers the patterns, trade-offs, and implementation details for multi-tenant vector stores.
The Three Isolation Models
1. Collection per Tenant (Full Isolation)
Each tenant gets their own collection (table/namespace) in the vector database. There is zero logical sharing at the storage layer.
```
Vector Database:
├── Collection: tenant_acme
├── Collection: tenant_globex
├── Collection: tenant_initech
└── Collection: tenant_umbrella
```

Pros:
- Strongest isolation of the three models: a query scoped to one collection cannot return another tenant's vectors
- Independent index tuning per tenant
- Simple to delete a tenant (drop collection)
- Clear capacity monitoring per tenant
Cons:
- Large number of collections at scale (10K customers = 10K collections)
- Most vector databases impose collection count limits
- Cold start latency when a new tenant’s collection isn’t warmed up
- Operational overhead scales with tenant count
When to use: High-compliance environments (HIPAA, SOC 2), enterprise customers with strict data residency requirements, large tenants with significant data volume.
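Provisioning for this model can be sketched as follows. This is a minimal illustration, not any particular database's API: `CollectionRegistry` is a hypothetical in-memory stand-in for the database's collection catalog, and the `MAX_COLLECTIONS` value and the tenant ID pattern are assumptions chosen for the example.

```python
import re

# Assumed limit for illustration; real limits vary by database and plan.
MAX_COLLECTIONS = 1000


def collection_name_for(tenant_id: str) -> str:
    """Map a tenant ID to a collection name, rejecting unsafe IDs."""
    if not re.fullmatch(r"[a-z0-9_]{1,48}", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{tenant_id}"


class CollectionRegistry:
    """Hypothetical in-memory stand-in for the database's collection catalog."""

    def __init__(self, max_collections: int = MAX_COLLECTIONS):
        self.max_collections = max_collections
        self.collections = set()

    def provision(self, tenant_id: str) -> str:
        name = collection_name_for(tenant_id)
        if name in self.collections:
            return name  # idempotent: provisioning twice is a no-op
        if len(self.collections) >= self.max_collections:
            raise RuntimeError("collection limit reached")
        self.collections.add(name)
        return name

    def deprovision(self, tenant_id: str) -> None:
        # Dropping the collection removes everything the tenant stored.
        self.collections.discard(collection_name_for(tenant_id))
```

Validating the tenant ID before it becomes part of a collection name keeps a malformed ID from colliding with, or addressing, another tenant's collection. Checking the limit at provisioning time surfaces the collection-count problem early instead of at the database.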
2. Namespace / Partition Isolation
One collection, logically partitioned by tenant using native namespace support:
```
Single Collection: "documents"
├── Namespace: "tenant_acme" → vectors + metadata for Acme
├── Namespace: "tenant_globex" → vectors + metadata for Globex
└── Namespace: "tenant_initech" → vectors + metadata for Initech
```

Pinecone namespaces are the canonical example: each namespace is an isolated partition within an index. Queries are scoped to a namespace at request time.
Pros:
- Scales to millions of tenants without collection count limits
- Single index to manage and monitor
- Shared infrastructure costs
- Native support in Pinecone, Qdrant, Weaviate
Cons:
- Data shares the same physical index (namespace isolation is logical, not physical)
- Cannot independently tune index parameters per tenant
- More complex access control logic in application code
3. Metadata Filter Isolation
All tenants share a single collection. Tenant isolation is enforced purely through metadata filtering — every query includes a mandatory tenant_id filter:
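```python
# Filter, FieldCondition, and MatchValue are the Qdrant Python client's models.
from qdrant_client.models import FieldCondition, Filter, MatchValue

# `client` is assumed to be an already-configured qdrant_client.QdrantClient.


def secure_search(query_embedding, tenant_id, k=10):
    """Search the shared collection, always restricted to a single tenant."""
    return client.search(
        collection_name="all_documents",
        query_vector=query_embedding,
        query_filter=Filter(
            must=[
                FieldCondition(
                    key="tenant_id",
                    match=MatchValue(value=tenant_id),
                )
            ]
        ),
        limit=k,
    )
```

Pros: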
- Simplest implementation
- Maximum resource sharing
- Zero tenant provisioning overhead
Cons:
- Security depends entirely on application-layer filter enforcement: a single bug can expose every tenant's data
- A misconfigured query can leak data (no database-layer hard isolation)
- Not suitable for high-compliance requirements
- Performance degrades as tenant count grows and filter selectivity increases
When to use: Internal tools, low-sensitivity data, small teams. Never for customer-facing products with PII or proprietary information.
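Because this model is only as safe as its filter enforcement, it helps to build the tenant filter in exactly one place. The sketch below assumes a simplified filter shape (a flat dict of field-to-value equality conditions), which is not any specific database's filter syntax. `with_tenant_scope` and `matches` are hypothetical helper names.

```python
from typing import Optional


def with_tenant_scope(tenant_id: str, user_filter: Optional[dict] = None) -> dict:
    """Return a filter that always pins tenant_id, whatever the caller sent."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    conditions = dict(user_filter or {})
    # Callers may narrow results, but never choose the tenant themselves.
    if "tenant_id" in conditions:
        raise PermissionError("callers may not filter on tenant_id")
    conditions["tenant_id"] = tenant_id
    return conditions


def matches(metadata: dict, conditions: dict) -> bool:
    """Equality-only matcher, used here to demonstrate the filter's effect."""
    return all(metadata.get(key) == value for key, value in conditions.items())


if __name__ == "__main__":
    docs = [
        {"tenant_id": "acme", "doc_type": "pdf"},
        {"tenant_id": "globex", "doc_type": "pdf"},
    ]
    scoped = with_tenant_scope("acme", {"doc_type": "pdf"})
    print([d["tenant_id"] for d in docs if matches(d, scoped)])
```

Rejecting a caller-supplied `tenant_id` outright, rather than silently overwriting it, makes attempted cross-tenant access visible in logs.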
Recommended Architecture: Hybrid Approach
Most production multi-tenant RAG systems use a tiered model based on tenant size and compliance requirements:
Tier 1 (Enterprise): Collection per tenant
- Large data volumes (>1M chunks)
- Strict compliance requirements
- Custom SLAs

Tier 2 (Standard): Namespace per tenant
- Medium data volumes (10K–1M chunks)
- Standard business data
- Shared infrastructure

Tier 3 (Starter): Metadata filter isolation
- Small data volumes (<10K chunks)
- Non-sensitive data
- Cost-sensitive early-stage tenants

The application routing layer decides which tier a tenant belongs to and routes queries accordingly.
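A routing layer along these lines might look like the sketch below. The chunk-count thresholds come from the tiers above. Everything else is an assumption for illustration: the `TenantProfile` fields, the rule that sensitive data rules out the starter tier, and the shape of the routing target returned by `route`.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ENTERPRISE = "collection_per_tenant"
    STANDARD = "namespace_per_tenant"
    STARTER = "metadata_filter"


@dataclass(frozen=True)
class TenantProfile:
    tenant_id: str
    chunk_count: int
    strict_compliance: bool = False
    sensitive_data: bool = False


def assign_tier(profile: TenantProfile) -> Tier:
    """Pick an isolation tier from data volume and compliance needs."""
    if profile.strict_compliance or profile.chunk_count > 1_000_000:
        return Tier.ENTERPRISE
    # Sensitive data never lands in the filter-only tier, however small.
    if profile.chunk_count >= 10_000 or profile.sensitive_data:
        return Tier.STANDARD
    return Tier.STARTER


def route(profile: TenantProfile) -> dict:
    """Describe where this tenant's queries should be sent."""
    tier = assign_tier(profile)
    if tier is Tier.ENTERPRISE:
        return {"collection": f"tenant_{profile.tenant_id}"}
    if tier is Tier.STANDARD:
        return {"collection": "documents", "namespace": profile.tenant_id}
    return {
        "collection": "all_documents",
        "filter": {"tenant_id": profile.tenant_id},
    }
```

Keeping tier assignment separate from routing means a tenant that outgrows its tier can be migrated by changing its profile, without touching query code.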
Implementation: Namespace-Based Isolation with Pinecone
```python
from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
index = pc.Index("documents")


class TenantVectorStore:
    def __init__(self, index):
        self.index = index

    def upsert(self, tenant_id: str, vectors: list):
        """Upsert vectors into tenant namespace."""
        self.index.upsert(
            vectors=vectors,
            namespace=tenant_id,
        )

    def query(self, tenant_id: str, query_vector: list, top_k: int = 10):
        """Query is automatically scoped to tenant namespace."""
        return self.index.query(
            vector=query_vector,
            top_k=top_k,
            namespace=tenant_id,
            include_metadata=True,
        )

    def delete_tenant(self, tenant_id: str):
        """Delete all vectors for a tenant."""
        self.index.delete(
            delete_all=True,
            namespace=tenant_id,
        )
```

Access Control at the Application Layer
Regardless of the isolation model, access control must be enforced at the application layer — never trust the client to provide their own tenant_id:
```python
from functools import wraps


def require_tenant_scope(f):
    """Decorator that ensures tenant_id comes from the auth token, not the request body."""
    @wraps(f)
    async def wrapper(request, *args, **kwargs):
        # Extract the tenant from the JWT claims, never from the request body.
        # Overwriting kwargs means a caller-supplied tenant_id is ignored.
        kwargs["tenant_id"] = request.auth.claims["tenant_id"]
        return await f(request, *args, **kwargs)
    return wrapper


@require_tenant_scope
async def search_documents(request, tenant_id: str):
    results = vector_store.query(
        tenant_id=tenant_id,  # always from auth, not user input
        query_vector=embed(request.body["query"]),
    )
    return results
```

Cross-Tenant Search for Internal Use Cases
Some systems need to search across all tenants (admin tools, analytics, compliance monitoring). Build this as a separate access path with explicit authorization requirements:
```python
def admin_cross_tenant_search(
    query_embedding,
    tenant_ids: list[str] | None = None,  # None = all tenants
    requester_role: str = "user",
):
    if requester_role != "admin":
        raise PermissionError("Cross-tenant search requires admin role")

    if tenant_ids is None:
        # A Pinecone query without a namespace only searches the default
        # namespace, so enumerate every namespace explicitly. Use with care.
        tenant_ids = list(index.describe_index_stats().namespaces)

    # Query each tenant namespace, then merge and re-rank by score.
    results = []
    for tenant_id in tenant_ids:
        results.extend(
            index.query(
                vector=query_embedding, top_k=10, namespace=tenant_id
            ).matches
        )
    return sorted(results, key=lambda x: x.score, reverse=True)[:20]
```

2025 Trend: Tenant-Aware Index Optimization
Newer vector database platforms are developing per-namespace capacity management — letting you allocate different index parameters (ef_construction, nlist) to tenants based on their data characteristics and SLA requirements. This allows enterprise tenants to have premium index configurations while starter tenants share lower-cost defaults, all within the same physical infrastructure.
Checklist for Multi-Tenant Deployments
- Choose isolation model based on compliance requirements, not just convenience
- Enforce tenant_id from authentication context only — never from user input
- Implement tenant provisioning and deprovisioning automation
- Add per-tenant usage monitoring (vector count, query volume)
- Test cross-tenant isolation with negative test cases (verify tenant A cannot see tenant B data)
- Plan capacity limits per tenant (max vectors, max queries per second)
- Document isolation model in your security architecture for compliance audits
Multi-tenancy in vector databases is primarily an application architecture problem. The database features (namespaces, collections) give you the tools; the application layer must use them correctly and consistently.