Multi-Tenant Vector Stores: Isolating Data in Shared RAG Infrastructure

Design multi-tenant vector stores for RAG — namespace isolation, collection-per-tenant, metadata filtering, and security patterns for shared AI infrastructure.

Multi-Tenant Vector Stores: One Database, Many Customers

When you’re building a B2B AI product — a RAG-powered assistant that serves multiple companies, teams, or users — you immediately hit the question of tenant isolation. Do you give each customer their own dedicated vector database? Or do you share infrastructure?

Dedicated infrastructure is the simplest isolation model but doesn’t scale economically. A shared vector store is more cost-efficient but requires careful design to prevent data leakage between tenants. This guide covers the patterns, trade-offs, and implementation details for multi-tenant vector stores.

The Three Isolation Models

1. Collection per Tenant (Full Isolation)

Each tenant gets their own collection (table/namespace) in the vector database. There is zero logical sharing at the storage layer.

Vector Database:
├── Collection: tenant_acme
├── Collection: tenant_globex
├── Collection: tenant_initech
└── Collection: tenant_umbrella

Pros:

  • Strong data isolation: a query can only reach the collection it names, so a leak requires routing a request to the wrong collection
  • Independent index tuning per tenant
  • Simple to delete a tenant (drop collection)
  • Clear capacity monitoring per tenant

Cons:

  • Large number of collections at scale (10K customers = 10K collections)
  • Most vector databases impose collection count limits
  • Cold start latency when a new tenant’s collection isn’t warmed up
  • Operational overhead scales with tenant count

When to use: High-compliance environments (HIPAA, SOC 2), enterprise customers with strict data residency requirements, large tenants with significant data volume.
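
With one collection per tenant, the mapping from tenant ID to collection name becomes part of the isolation boundary. The following is a minimal sketch of that mapping, assuming tenant IDs are short lowercase slugs and reusing the tenant_ prefix from the diagram above. The function name collection_name_for and the allowed-character rule are illustrative assumptions, not requirements of any particular vector database.

```python
import re

# Assumed rule: tenant IDs are short lowercase slugs (letters, digits, underscores).
_TENANT_ID_PATTERN = re.compile(r"[a-z0-9_]{1,48}")


def collection_name_for(tenant_id: str) -> str:
    """Map a tenant ID to its dedicated collection name, rejecting unsafe IDs."""
    if not _TENANT_ID_PATTERN.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{tenant_id}"
```

Routing every create, query, and drop call through one helper like this keeps a malformed or malicious ID from ever resolving to another tenant’s collection.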

2. Namespace / Partition Isolation

One collection, logically partitioned by tenant using native namespace support:

Single Collection: "documents"
├── Namespace: "tenant_acme" → vectors + metadata for Acme
├── Namespace: "tenant_globex" → vectors + metadata for Globex
└── Namespace: "tenant_initech" → vectors + metadata for Initech

Pinecone namespaces are the canonical example — each namespace is an isolated partition within an index. Queries are scoped to a namespace at request time.

Pros:

  • Scales to very large tenant counts without hitting collection count limits (exact namespace limits depend on the provider and plan)
  • Single index to manage and monitor
  • Shared infrastructure costs
  • Native support in Pinecone (namespaces), with comparable multi-tenancy features in Qdrant and Weaviate

Cons:

  • Data shares the same physical index (namespace isolation is logical, not physical)
  • Cannot independently tune index parameters per tenant
  • More complex access control logic in application code

3. Metadata Filter Isolation

All tenants share a single collection. Tenant isolation is enforced purely through metadata filtering — every query includes a mandatory tenant_id filter:

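from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

# Connection details are placeholders; point this at your own Qdrant instance.
client = QdrantClient(url="http://localhost:6333")

def secure_search(query_embedding, tenant_id, k=10):
    """Search the shared collection, restricted to a single tenant's vectors."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    return client.search(
        collection_name="all_documents",
        query_vector=query_embedding,
        query_filter=Filter(
            must=[
                FieldCondition(
                    key="tenant_id",
                    match=MatchValue(value=tenant_id),
                )
            ]
        ),
        limit=k,
    )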

Pros:

  • Simplest implementation
  • Maximum resource sharing
  • Zero tenant provisioning overhead

Cons:

  • Security depends entirely on application-layer filter enforcement: there is no database-layer hard isolation, so a single bug or misconfigured query that omits the filter can expose every tenant’s data
  • Not suitable for high-compliance requirements
  • Performance can degrade as tenant count grows, because each query’s filter matches an ever smaller fraction of the shared index

When to use: Internal tools, low-sensitivity data, small teams. Never for customer-facing products with PII or proprietary information.
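
One way to reduce the risk of a forgotten filter is to build every query filter in a single helper that refuses to run without a tenant. The sketch below uses plain dictionaries rather than any client library’s filter classes. The name build_tenant_filter and the dictionary shape are hypothetical and would need adapting to your database’s filter syntax.

```python
from typing import Optional


def build_tenant_filter(tenant_id: str, extra_conditions: Optional[list] = None) -> dict:
    """Return a filter whose first 'must' clause always pins the tenant."""
    if not isinstance(tenant_id, str) or not tenant_id.strip():
        raise ValueError("tenant_id is required for every query")
    extras = list(extra_conditions or [])
    for cond in extras:
        # Callers may add their own conditions, but never a tenant condition.
        if cond.get("key") == "tenant_id":
            raise ValueError("callers may not supply their own tenant_id condition")
    tenant_clause = {"key": "tenant_id", "match": {"value": tenant_id}}
    return {"must": [tenant_clause, *extras]}
```

If the search code accepts only filters produced by this helper, an unfiltered query becomes a type or review error rather than a silent leak.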

Many production multi-tenant RAG systems combine these models in a tiered scheme based on tenant size and compliance requirements:

Tier 1 (Enterprise): Collection per tenant
  - Large data volumes (>1M chunks)
  - Strict compliance requirements
  - Custom SLAs

Tier 2 (Standard): Namespace per tenant
  - Medium data volumes (10K–1M chunks)
  - Standard business data
  - Shared infrastructure

Tier 3 (Starter): Metadata filter isolation
  - Small data volumes (<10K chunks)
  - Non-sensitive data
  - Cost-sensitive early-stage tenants

The application routing layer decides which tier a tenant belongs to and routes queries accordingly.
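
A routing decision along these lines might look like the sketch below. The chunk-count thresholds come from the tier list above. The Tier enum, the TenantProfile fields, and the rule that any strict compliance requirement forces the enterprise tier are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ENTERPRISE = "collection_per_tenant"
    STANDARD = "namespace_per_tenant"
    STARTER = "metadata_filter"


@dataclass(frozen=True)
class TenantProfile:
    tenant_id: str
    chunk_count: int
    strict_compliance: bool = False


def choose_tier(profile: TenantProfile) -> Tier:
    """Pick an isolation tier from data volume and compliance needs."""
    # Compliance requirements win over size: small regulated tenants still get full isolation.
    if profile.strict_compliance or profile.chunk_count > 1_000_000:
        return Tier.ENTERPRISE
    if profile.chunk_count >= 10_000:
        return Tier.STANDARD
    return Tier.STARTER
```

In practice the chosen tier would likely be stored with the tenant record at provisioning time rather than recomputed per query, so that a tenant crossing a threshold triggers a planned migration instead of a silent change in where its queries go.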

Implementation: Namespace-Based Isolation with Pinecone

from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
index = pc.Index("documents")

class TenantVectorStore:
    def __init__(self, index):
        self.index = index

    def upsert(self, tenant_id: str, vectors: list):
        """Upsert vectors into tenant namespace."""
        self.index.upsert(
            vectors=vectors,
            namespace=tenant_id
        )

    def query(self, tenant_id: str, query_vector: list, top_k: int = 10):
        """Query is automatically scoped to tenant namespace."""
        return self.index.query(
            vector=query_vector,
            top_k=top_k,
            namespace=tenant_id,
            include_metadata=True
        )

    def delete_tenant(self, tenant_id: str):
        """Delete all vectors for a tenant."""
        self.index.delete(
            delete_all=True,
            namespace=tenant_id
        )

Access Control at the Application Layer

Regardless of the isolation model, access control must be enforced at the application layer — never trust the client to provide their own tenant_id:

from functools import wraps

def require_tenant_scope(f):
    """Decorator that ensures tenant_id comes from the auth token, not request body."""
    @wraps(f)
    async def wrapper(request, *args, **kwargs):
        # Extract tenant from the verified JWT claims, never from the request body.
        # Assigning into kwargs overrides any tenant_id a caller tried to pass in.
        kwargs["tenant_id"] = request.auth.claims["tenant_id"]
        return await f(request, *args, **kwargs)
    return wrapper

# vector_store (a TenantVectorStore) and embed() are assumed to be defined elsewhere.
@require_tenant_scope
async def search_documents(request, tenant_id: str):
    results = vector_store.query(
        tenant_id=tenant_id,  # always from auth, not user input
        query_vector=embed(request.body["query"]),
    )
    return results
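
The claim lookup itself deserves explicit failure modes. The sketch below is framework-independent and assumes the auth layer has already verified the token signature and passes in the claims as a dictionary. The names tenant_from_claims and MissingTenantClaim are hypothetical, as is the choice to reject a request whose body carries a conflicting tenant_id.

```python
class MissingTenantClaim(Exception):
    """Raised when verified token claims carry no usable tenant_id."""


def tenant_from_claims(claims: dict, request_body: dict) -> str:
    """Return the tenant ID from verified claims, never from the request body."""
    tenant_id = claims.get("tenant_id")
    if not isinstance(tenant_id, str) or not tenant_id:
        raise MissingTenantClaim("token has no tenant_id claim")
    # A body-supplied tenant_id that disagrees with the token is a red flag,
    # so reject the request instead of silently ignoring the mismatch.
    supplied = request_body.get("tenant_id")
    if supplied is not None and supplied != tenant_id:
        raise PermissionError("tenant_id in request body does not match token")
    return tenant_id
```

Failing closed on a missing claim avoids the worst default, which is a query that runs with an empty or null tenant scope.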

Cross-Tenant Search for Internal Use Cases

Some systems need to search across all tenants (admin tools, analytics, compliance monitoring). Build this as a separate access path with explicit authorization requirements:

from typing import Optional

def admin_cross_tenant_search(
    query_embedding,
    tenant_ids: Optional[list[str]] = None,  # None = all tenants
    requester_role: str = "user",
):
    if requester_role != "admin":
        raise PermissionError("Cross-tenant search requires admin role")
    if tenant_ids is None:
        # A Pinecone query without a namespace only hits the default namespace,
        # so enumerate every namespace from the index stats. Use with care.
        tenant_ids = list(index.describe_index_stats().namespaces)
    results = []
    for tenant_id in tenant_ids:
        results.extend(
            index.query(vector=query_embedding, top_k=10, namespace=tenant_id).matches
        )
    # Merge the per-tenant hits into a single ranking by score.
    return sorted(results, key=lambda x: x.score, reverse=True)[:20]
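
When results from several tenants are merged, it helps to keep the tenant label on every hit so an admin can see where each document came from. The following sketch works on plain (doc_id, score) pairs instead of a client library’s match objects. The TenantMatch type and merge_tenant_results function are illustrative, and the sketch assumes a higher score means a closer match.

```python
import heapq
from typing import NamedTuple


class TenantMatch(NamedTuple):
    tenant_id: str
    doc_id: str
    score: float


def merge_tenant_results(per_tenant: dict, limit: int = 20) -> list:
    """Merge per-tenant (doc_id, score) lists into one ranking, keeping the tenant label."""
    tagged = (
        TenantMatch(tenant_id, doc_id, score)
        for tenant_id, matches in per_tenant.items()
        for doc_id, score in matches
    )
    # nlargest avoids sorting the full combined list when limit is small.
    return heapq.nlargest(limit, tagged, key=lambda m: m.score)
```

Scores are only comparable across tenants when every namespace uses the same embedding model and distance metric, which is worth verifying before trusting a merged ranking.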

2025 Trend: Tenant-Aware Index Optimization

Some newer vector database platforms are moving toward per-namespace capacity management, which lets you assign different index parameters (for example HNSW’s ef_construction or an IVF index’s nlist) to tenants based on their data characteristics and SLA requirements. Enterprise tenants can then get premium index configurations while starter tenants share lower-cost defaults, all within the same physical infrastructure.

Checklist for Multi-Tenant Deployments

  • Choose isolation model based on compliance requirements, not just convenience
  • Enforce tenant_id from authentication context only — never from user input
  • Implement tenant provisioning and deprovisioning automation
  • Add per-tenant usage monitoring (vector count, query volume)
  • Test cross-tenant isolation with negative test cases (verify tenant A cannot see tenant B data)
  • Plan capacity limits per tenant (max vectors, max queries per second)
  • Document isolation model in your security architecture for compliance audits
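
The negative test from the checklist can be sketched as follows. To keep the example runnable without a database, it uses a tiny in-memory stand-in for a namespaced index. InMemoryNamespacedIndex is a hypothetical test double, and a real suite would run the same assertions against your actual store wrapper and a staging index.

```python
class InMemoryNamespacedIndex:
    """Test double for a namespaced vector index (brute-force dot product)."""

    def __init__(self):
        self._namespaces = {}

    def upsert(self, vectors, namespace):
        # vectors is a list of (doc_id, embedding) pairs.
        self._namespaces.setdefault(namespace, {}).update(dict(vectors))

    def query(self, vector, top_k, namespace):
        # Only the requested namespace is searched.
        items = self._namespaces.get(namespace, {})
        scored = [
            (doc_id, sum(a * b for a, b in zip(vector, emb)))
            for doc_id, emb in items.items()
        ]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


def test_tenant_a_cannot_see_tenant_b():
    index = InMemoryNamespacedIndex()
    index.upsert([("acme-doc", [1.0, 0.0])], namespace="tenant_acme")
    index.upsert([("globex-doc", [1.0, 0.0])], namespace="tenant_globex")

    # Both documents match the query equally, so only scoping separates the tenants.
    hits = index.query([1.0, 0.0], top_k=10, namespace="tenant_acme")
    acme_ids = {doc_id for doc_id, _ in hits}
    assert acme_ids == {"acme-doc"}
    assert "globex-doc" not in acme_ids
```

Giving both tenants an identical embedding is deliberate: if isolation fails, the other tenant’s document is guaranteed to rank highly enough to show up in the results.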

Multi-tenancy in vector databases is primarily an application architecture problem. The database features (namespaces, collections) give you the tools; the application layer must use them correctly and consistently.