Multi-Tenant Vector Stores: One Database, Many Customers

When you’re building a B2B AI product — a RAG-powered assistant that serves multiple companies, teams, or users — you immediately hit the question of tenant isolation. Do you give each customer their own dedicated vector database? Or do you share infrastructure?

Dedicated infrastructure is the simplest isolation model but doesn’t scale economically. A shared vector store is more cost-efficient but requires careful design to prevent data leakage between tenants. This guide covers the patterns, trade-offs, and implementation details for multi-tenant vector stores.

The Three Isolation Models

1. Collection per Tenant (Full Isolation)

Each tenant gets their own collection (table/namespace) in the vector database. There is zero logical sharing at the storage layer.

Vector Database:
├── Collection: tenant_acme
├── Collection: tenant_globex
├── Collection: tenant_initech
└── Collection: tenant_umbrella

Pros:

Strong data isolation: a query scoped to one collection cannot return another tenant’s vectors
Independent index tuning per tenant
Simple to delete a tenant (drop collection)
Clear capacity monitoring per tenant

Cons:

Large number of collections at scale (10K customers = 10K collections)
Most vector databases impose collection count limits
Cold start latency when a new tenant’s collection isn’t warmed up
Operational overhead scales with tenant count

When to use: High-compliance environments (HIPAA, SOC 2), enterprise customers with strict data residency requirements, large tenants with significant data volume.
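
Provisioning under this model usually starts by mapping a tenant identifier to a collection name. The following is a minimal sketch, assuming tenant IDs are short slugs and reusing the tenant_ prefix from the diagram above. The helper name collection_name_for and the allowed-character rule are illustrative assumptions, not requirements of any particular vector database:

```python
import re

# Assumed rule: lowercase letters, digits, and underscores, 1-48 characters.
# Real naming limits depend on the vector database in use.
_TENANT_ID_PATTERN = re.compile(r"[a-z0-9_]{1,48}")


def collection_name_for(tenant_id: str) -> str:
    """Map a tenant ID to the name of its dedicated collection.

    Rejecting unexpected characters keeps a crafted tenant ID from
    resolving to a collection name that belongs to another tenant.
    """
    if not _TENANT_ID_PATTERN.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{tenant_id}"
```

Routing every create, query, and drop call through one helper like this keeps the tenant-to-collection mapping in a single place that can be reviewed and tested.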

2. Namespace / Partition Isolation

One collection, logically partitioned by tenant using native namespace support:

Single Collection: "documents"
├── Namespace: "tenant_acme"    → vectors + metadata for Acme
├── Namespace: "tenant_globex"  → vectors + metadata for Globex
└── Namespace: "tenant_initech" → vectors + metadata for Initech

Pinecone namespaces are the canonical example — each namespace is an isolated partition within an index. Queries are scoped to a namespace at request time.

Pros:

Scales to far more tenants than collection-per-tenant, although the maximum number of namespaces per index varies by provider and plan
Single index to manage and monitor
Shared infrastructure costs
Native support under different names: Pinecone (namespaces), Weaviate (multi-tenancy), Qdrant (payload-based partitioning with a tenant index)

Cons:

Data shares the same physical index (namespace isolation is logical, not physical)
Cannot independently tune index parameters per tenant
More complex access control logic in application code

3. Metadata Filter Isolation

All tenants share a single collection. Tenant isolation is enforced purely through metadata filtering — every query includes a mandatory tenant_id filter:

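from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

# Placeholder URL; point this at your own Qdrant deployment.
client = QdrantClient(url="http://localhost:6333")

def secure_search(query_embedding, tenant_id, k=10):
    """Search the shared collection, restricted to one tenant's points."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every query")
    # Note: recent qdrant-client releases deprecate search() in favor of
    # query_points(); check the version you have installed.
    return client.search(
        collection_name="all_documents",
        query_vector=query_embedding,
        query_filter=Filter(
            must=[
                FieldCondition(
                    key="tenant_id",
                    match=MatchValue(value=tenant_id)
                )
            ]
        ),
        limit=k,
    )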

Pros:

Simplest implementation
Maximum resource sharing
Zero tenant provisioning overhead

Cons:

Security depends entirely on application-layer filter enforcement: there is no database-layer hard isolation, so a single bug or misconfigured query can expose every tenant’s data
Not suitable for high-compliance requirements
Performance can degrade as tenant count grows, because each tenant’s filter matches an ever smaller fraction of the shared index

When to use: Internal tools, low-sensitivity data, small teams. Never for customer-facing products with PII or proprietary information.
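
Where this model is used, one way to reduce the risk of a forgotten filter is to build every query filter through a single helper that always adds the tenant clause. The sketch below assumes a dictionary filter syntax with MongoDB-style operators ($and, $eq), similar to what Pinecone accepts for metadata filters. The helper name with_tenant_filter is hypothetical:

```python
from typing import Any, Optional


def with_tenant_filter(
    tenant_id: str, user_filter: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Return a filter that always restricts results to one tenant.

    The tenant clause is AND-ed with the caller's filter, so a caller
    cannot widen the search by supplying its own tenant_id condition.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required")
    tenant_clause = {"tenant_id": {"$eq": tenant_id}}
    if not user_filter:
        return tenant_clause
    return {"$and": [tenant_clause, user_filter]}
```

Because the two clauses are combined with $and, a user-supplied condition on another tenant’s ID yields an empty result set rather than that tenant’s data.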

Recommended Architecture: Hybrid Approach

Many production multi-tenant RAG systems use a tiered model based on tenant size and compliance requirements:

Tier 1 (Enterprise): Collection per tenant
  - Large data volumes (>1M chunks)
  - Strict compliance requirements
  - Custom SLAs

Tier 2 (Standard): Namespace per tenant
  - Medium data volumes (10K–1M chunks)
  - Standard business data
  - Shared infrastructure

Tier 3 (Starter): Metadata filter isolation
  - Small data volumes (<10K chunks)
  - Non-sensitive data
  - Cost-sensitive early-stage tenants

The application routing layer decides which tier a tenant belongs to and routes queries accordingly.
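
The tier decision can be sketched as a small pure function. The thresholds come from the tiers listed above. The TenantProfile fields, the IsolationTier names, and the choice to treat data as sensitive by default are assumptions made for illustration:

```python
from dataclasses import dataclass
from enum import Enum


class IsolationTier(Enum):
    COLLECTION = "collection_per_tenant"
    NAMESPACE = "namespace_per_tenant"
    METADATA_FILTER = "metadata_filter"


@dataclass(frozen=True)
class TenantProfile:
    tenant_id: str
    chunk_count: int
    strict_compliance: bool = False
    sensitive_data: bool = True  # assume sensitive unless stated otherwise


def select_tier(profile: TenantProfile) -> IsolationTier:
    """Pick an isolation tier from tenant size and compliance needs."""
    # Tier 1: strict compliance or more than 1M chunks.
    if profile.strict_compliance or profile.chunk_count > 1_000_000:
        return IsolationTier.COLLECTION
    # Tier 2: sensitive data of any size, or 10K-1M chunks.
    if profile.sensitive_data or profile.chunk_count >= 10_000:
        return IsolationTier.NAMESPACE
    # Tier 3: small, explicitly non-sensitive tenants only.
    return IsolationTier.METADATA_FILTER
```

Defaulting sensitive_data to True means a tenant only lands in the metadata-filter tier when someone has explicitly marked its data as non-sensitive.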

Implementation: Namespace-Based Isolation with Pinecone

from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
index = pc.Index("documents")

class TenantVectorStore:
    def __init__(self, index):
        self.index = index

    def upsert(self, tenant_id: str, vectors: list):
        """Upsert vectors into tenant namespace."""
        self.index.upsert(
            vectors=vectors,
            namespace=tenant_id
        )

    def query(self, tenant_id: str, query_vector: list, top_k: int = 10):
        """Query is automatically scoped to tenant namespace."""
        return self.index.query(
            vector=query_vector,
            top_k=top_k,
            namespace=tenant_id,
            include_metadata=True
        )

    def delete_tenant(self, tenant_id: str):
        """Delete all vectors for a tenant."""
        self.index.delete(
            delete_all=True,
            namespace=tenant_id
        )

Access Control at the Application Layer

Regardless of the isolation model, access control must be enforced at the application layer — never trust the client to provide their own tenant_id:

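from functools import wraps

def require_tenant_scope(f):
    """Decorator that injects tenant_id from the auth token, never from the request body."""
    @wraps(f)
    async def wrapper(request, *args, **kwargs):
        # Assumes the auth middleware has already verified the JWT and
        # exposed its claims on request.auth.claims.
        tenant_id = request.auth.claims.get("tenant_id")
        if not tenant_id:
            raise PermissionError("Auth token has no tenant_id claim")
        # Drop any caller-supplied tenant_id so it cannot override the token.
        kwargs.pop("tenant_id", None)
        return await f(request, *args, tenant_id=tenant_id, **kwargs)
    return wrapper

@require_tenant_scope
async def search_documents(request, tenant_id: str):
    # vector_store is a TenantVectorStore instance; embed() is the
    # application's embedding function (both defined elsewhere).
    results = vector_store.query(
        tenant_id=tenant_id,  # always from auth, not user input
        query_vector=embed(request.body["query"]),
    )
    return results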

Cross-Tenant Search for Internal Use Cases

Some systems need to search across all tenants (admin tools, analytics, compliance monitoring). Build this as a separate access path with explicit authorization requirements:

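from typing import Optional

def admin_cross_tenant_search(
    query_embedding,
    tenant_ids: Optional[list[str]] = None,  # None = all tenants
    requester_role: str = "user",  # resolve from the auth context, like tenant_id
    per_tenant_k: int = 10,
    top_k: int = 20,
):
    if requester_role != "admin":
        raise PermissionError("Cross-tenant search requires admin role")

    if tenant_ids is None:
        # A Pinecone query without a namespace only searches the default
        # namespace, so enumerate every namespace from the index stats.
        # Use with care: this issues one query per tenant.
        tenant_ids = list(index.describe_index_stats().namespaces.keys())

    results = []
    for tenant_id in tenant_ids:
        response = index.query(
            vector=query_embedding, top_k=per_tenant_k, namespace=tenant_id
        )
        results.extend(response.matches)
    # Merge per-tenant results into a single ranking, best score first.
    return sorted(results, key=lambda m: m.score, reverse=True)[:top_k]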
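
The merge step can be isolated and tested without a live index. The sketch below assumes every namespace uses the same embedding model and a similarity metric where higher scores are better, since scores are not comparable across tenants otherwise. The Match dataclass and merge_top_k are illustrative names, not part of any client library:

```python
import heapq
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Match:
    tenant_id: str  # kept on each match so results stay attributable
    doc_id: str
    score: float


def merge_top_k(per_tenant: Iterable[list[Match]], k: int) -> list[Match]:
    """Merge per-tenant result lists into one ranking, best score first."""
    all_matches = (m for matches in per_tenant for m in matches)
    # nlargest avoids sorting the full combined list when k is small.
    return heapq.nlargest(k, all_matches, key=lambda m: m.score)
```

Carrying tenant_id on each match also makes it straightforward to write an audit log entry for every cross-tenant result returned to an admin.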

2025 Trend: Tenant-Aware Index Optimization

Newer vector database platforms are developing per-namespace capacity management, which lets you assign different index parameters (for example, HNSW’s ef_construction or IVF’s nlist) to tenants based on their data characteristics and SLA requirements. Enterprise tenants can then get premium index configurations while starter tenants share lower-cost defaults, all within the same physical infrastructure.

Checklist for Multi-Tenant Deployments

Choose isolation model based on compliance requirements, not just convenience
Enforce tenant_id from authentication context only — never from user input
Implement tenant provisioning and deprovisioning automation
Add per-tenant usage monitoring (vector count, query volume)
Test cross-tenant isolation with negative test cases (verify tenant A cannot see tenant B data)
Plan capacity limits per tenant (max vectors, max queries per second)
Document isolation model in your security architecture for compliance audits
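
The negative isolation test from the checklist might look like the following sketch. It runs against an in-memory stand-in so it stays self-contained. InMemoryTenantStore is a hypothetical test double, and substring matching stands in for vector similarity:

```python
from collections import defaultdict


class InMemoryTenantStore:
    """Test double that keeps each tenant's records in a separate bucket."""

    def __init__(self) -> None:
        self._namespaces: dict[str, dict[str, str]] = defaultdict(dict)

    def upsert(self, tenant_id: str, doc_id: str, text: str) -> None:
        self._namespaces[tenant_id][doc_id] = text

    def query(self, tenant_id: str, term: str) -> list[str]:
        # .get() avoids creating an empty bucket for unknown tenants.
        records = self._namespaces.get(tenant_id, {})
        return [doc_id for doc_id, text in records.items() if term in text]


def test_tenant_a_cannot_see_tenant_b() -> None:
    store = InMemoryTenantStore()
    store.upsert("acme", "a1", "acme pricing sheet")
    store.upsert("globex", "g1", "globex pricing sheet")

    # Both documents match the term, but each tenant sees only its own.
    assert store.query("acme", "pricing") == ["a1"]
    assert store.query("globex", "pricing") == ["g1"]
    # A tenant with no data gets nothing back, not someone else's data.
    assert store.query("initech", "pricing") == []
```

In a real suite, the same assertions would run against a staging index through the production query path, so that the test exercises the actual filter or namespace scoping rather than the stand-in.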

Multi-tenancy in vector databases is primarily an application architecture problem. The database features (namespaces, collections) give you the tools; the application layer must use them correctly and consistently.