Metadata Filtering: The Precision Lever in Vector Search
Pure semantic search has a well-known failure mode: it’s so good at finding similar content that it sometimes returns relevant-looking results from the wrong data source, wrong time period, or wrong access tier. A user asks about Q4 2024 financial results and gets back a perfectly worded passage from Q4 2022.
Metadata filtering is how you constrain vector search to the right slice of your corpus. It’s the difference between “find the most similar vectors” and “find the most similar vectors from these documents, written after this date, in this category, accessible to this user.”
What Metadata Filtering Actually Does
Every vector in your store can carry a payload — a dictionary of scalar values attached to the embedding. These payloads let you filter the search space before or after ANN lookup:
```
Vector ID: 8472
Embedding: [0.12, -0.34, 0.89, ...] (1536 dimensions)
Payload: {
  "source": "annual_report_2024.pdf",
  "page": 42,
  "category": "financials",
  "date": "2024-12-01",
  "department": "investor_relations",
  "access_level": "public",
  "language": "en"
}
```

When a query arrives with filter conditions, only vectors whose payload matches the filter can appear in the results. Depending on the strategy, the filter is applied before or after the ANN lookup.
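Conceptually, a filter is just a predicate over the payload. Here is a minimal in-memory sketch that is not tied to any database. `VectorRecord` pairs an embedding with its payload, and the hypothetical `matches` helper checks simple equality conditions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class VectorRecord:
    """An embedding plus its payload (illustrative structure, not a DB API)."""
    id: int
    embedding: list[float]
    payload: dict[str, Any] = field(default_factory=dict)


def matches(payload: dict[str, Any], conditions: dict[str, Any]) -> bool:
    """Return True if every condition field is present with an equal value."""
    return all(key in payload and payload[key] == value
               for key, value in conditions.items())


record = VectorRecord(
    id=8472,
    embedding=[0.12, -0.34, 0.89],  # truncated toy vector; real ones have e.g. 1536 dims
    payload={"source": "annual_report_2024.pdf", "category": "financials",
             "access_level": "public", "language": "en"},
)

print(matches(record.payload, {"category": "financials", "access_level": "public"}))  # True
print(matches(record.payload, {"category": "legal"}))  # False
```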
Pre-Filtering vs Post-Filtering
This is the most important architectural decision in metadata filtering, and it has real performance implications.
Post-Filtering
Search first, filter after:
1. ANN search → 1000 nearest vectors
2. Apply filter → keep only vectors matching criteria
3. Return top K filtered results
Problem: If your filter is selective (only 2% of the corpus matches), you might retrieve 1,000 vectors and find that only 5 pass the filter. You asked for k=10 but get 5 results, so recall suffers.

Post-filtering is simple but degrades when filters are highly selective. The ANN stage is "unaware" of the filter, so it wastes work retrieving vectors that will be discarded.
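The shortfall is easy to reproduce. The sketch below uses exact brute-force search as a stand-in for the ANN stage (an assumption that keeps it self-contained) on a toy corpus where 2% of vectors match the filter. `post_filter_search` and `candidate_k` are hypothetical names.

```python
import math


def nearest(query, vectors, n):
    """Exact nearest-neighbour search by Euclidean distance.

    Stands in for an ANN index; `vectors` maps id -> embedding.
    """
    ranked = sorted(vectors, key=lambda vid: math.dist(query, vectors[vid]))
    return ranked[:n]


def post_filter_search(query, vectors, payloads, predicate, k, candidate_k):
    """Post-filtering: fetch `candidate_k` neighbours, then drop non-matches."""
    candidates = nearest(query, vectors, candidate_k)
    return [vid for vid in candidates if predicate(payloads[vid])][:k]


# Toy corpus: 100 one-dimensional vectors; only ids 0 and 50 are "rare" (2%).
vectors = {i: [float(i)] for i in range(100)}
payloads = {i: {"category": "rare" if i % 50 == 0 else "common"} for i in range(100)}

hits = post_filter_search(
    query=[0.0],
    vectors=vectors,
    payloads=payloads,
    predicate=lambda p: p["category"] == "rare",
    k=10,
    candidate_k=20,
)
print(hits)  # [0]: asked for 10, got 1, because id 50 was never among the 20 candidates
```

Raising `candidate_k` (over-fetching) narrows the gap, but only by doing more of the wasted work described above.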
Pre-Filtering
Filter first, search within filtered subset:
1. Apply filter → identify 50,000 matching vectors (out of 5M total)
2. ANN search restricted to those 50,000 vectors
3. Return top K from that subset
Result: Full recall within the filtered set.
Problem: Filtering 5M vectors before ANN search can be slow if the filter isn't efficiently indexed.

Qdrant’s Approach: Indexed Payload Filtering
Qdrant (and recent versions of Weaviate) take a smarter approach. They build inverted indexes on payload fields, so resolving a filter costs about as much as a database index lookup rather than a scan over every payload:
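```python
from qdrant_client import QdrantClient
from qdrant_client.models import DatetimeRange, FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")  # assumes a local Qdrant instance

# Qdrant pre-filtered search; query_embedding is the embedded user query.
# (Newer clients favour client.query_points for the same purpose.)
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="financials")),
            # Range takes numbers; date strings need DatetimeRange (Qdrant >= 1.8)
            FieldCondition(key="date", range=DatetimeRange(gte="2024-01-01T00:00:00Z")),
            FieldCondition(key="access_level", match=MatchValue(value="public")),
        ]
    ),
    limit=10,
)
```

Qdrant resolves the filter through its payload index and uses the estimated number of matching points to plan the search: when the candidate set is small it scores those points directly, and otherwise it traverses its HNSW graph while skipping points that fail the filter. Either way only matching vectors are considered, which maintains recall without the overhead of filtering a full ANN result set.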
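The idea of resolving a filter through an inverted payload index and then ranking only the survivors can be sketched in memory. This is not Qdrant's implementation: exact brute-force search stands in for HNSW, only equality conditions are supported, and `build_payload_index` and `pre_filter_search` are hypothetical names.

```python
import math
from collections import defaultdict


def build_payload_index(payloads, fields):
    """Inverted index: field -> value -> set of point ids with that value."""
    index = {f: defaultdict(set) for f in fields}
    for pid, payload in payloads.items():
        for f in fields:
            if f in payload:
                index[f][payload[f]].add(pid)
    return index


def pre_filter_search(query, vectors, index, must, k):
    """Resolve equality conditions via the index, then rank only the matches.

    Every field in `must` must be indexed. Exact search stands in for HNSW.
    """
    candidate_sets = [index[f].get(v, set()) for f, v in must.items()]
    candidates = set.intersection(*candidate_sets) if candidate_sets else set(vectors)
    ranked = sorted(candidates, key=lambda pid: math.dist(query, vectors[pid]))
    return ranked[:k]


# Toy corpus: ids 0..99 on a line; 2% are "rare", even ids are public.
vectors = {i: [float(i)] for i in range(100)}
payloads = {
    i: {"category": "rare" if i % 50 == 0 else "common",
        "access_level": "public" if i % 2 == 0 else "internal"}
    for i in range(100)
}
index = build_payload_index(payloads, fields=["category", "access_level"])

print(pre_filter_search([0.0], vectors, index, {"category": "rare"}, k=10))  # [0, 50]
print(pre_filter_search([0.0], vectors, index,
                        {"category": "common", "access_level": "internal"}, k=3))  # [1, 3, 5]
```

Unlike post-filtering, the rare category returns every match it has, because the ranking step never sees non-matching points.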
Designing Your Metadata Schema
The metadata you attach at ingestion time determines what filters you can apply at query time. Design this schema upfront — adding new fields later often requires re-ingesting your entire corpus.
Essential fields for most RAG systems:
```python
standard_metadata = {
    # Source tracking
    "source_id": "doc_abc123",          # unique document identifier
    "source_type": "pdf",               # pdf, html, txt, json, etc.
    "source_url": "s3://bucket/file",   # where to fetch the original

    # Temporal
    "created_at": "2024-06-15",         # document creation date
    "updated_at": "2025-01-10",         # last modification

    # Classification
    "category": "legal",                # domain category
    "subcategory": "contracts",
    "tags": ["nda", "vendor", "2024"],  # multi-value tags

    # Access control
    "tenant_id": "acme_corp",           # for multi-tenant systems
    "access_level": "confidential",     # public/internal/confidential

    # Chunk position
    "chunk_index": 3,                   # position within document
    "total_chunks": 12,
}
```

Complex Filter Patterns
Most vector databases support boolean logic in filters:
```python
# Weaviate: documents from the last 90 days in specific categories
from datetime import datetime, timedelta, timezone

import weaviate.classes.query as wq

# Timezone-aware datetime for comparison against a date-typed property
cutoff = datetime.now(timezone.utc) - timedelta(days=90)

# `collection` is assumed to be a collection handle,
# e.g. client.collections.get("Documents")
results = collection.query.near_vector(
    near_vector=query_embedding,
    limit=10,
    filters=(
        wq.Filter.by_property("category").contains_any(["legal", "finance"])
        & wq.Filter.by_property("created_at").greater_than(cutoff)
        & wq.Filter.by_property("access_level").equal("internal")
    ),
)
```

Dynamic Filter Construction from Queries
A powerful pattern is extracting filter conditions directly from the user’s natural language query, then applying them programmatically:
User query: "Show me contracts from the legal team created in 2024"

Extracted filters:
- category: "legal"
- subcategory: "contracts"
- created_at: >= "2024-01-01" AND < "2025-01-01"

These filters are applied to the vector search alongside the query embedding.

This is the foundation of self-query retrieval (covered in a separate section), where an LLM parses the query and constructs the filter automatically.
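Production systems use an LLM for this step, but the shape of the output can be shown with a rule-based stand-in. The sketch below maps hypothetical keyword phrases to fields from the metadata schema above (`category`, `subcategory`, `created_at`) and turns a four-digit year into a half-open date range. `KEYWORD_FILTERS` and `extract_filters` are illustrative names, not part of any library.

```python
import re

# Hypothetical phrase -> (field, value) map; a self-query retriever would
# have an LLM produce these conditions instead.
KEYWORD_FILTERS = {
    "legal team": ("category", "legal"),
    "contracts": ("subcategory", "contracts"),
}


def extract_filters(query: str) -> dict:
    """Turn phrases in a natural-language query into filter conditions."""
    text = query.lower()
    filters = {}
    for phrase, (field_name, value) in KEYWORD_FILTERS.items():
        if phrase in text:
            filters[field_name] = value
    year = re.search(r"\b(?:19|20)\d{2}\b", text)
    if year:
        y = int(year.group())
        # Half-open range covering the whole calendar year
        filters["created_at"] = {"gte": f"{y}-01-01", "lt": f"{y + 1}-01-01"}
    return filters


print(extract_filters("Show me contracts from the legal team created in 2024"))
# {'category': 'legal', 'subcategory': 'contracts', 'created_at': {'gte': '2024-01-01', 'lt': '2025-01-01'}}
```

The resulting dict still has to be translated into your database's filter objects; keeping extraction and translation separate makes the LLM output easier to validate.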
Filter Selectivity and Performance
Highly selective filters (ones that match only a small share of the corpus) can cause problems. On a 5M-vector corpus, the candidate set shrinks quickly:

- No filter (100% of the corpus matches) → 5M candidates for ANN
- "language = en" (50% match) → 2.5M candidates
- "tenant = acme" (5% match) → 250K candidates
- "document = specific_id" (0.01% match) → 500 candidates

At very high selectivity, ANN search is overkill: a simple linear scan over the filtered set may be faster. Qdrant automatically switches between ANN and linear scan based on selectivity estimates. Other databases expose this as a tunable cut-off (Weaviate's flatSearchCutoff, for example) or leave the choice to you.
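The switch can be sketched as a simple rule: estimate how many points match the filter (a payload index can supply this count), then score them by brute force if the count is small and use the ANN index otherwise. `choose_strategy` and its 10,000-point cut-off are hypothetical; real engines estimate cardinality and tune the threshold in their own ways.

```python
def choose_strategy(matching_count: int, full_scan_threshold: int = 10_000) -> str:
    """Brute-force small candidate sets; use the ANN index for large ones.

    `full_scan_threshold` is an illustrative cut-off, not a real default.
    """
    return "linear_scan" if matching_count <= full_scan_threshold else "ann"


# Candidate counts for a 5M-vector corpus under different filters
SELECTIVITY_EXAMPLES = [
    ("no filter", 5_000_000),
    ("language = en", 2_500_000),
    ("tenant = acme", 250_000),
    ("document = specific_id", 500),
]

for label, candidates in SELECTIVITY_EXAMPLES:
    print(f"{label}: {candidates:,} candidates -> {choose_strategy(candidates)}")
```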
2025 Trend: Semantic Metadata
Beyond scalar metadata, some teams are now storing embedding-based metadata — embeddings of the document title, summary, or category description. At query time, this allows filtering by semantic similarity on the metadata itself:
Instead of: filter where category == "machine_learning"
New approach: filter where embed(category) is similar to embed(query_topic)

This enables fuzzy category matching without maintaining an exhaustive taxonomy. Weaviate’s multi-vector storage supports this natively.
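One way to implement this, assuming you have already embedded each category description, is to resolve the query topic to the set of sufficiently similar categories and then apply an ordinary scalar filter. The toy 3-dimensional vectors, the 0.8 threshold, and `semantic_category_filter` are illustrative assumptions, not Weaviate's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def semantic_category_filter(query_vec, category_vecs, threshold=0.8):
    """Return the categories whose description embedding is close to the query."""
    return [name for name, vec in category_vecs.items()
            if cosine(query_vec, vec) >= threshold]


# Toy 3-d vectors standing in for real embeddings of category descriptions
category_vecs = {
    "machine_learning": [0.9, 0.1, 0.0],
    "deep_learning": [0.8, 0.3, 0.0],
    "accounting": [0.0, 0.1, 0.95],
}
query_topic_vec = [0.85, 0.2, 0.0]

allowed = semantic_category_filter(query_topic_vec, category_vecs)
print(allowed)  # ['machine_learning', 'deep_learning']
# `allowed` can then feed an ordinary "category in [...]" filter.
```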
Implementation Checklist
- Define your metadata schema before first ingestion
- Index all fields you intend to filter on (check DB-specific indexing requirements)
- Test filter selectivity on representative query distributions
- Add tenant isolation metadata for multi-tenant deployments
- Implement access-level filtering as a mandatory non-overridable filter
- Monitor filter cardinality in production (very high cardinality = slower filtering)
- Plan for schema evolution — how will you add new metadata fields later?
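For the mandatory access-level filter, one approach is to build the final filter on the server and AND the caller's conditions with tenant and access-level conditions the caller cannot touch. The sketch below uses a hypothetical, database-agnostic `(field, operator, value)` condition format; in a real deployment these would map onto your database's filter objects (for example, Qdrant's `must` list).

```python
from typing import Any

# (field, operator, value): a hypothetical, database-agnostic condition format
Condition = tuple[str, str, Any]

OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "in": lambda actual, expected: actual in expected,
}


def with_mandatory_filters(user_must: list[Condition], tenant_id: str,
                           allowed_levels: list[str]) -> dict[str, list[Condition]]:
    """AND the caller's conditions with server-side access-control conditions.

    Everything lands in one "must" list, so user conditions can only narrow
    the results; they cannot remove or widen the restrictions appended here.
    """
    mandatory: list[Condition] = [
        ("tenant_id", "eq", tenant_id),
        ("access_level", "in", list(allowed_levels)),
    ]
    return {"must": [*user_must, *mandatory]}


def evaluate(payload: dict[str, Any], query_filter: dict[str, list[Condition]]) -> bool:
    """Check a payload against every condition in the "must" list."""
    return all(field in payload and OPERATORS[op](payload[field], value)
               for field, op, value in query_filter["must"])


# A caller tries to reach another tenant's data by supplying its own tenant_id.
flt = with_mandatory_filters([("tenant_id", "eq", "other_corp")],
                             tenant_id="acme_corp", allowed_levels=["public", "internal"])
print(evaluate({"tenant_id": "other_corp", "access_level": "public"}, flt))  # False
```

Because the mandatory conditions are combined with AND, a caller who supplies a different `tenant_id` gets an empty result rather than another tenant's documents.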
Metadata filtering is what separates a RAG demo from a production system. Getting the schema right early saves significant re-ingestion work later.