Metadata Filtering in Vector Search: Precise RAG Retrieval

Master metadata filtering in vector databases — pre-filtering, post-filtering, hybrid approaches, and filter design patterns for accurate RAG retrieval.

Metadata Filtering: The Precision Lever in Vector Search

Pure semantic search has a well-known failure mode: it’s so good at finding similar content that it sometimes returns relevant-looking results from the wrong data source, wrong time period, or wrong access tier. A user asks about Q4 2024 financial results and gets back a perfectly worded passage from Q4 2022.

Metadata filtering is how you constrain vector search to the right slice of your corpus. It’s the difference between “find the most similar vectors” and “find the most similar vectors from these documents, written after this date, in this category, accessible to this user.”

What Metadata Filtering Actually Does

Every vector in your store can carry a payload: a dictionary of scalar values attached to the embedding. These payloads let you restrict the search space before, during, or after the approximate nearest neighbor (ANN) lookup:

```text
Vector ID: 8472
Embedding: [0.12, -0.34, 0.89, ...]  (1536 dimensions)
Payload: {
  "source": "annual_report_2024.pdf",
  "page": 42,
  "category": "financials",
  "date": "2024-12-01",
  "department": "investor_relations",
  "access_level": "public",
  "language": "en"
}
```

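When a query arrives with filter conditions, only vectors whose payload matches the filter should be eligible as results. Where the filter is applied relative to the ANN search (before, during, or after it) determines both recall and latency.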
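To make the semantics concrete, here is a minimal brute-force sketch in plain Python. Each point carries a payload dict, and the filter is a dict of exact-match conditions (a simplifying assumption; real databases also support ranges and boolean logic). Only points that pass the filter are scored. The names `Point`, `matches`, and `filtered_search` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    id: int
    vector: list[float]
    payload: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches(payload: dict, conditions: dict) -> bool:
    """True if every condition key is present in the payload with the same value."""
    return all(payload.get(key) == value for key, value in conditions.items())


def filtered_search(points: list[Point], query: list[float], conditions: dict, k: int) -> list[int]:
    """Score only the points whose payload satisfies the filter; return the top-k ids."""
    eligible = [p for p in points if matches(p.payload, conditions)]
    eligible.sort(key=lambda p: cosine(p.vector, query), reverse=True)
    return [p.id for p in eligible[:k]]


points = [
    Point(1, [1.0, 0.0], {"category": "financials", "date": "2024-12-01"}),
    Point(2, [0.9, 0.1], {"category": "financials", "date": "2022-12-01"}),
    Point(3, [0.0, 1.0], {"category": "legal", "date": "2024-06-15"}),
]
print(filtered_search(points, [1.0, 0.0], {}, k=2))                      # [1, 2]
print(filtered_search(points, [1.0, 0.0], {"date": "2024-12-01"}, k=2))  # [1]
```

Without a filter, the near-duplicate 2022 passage (id 2) ranks second. The date condition removes it before scoring.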

Pre-Filtering vs Post-Filtering

This is the most important architectural decision in metadata filtering, and it has real performance implications.

Post-Filtering

Search first, filter after:

```text
1. ANN search → 1000 nearest vectors
2. Apply filter → keep only vectors matching criteria
3. Return top K filtered results

Problem: If your filter is selective (only 2% of the corpus matches),
only about 20 of those 1000 vectors pass on average, and possibly far
fewer if the matching documents sit away from the query in embedding
space. If only 5 survive, you asked for k=10 but get 5 results: recall
suffers.
```

Post-filtering is simple but degrades when filters are highly selective. The ANN stage is “unaware” of the filter, so it wastes work retrieving vectors that will be discarded.
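The shortfall is easy to reproduce. The sketch below uses exact nearest-neighbor search as a stand-in for the ANN stage, fetches a fixed number of candidates without looking at the filter, and only then applies it. `post_filter_search` and the synthetic corpus are hypothetical.

```python
import math


def post_filter_search(vectors, payloads, query, predicate, k, fetch=100):
    """Post-filtering: fetch `fetch` nearest candidates, then drop non-matching ones."""
    # Stage 1: nearest-neighbour search that ignores the filter entirely
    # (exact search stands in for ANN here).
    candidates = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))[:fetch]
    # Stage 2: filter the candidate list; every discarded candidate was wasted work.
    survivors = [i for i in candidates if predicate(payloads[i])]
    return survivors[:k]


# Synthetic corpus: 1,000 points on a line, every 50th one tagged "acme" (2%).
vectors = [(float(i),) for i in range(1000)]
payloads = [{"tenant": "acme" if i % 50 == 0 else "other"} for i in range(1000)]

hits = post_filter_search(vectors, payloads, (0.0,), lambda p: p["tenant"] == "acme", k=10, fetch=100)
print(hits)  # [0, 50]: asked for 10, got 2
```

Raising `fetch` reduces the shortfall but increases wasted work, and no fixed oversampling factor is safe for arbitrarily selective filters.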

Pre-Filtering

Filter first, search within filtered subset:

```text
1. Apply filter → identify 50,000 matching vectors (out of 5M total)
2. ANN search restricted to those 50,000 vectors
3. Return top K from that subset

Result: Full recall within the filtered set.
Problem: Filtering 5M vectors before ANN search can be slow
if the filter isn't efficiently indexed.
```
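For contrast with post-filtering, here is a pre-filtering sketch. Exact search again stands in for ANN over the subset, and the corpus and `pre_filter_search` are hypothetical. Because the filter runs first, every result matches, and the caller gets the full k whenever at least k documents match.

```python
import math


def pre_filter_search(vectors, payloads, query, predicate, k):
    """Pre-filtering: restrict to matching ids first, then search only that subset."""
    # Stage 1: evaluate the filter over all payloads. Without a payload index this
    # is a full scan of the metadata, which is the cost the text warns about.
    subset = [i for i, payload in enumerate(payloads) if predicate(payload)]
    # Stage 2: nearest-neighbour search restricted to the subset.
    subset.sort(key=lambda i: math.dist(vectors[i], query))
    return subset[:k]


# Synthetic corpus: 1,000 points on a line, every 50th one tagged "acme" (2%).
vectors = [(float(i),) for i in range(1000)]
payloads = [{"tenant": "acme" if i % 50 == 0 else "other"} for i in range(1000)]

hits = pre_filter_search(vectors, payloads, (0.0,), lambda p: p["tenant"] == "acme", k=10)
print(hits)  # [0, 50, 100, 150, 200, 250, 300, 350, 400, 450]
```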

Qdrant’s Approach: Indexed Payload Filtering

Qdrant (and recent versions of Weaviate) take a smarter approach. They build indexes on payload fields, so the filter is resolved with an index lookup instead of a scan over every payload, and it is applied during the vector search rather than after it:

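```python
from datetime import datetime, timezone

from qdrant_client import QdrantClient
from qdrant_client.models import DatetimeRange, FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

# Qdrant pre-filtered search. query_points supersedes the older client.search
# method in recent qdrant-client releases; query_embedding is your query vector.
results = client.query_points(
    collection_name="documents",
    query=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="financials")),
            # Date comparisons need DatetimeRange; Range only accepts numbers.
            FieldCondition(
                key="date",
                range=DatetimeRange(gte=datetime(2024, 1, 1, tzinfo=timezone.utc)),
            ),
            FieldCondition(key="access_level", match=MatchValue(value="public")),
        ]
    ),
    limit=10,
).points
```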

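Qdrant's query planner uses its payload indexes to estimate how many points match the filter. When that set is small, it scores the matching points directly. Otherwise, it traverses the HNSW graph and checks the filter at each visited node, relying on extra graph links built for indexed payload values to keep filtered traversal connected. Either way, the filter is applied during retrieval rather than to a finished ANN result set, which preserves recall.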
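The core idea behind a payload index can be shown with a toy inverted index. Each (field, value) pair maps to the set of point ids that carry it, and a `must` filter is answered by intersecting those sets instead of scanning every payload. This is a hypothetical sketch of the concept, not Qdrant's actual data structures, and it handles exact-match conditions only.

```python
from collections import defaultdict


class PayloadIndex:
    """Toy inverted index: (field, value) -> set of point ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, point_id, payload):
        for field, value in payload.items():
            # Multi-valued fields (e.g. tags) get one posting per element.
            values = value if isinstance(value, list) else [value]
            for v in values:
                self._postings[(field, v)].add(point_id)

    def must(self, conditions):
        """Ids matching ALL exact-match conditions, via set intersection."""
        sets = [self._postings.get((f, v), set()) for f, v in conditions.items()]
        if not sets:
            raise ValueError("at least one condition is required")
        # Intersect smallest-first so the working set shrinks as fast as possible.
        sets.sort(key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return result


index = PayloadIndex()
index.add(1, {"category": "financials", "access_level": "public"})
index.add(2, {"category": "financials", "access_level": "internal"})
index.add(3, {"category": "legal", "access_level": "public", "tags": ["nda", "vendor"]})
print(index.must({"category": "financials", "access_level": "public"}))  # {1}
```

The resulting id set is the candidate set that the vector stage then searches.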

Designing Your Metadata Schema

The metadata you attach at ingestion time determines what filters you can apply at query time. Design this schema upfront — adding new fields later often requires re-ingesting your entire corpus.

Essential fields for most RAG systems:

```python
standard_metadata = {
    # Source tracking
    "source_id": "doc_abc123",          # unique document identifier
    "source_type": "pdf",               # pdf, html, txt, json, etc.
    "source_url": "s3://bucket/file",   # where to fetch original
    # Temporal
    "created_at": "2024-06-15",         # document creation date
    "updated_at": "2025-01-10",         # last modification
    # Classification
    "category": "legal",                # domain category
    "subcategory": "contracts",
    "tags": ["nda", "vendor", "2024"],  # multi-value tags
    # Access control
    "tenant_id": "acme_corp",           # for multi-tenant systems
    "access_level": "confidential",     # public/internal/confidential
    # Chunk position
    "chunk_index": 3,                   # position within document
    "total_chunks": 12,
}
```
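One way to make every chunk carry the same schema is to derive chunk payloads from a single document-level record at ingestion time. `build_chunk_payloads` and `REQUIRED_FIELDS` are hypothetical. Fixed-size character chunking is a deliberate simplification. Rejecting records with missing fields enforces the schema at ingestion rather than at query time.

```python
REQUIRED_FIELDS = {"source_id", "source_type", "created_at", "category", "tenant_id", "access_level"}


def build_chunk_payloads(doc_metadata, text, chunk_size=1000):
    """Split `text` into fixed-size chunks and attach document metadata plus chunk position."""
    missing = REQUIRED_FIELDS - set(doc_metadata)
    if missing:
        # Fail at ingestion time rather than discovering unfilterable chunks later.
        raise ValueError(f"missing metadata fields: {sorted(missing)}")
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    total = len(chunks)
    return [
        # Each chunk gets its own dict: document-level fields plus its position.
        (chunk, {**doc_metadata, "chunk_index": idx, "total_chunks": total})
        for idx, chunk in enumerate(chunks)
    ]


doc = {
    "source_id": "doc_abc123",
    "source_type": "pdf",
    "created_at": "2024-06-15",
    "category": "legal",
    "tenant_id": "acme_corp",
    "access_level": "confidential",
}
pieces = build_chunk_payloads(doc, "x" * 2500, chunk_size=1000)
print([(len(c), p["chunk_index"], p["total_chunks"]) for c, p in pieces])
# [(1000, 0, 3), (1000, 1, 3), (500, 2, 3)]
```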

Complex Filter Patterns

Most vector databases support boolean logic in filters:

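```python
# Weaviate: documents from last 90 days in specific categories
from datetime import datetime, timedelta, timezone

import weaviate
import weaviate.classes.query as wq

client = weaviate.connect_to_local()
collection = client.collections.get("Documents")  # hypothetical collection name

# Weaviate dates are RFC 3339, so the cutoff must be timezone-aware.
cutoff = datetime.now(timezone.utc) - timedelta(days=90)
results = collection.query.near_vector(
    near_vector=query_embedding,
    limit=10,
    filters=(
        wq.Filter.by_property("category").contains_any(["legal", "finance"])
        & wq.Filter.by_property("created_at").greater_than(cutoff)
        & wq.Filter.by_property("access_level").equal("internal")
    ),
)
client.close()
```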
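Underneath, a boolean filter is a predicate tree. The minimal sketch below builds composable filters in plain Python, loosely mirroring the & style of the Weaviate client. The `Cond` class and the `field_in`/`field_after` helpers are hypothetical and unrelated to any real client. The date helper assumes ISO-8601 "YYYY-MM-DD" strings in the payload.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Cond:
    """A filter condition wrapping a predicate over a payload dict."""
    test: Callable[[dict], bool]

    def __and__(self, other: "Cond") -> "Cond":
        return Cond(lambda p: self.test(p) and other.test(p))

    def __or__(self, other: "Cond") -> "Cond":
        return Cond(lambda p: self.test(p) or other.test(p))

    def __invert__(self) -> "Cond":
        return Cond(lambda p: not self.test(p))


def field_in(key, values):
    return Cond(lambda p: p.get(key) in values)


def field_after(key, cutoff: date):
    # Assumes ISO-8601 date strings ("YYYY-MM-DD") in the payload.
    return Cond(lambda p: key in p and date.fromisoformat(p[key]) > cutoff)


f = (
    field_in("category", {"legal", "finance"})
    & field_after("created_at", date(2024, 10, 1))
    & ~field_in("access_level", {"confidential"})
)
docs = [
    {"category": "legal", "created_at": "2024-11-02", "access_level": "internal"},
    {"category": "legal", "created_at": "2024-03-10", "access_level": "internal"},
    {"category": "finance", "created_at": "2024-12-20", "access_level": "confidential"},
]
print([f.test(d) for d in docs])  # [True, False, False]
```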

Dynamic Filter Construction from Queries

A powerful pattern is extracting filter conditions directly from the user’s natural language query, then applying them programmatically:

```text
User query: "Show me contracts from the legal team created in 2024"

Extracted filters:
  - category: "legal"
  - subcategory: "contracts"
  - created_at: >= "2024-01-01" AND < "2025-01-01"

Applied to the vector search alongside the query embedding
```

This is the foundation of self-query retrieval (covered in a separate section), where an LLM parses the query and constructs the filter automatically.
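As a rough illustration of the output shape, here is a rule-based extractor that turns the example query into filter conditions. It is a hypothetical stand-in for the LLM parser. A self-query retriever would have the model emit this structure, while this sketch only recognizes a hard-coded keyword map (by naive substring matching) and four-digit years.

```python
import re

# Hypothetical keyword -> (field, value) map; a real system would derive this
# from its metadata schema or let an LLM choose the fields.
KEYWORDS = {
    "contract": ("subcategory", "contracts"),
    "legal": ("category", "legal"),
    "finance": ("category", "finance"),
}


def extract_filters(query: str) -> dict:
    """Return {field: value} for keyword matches plus a created_at range for a year."""
    filters = {}
    lowered = query.lower()
    for keyword, (field, value) in KEYWORDS.items():
        if keyword in lowered:
            filters[field] = value
    year = re.search(r"\b(19|20)\d{2}\b", query)
    if year:
        y = int(year.group(0))
        # Half-open range: >= Jan 1 of the year and < Jan 1 of the next year.
        filters["created_at"] = {"gte": f"{y}-01-01", "lt": f"{y + 1}-01-01"}
    return filters


print(extract_filters("Show me contracts from the legal team created in 2024"))
# {'subcategory': 'contracts', 'category': 'legal', 'created_at': {'gte': '2024-01-01', 'lt': '2025-01-01'}}
```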

Filter Selectivity and Performance

Highly selective filters (ones that match few documents) can cause problems. In the table below, the first column is the share of the corpus that passes the filter, so the smaller the share, the more selective the filter:

```text
Share matching   Example filter              Candidates for ANN
100%             no filter                   5M
50%              language = en               2.5M
5%               tenant = acme               250K
0.01%            document = specific_id      500
```

At very high selectivity, ANN search is overkill: a simple linear scan over the filtered set may be faster.

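Qdrant's query planner makes this switch automatically, using cardinality estimates from its payload indexes; the cutoff can be tuned through the HNSW full_scan_threshold setting. Other databases handle it differently. Some expose a comparable cutoff (Weaviate's flatSearchCutoff, for example), while others leave the choice to you, so check how yours behaves.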
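The switch can be expressed in a few lines. The sketch below is a hypothetical planner, not Qdrant's implementation. It counts the matching points (a real engine would estimate this from index statistics rather than evaluating the predicate everywhere), runs an exact scan when the count is at or below a threshold, and otherwise defers to a filter-aware ANN routine supplied by the caller. The threshold default is an arbitrary placeholder.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def plan_and_search(
    vectors: list[Vector],
    payloads: list[dict],
    query: Vector,
    predicate: Callable[[dict], bool],
    k: int,
    ann_search: Callable[[Vector, Callable[[dict], bool], int], list[int]],
    exact_threshold: int = 10_000,  # placeholder; tune per workload
) -> tuple[str, list[int]]:
    """Pick exact scan or filtered ANN based on how many points match the filter."""
    # Simplification: count matches directly instead of estimating cardinality.
    matching = [i for i, p in enumerate(payloads) if predicate(p)]
    if len(matching) <= exact_threshold:
        # Small candidate set: scoring every match exactly is cheap and has perfect recall.
        matching.sort(key=lambda i: math.dist(vectors[i], query))
        return "exact_scan", matching[:k]
    # Large candidate set: let the filter-aware ANN index do the work.
    return "filtered_ann", ann_search(query, predicate, k)


def fake_ann(query, predicate, k):
    # Hypothetical stand-in for a real filtered ANN index.
    return []


vectors = [(float(i),) for i in range(100)]
payloads = [{"doc": "a" if i < 5 else "b"} for i in range(100)]
print(plan_and_search(vectors, payloads, (3.0,), lambda p: p["doc"] == "a",
                      k=3, ann_search=fake_ann, exact_threshold=10))
# ('exact_scan', [3, 2, 4])
```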

2025 Trend: Semantic Metadata

Beyond scalar metadata, some teams are now storing embedding-based metadata — embeddings of the document title, summary, or category description. At query time, this allows filtering by semantic similarity on the metadata itself:

Instead of: filter where category == "machine_learning"
New approach: filter where embed(category) is similar to embed(query_topic)

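This enables fuzzy category matching without maintaining an exhaustive taxonomy. Databases that store several embeddings per object support this natively; Weaviate's named vectors, for example, can hold a title or category embedding alongside the content embedding.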
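A minimal sketch of the fuzzy-category idea: compare the query-topic embedding against one embedding per category description, then keep the categories above a similarity threshold. The result can feed an ordinary match-any filter on the category field. The 3-dimensional vectors, the 0.8 threshold, and `semantic_category_filter` are made-up placeholders; in practice the vectors come from your embedding model and the threshold from tuning.

```python
import math

# Hypothetical precomputed embeddings of category descriptions (toy 3-d vectors).
CATEGORY_EMBEDDINGS = {
    "machine_learning": [0.9, 0.1, 0.0],
    "statistics": [0.7, 0.3, 0.1],
    "contracts": [0.0, 0.1, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def semantic_category_filter(topic_embedding, threshold=0.8):
    """Return the sorted categories whose description is similar enough to the topic."""
    return sorted(
        name
        for name, emb in CATEGORY_EMBEDDINGS.items()
        if cosine(topic_embedding, emb) >= threshold
    )


print(semantic_category_filter([1.0, 0.2, 0.0]))  # ['machine_learning', 'statistics']
```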

Implementation Checklist

  • Define your metadata schema before first ingestion
  • Index all fields you intend to filter on (check DB-specific indexing requirements)
  • Test filter selectivity on representative query distributions
  • Add tenant isolation metadata for multi-tenant deployments
  • Implement access-level filtering as a mandatory non-overridable filter
  • Monitor how many vectors your production filters actually match (very selective filters are where post-filtering loses recall and where query planners switch strategies)
  • Plan for schema evolution — how will you add new metadata fields later?
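For the access-control item, one way to make the filter non-overridable is to derive the security conditions from the authenticated request context, never from the query, and append them to whatever filter the caller supplies. The sketch assumes a Qdrant-style filter dict in which must conditions are ANDed with the rest of the filter. `UserContext`, `enforce_access`, and the tier ordering are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Access tiers in ascending order of sensitivity (assumed ordering).
ACCESS_TIERS = ["public", "internal", "confidential"]


@dataclass(frozen=True)
class UserContext:
    """Identity resolved from the authenticated session, not from the query text."""
    tenant_id: str
    clearance: str  # highest tier this user may read


def enforce_access(user_filter: Optional[dict], user: UserContext) -> dict:
    """Return a copy of the caller's filter with mandatory tenant/access conditions added."""
    allowed = ACCESS_TIERS[: ACCESS_TIERS.index(user.clearance) + 1]
    mandatory = [
        {"key": "tenant_id", "match": {"value": user.tenant_id}},
        {"key": "access_level", "match": {"any": allowed}},
    ]
    merged = dict(user_filter or {})
    # must conditions are ANDed with everything else, so appending can only narrow
    # the results; a caller-supplied tenant_id for another tenant just matches nothing.
    merged["must"] = list(merged.get("must", [])) + mandatory
    return merged


user = UserContext(tenant_id="acme_corp", clearance="internal")
caller = {"must": [{"key": "category", "match": {"value": "legal"}}]}
print(enforce_access(caller, user)["must"][-1])
# {'key': 'access_level', 'match': {'any': ['public', 'internal']}}
```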

Metadata filtering is what separates a RAG demo from a production system. Getting the schema right early saves significant re-ingestion work later.