Metadata-Aware Retrieval: When Knowing Who Said It and When Matters
Pure semantic similarity has a flat worldview — a document from three years ago is as relevant as one from last week if the text vectors are close. A document from an unofficial source ranks the same as one from the authoritative team.
In most real-world knowledge bases, that’s wrong. Recency matters. Source credibility matters. Document type matters. Metadata-aware retrieval incorporates these signals directly into how documents are ranked and selected.
The Limitation of Pure Similarity Ranking
Query: "What is the current API rate limit for our service?"
Pure semantic results:
- Rank 1: "API rate limits are 100 req/min" (2021 documentation, score: 0.92)
- Rank 2: "Rate limiting allows 500 req/min for enterprise" (2024 doc, score: 0.89)
- Rank 3: "API calls are throttled at 100 per minute" (archived blog, 2022, score: 0.88)
The correct answer is the 2024 document. Pure semantic search ranked it second because the 2021 doc's phrasing was slightly closer to the query. Metadata-aware retrieval adjusts this ranking by incorporating temporal recency, source type, and other structural signals.
Recency Weighting
Time-aware retrieval boosts recent documents and penalizes stale ones:
```python
import math
from datetime import datetime, timezone


def recency_weighted_score(
    semantic_score: float,
    doc_date: datetime,            # must be timezone-aware (UTC)
    recency_weight: float = 0.3,   # how much recency influences final score
    half_life_days: int = 180,     # how quickly documents decay (6 months)
) -> float:
    days_old = (datetime.now(timezone.utc) - doc_date).days
    # Exponential decay: score = e^(-λt), λ = ln(2)/half_life
    decay_factor = math.exp(-math.log(2) * days_old / half_life_days)
    # Combine semantic similarity with recency signal
    return (1 - recency_weight) * semantic_score + recency_weight * decay_factor


# Example:
# 2021 doc (3 years, ~1,095 days old): semantic = 0.92
# → final = 0.7 * 0.92 + 0.3 * exp(-ln2 * 1095/180) = 0.644 + 0.004 = 0.648
# 2024 doc (0 days old): semantic = 0.89
# → final = 0.7 * 0.89 + 0.3 * 1.0 = 0.623 + 0.300 = 0.923  ← wins
```
The half-life parameter is domain-specific:
- API documentation: 90–180 days (changes frequently)
- Legal regulations: 365–730 days (changes infrequently)
- Historical records: no decay (older is not worse)
- News/current events: 7–30 days (very short half-life)
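One way to encode these domain-specific half-lives is a lookup table keyed by document category, with `None` standing for "no decay". This is a minimal sketch: the category names, the `HALF_LIFE_DAYS` table, and the `category_decay` helper are hypothetical, and each half-life is simply picked from inside the ranges listed above.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical category -> half-life (days) table; None means "no decay".
# Each value is picked from inside the ranges listed above.
HALF_LIFE_DAYS: dict[str, Optional[float]] = {
    "api_docs": 120,
    "legal_regulation": 540,
    "historical_record": None,
    "news": 14,
}


def category_decay(doc_date: datetime, category: str,
                   now: Optional[datetime] = None) -> float:
    """Recency factor in (0, 1]; 1.0 means no penalty for age."""
    half_life = HALF_LIFE_DAYS.get(category, 180)  # unknown category: 6-month default
    if half_life is None:
        return 1.0  # historical content: older is not worse
    now = now or datetime.now(timezone.utc)
    days_old = max((now - doc_date).days, 0)  # treat future-dated docs as brand new
    return math.exp(-math.log(2) * days_old / half_life)


# Usage: the same one-year-old document decays very differently per category
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
year_old = now - timedelta(days=365)
for category in HALF_LIFE_DAYS:
    print(category, round(category_decay(year_old, category, now), 3))
```

Its output can replace the fixed-half-life decay term in `recency_weighted_score`, so a single scoring function covers every document category while the half-lives stay tunable as data.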
Source Credibility Scoring
Not all sources are equal. Official documentation should rank higher than community posts for authoritative answers:
```python
SOURCE_CREDIBILITY = {
    "official_docs": 1.0,
    "product_changelog": 0.95,
    "engineering_blog": 0.80,
    "internal_wiki": 0.75,
    "community_forum": 0.55,
    "archived_content": 0.40,
}


def credibility_weighted_score(
    semantic_score: float,
    source_type: str,
    credibility_weight: float = 0.2,
) -> float:
    credibility = SOURCE_CREDIBILITY.get(source_type, 0.5)
    return (1 - credibility_weight) * semantic_score + credibility_weight * credibility
```
Combining Multiple Metadata Signals
In practice, you combine multiple metadata signals into a single reranking score:
```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MetadataSignals:
    semantic_score: float
    created_at: datetime
    source_type: str
    doc_version: str        # "current", "deprecated", "archived"
    relevance_votes: int    # user feedback signal
    language: str
    target_audience: str    # "beginner", "advanced", "internal"


def metadata_aware_score(signals: MetadataSignals, user_context: dict) -> float:
    score = signals.semantic_score

    # Recency signal (180-day half-life)
    days_old = (datetime.now(timezone.utc) - signals.created_at).days
    recency = math.exp(-math.log(2) * days_old / 180)

    # Deprecation penalty
    version_penalty = {
        "current": 1.0,
        "deprecated": 0.5,
        "archived": 0.2,
    }.get(signals.doc_version, 0.8)

    # Audience relevance
    audience_score = 1.0
    if signals.target_audience != user_context.get("expertise_level"):
        audience_score = 0.85  # slight penalty for mismatched audience

    # User feedback signal (trust but normalize: capped at 1.3)
    vote_boost = min(1.0 + signals.relevance_votes * 0.05, 1.3)

    # Language match
    lang_score = 1.0 if signals.language == user_context.get("language", "en") else 0.7

    # Weighted combination (weights sum to 1.0)
    final_score = (
        0.60 * score            # semantic similarity (dominant signal)
        + 0.15 * recency        # freshness
        + 0.10 * version_penalty
        + 0.05 * audience_score
        + 0.05 * vote_boost
        + 0.05 * lang_score
    )
    return final_score
```
Mandatory Hard Filters vs Soft Scoring
Distinguish between signals that are hard constraints (exclude documents) and those that are soft scores (affect ranking):
Hard constraints (always exclude):
- Document is confidential and the user doesn't have access
- Document belongs to a different tenant
- Document language doesn't match the user's language (if language matching is strict)
Soft scores (affect ranking, not exclusion):
- Recency (recent is better, but old is not necessarily wrong)
- Source credibility
- Audience match
- User feedback votes
Hard filtering should happen inside the vector search itself (at the database layer, via metadata filters), so excluded documents never become candidates. Soft scoring happens after retrieval, in the application layer.
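```python
def metadata_aware_retrieval(
    query: str,
    user: dict,
    vectorstore,
    k: int = 10,
    over_fetch: int = 30,  # fetch more to allow reranking
) -> list:
    # Hard filters, applied at the vector store level.
    # Operator syntax ($in, $ne) varies between vector stores.
    hard_filter = {
        "tenant_id": user["tenant_id"],
        "access_level": {"$in": user["access_levels"]},
        "status": {"$ne": "archived"},
    }

    # Fetch more candidates to allow metadata-based reranking.
    # In LangChain-style stores, similarity_search_with_score often returns a
    # distance (lower = closer); the relevance-score variant returns a value in
    # [0, 1] where higher is better, which is what metadata_aware_score expects.
    candidates = vectorstore.similarity_search_with_relevance_scores(
        query,
        k=over_fetch,
        filter=hard_filter,
    )

    # Apply soft metadata scoring
    reranked = []
    for doc, semantic_score in candidates:
        created_at = doc.metadata["created_at"]
        if isinstance(created_at, str):
            # Many stores only accept primitive metadata, so dates may arrive as
            # ISO 8601 strings; this assumes they include a UTC offset
            created_at = datetime.fromisoformat(created_at)
        signals = MetadataSignals(
            semantic_score=semantic_score,
            created_at=created_at,
            source_type=doc.metadata["source_type"],
            doc_version=doc.metadata.get("version", "current"),
            relevance_votes=doc.metadata.get("votes", 0),
            language=doc.metadata.get("language", "en"),
            target_audience=doc.metadata.get("audience", "general"),
        )
        final_score = metadata_aware_score(signals, user)
        reranked.append((doc, final_score))

    reranked.sort(key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in reranked[:k]]
```
Temporal Query Understanding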
Some queries are inherently time-scoped and should trigger recency weighting automatically:
```python
import re

TEMPORAL_SIGNAL_KEYWORDS = [
    "current", "latest", "recent", "new", "updated", "now", "today",
    "this year", "2024", "2025", "modern", "state of the art",
]

HISTORICAL_SIGNAL_KEYWORDS = [
    "history", "original", "first", "when was", "historically",
    "in the past", "legacy", "classic",
]


def _contains_keyword(text: str, keywords: list[str]) -> bool:
    # Match whole words/phrases only, so "now" does not fire on "know"
    # and "new" does not fire on "renew"
    return any(re.search(rf"\b{re.escape(kw)}\b", text) for kw in keywords)


def detect_temporal_intent(query: str) -> str:
    query_lower = query.lower()
    if _contains_keyword(query_lower, TEMPORAL_SIGNAL_KEYWORDS):
        return "recency_boosted"
    elif _contains_keyword(query_lower, HISTORICAL_SIGNAL_KEYWORDS):
        return "historical"
    return "neutral"


# Adjust recency_weight based on detected intent
intent = detect_temporal_intent("What's the current best practice for RAG chunking?")
recency_weight = {"recency_boosted": 0.40, "historical": 0.05, "neutral": 0.15}[intent]
```
2025 Trend: Implicit User Context Signals
Production systems increasingly incorporate implicit user context into metadata scoring — what the user has been reading recently, their stated expertise level, their organization’s active projects. A user in the “backend engineering” team asking about APIs gets internal API documentation prioritized over external user guides, even when semantic similarity scores are equal.
This personalization layer is thin but meaningful: a lightweight metadata boost based on user profile attributes, which adds almost no query-time cost because it is just score arithmetic over candidates that have already been retrieved.
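A minimal sketch of such a profile boost, assuming each user profile carries a `team` attribute and each document an `audience` metadata field. The `TEAM_PREFERRED_AUDIENCES` mapping, the `Candidate` type, the `apply_profile_boost` helper, and the 1.1 multiplier are all hypothetical; the point is that the boost is one dictionary lookup and one multiplication per candidate, applied after retrieval.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a user's team to the document audiences it should favor
TEAM_PREFERRED_AUDIENCES: dict[str, set[str]] = {
    "backend_engineering": {"internal"},
    "support": {"beginner"},
}


@dataclass
class Candidate:
    doc_id: str
    score: float                      # score from the earlier reranking stage
    metadata: dict = field(default_factory=dict)


def apply_profile_boost(candidates: list[Candidate], user_profile: dict,
                        boost: float = 1.1) -> list[Candidate]:
    """Multiply the score of documents whose audience suits the user's team, then re-sort."""
    preferred = TEAM_PREFERRED_AUDIENCES.get(user_profile.get("team", ""), set())
    boosted = []
    for c in candidates:
        matches = c.metadata.get("audience") in preferred
        boosted.append(Candidate(c.doc_id, c.score * boost if matches else c.score, c.metadata))
    return sorted(boosted, key=lambda c: c.score, reverse=True)


# Usage: equal scores, but the backend engineer gets the internal API docs first
candidates = [
    Candidate("external-user-guide", 0.80, {"audience": "beginner"}),
    Candidate("internal-api-docs", 0.80, {"audience": "internal"}),
]
ranked = apply_profile_boost(candidates, {"team": "backend_engineering"})
print([c.doc_id for c in ranked])  # ['internal-api-docs', 'external-user-guide']
```

A multiplicative boost keeps the semantic ordering intact when scores differ a lot and only breaks near-ties, which matches the "thin but meaningful" role described above.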
Metadata-aware retrieval is where RAG becomes smart about the context of information, not just the content. For any production knowledge base where documents carry meaningful metadata, integrating these signals can lift retrieval quality measurably; confirm the gain, and tune the weights, against a labeled set of your own queries.