Self-Query Retrieval: Natural Language to Metadata Filters in RAG

Learn self-query retrieval for RAG — using LLMs to auto-generate metadata filters from natural language, enabling precise structured and semantic search.

Self-Query Retrieval: Let the LLM Build the Filter

“Show me Q3 2024 financial reports from the Europe team.”

A standard semantic search embeds the whole sentence as one vector and returns the nearest documents, so “Q3” and “2024” only nudge the similarity score instead of excluding reports from other periods. Yet this sentence mixes a semantic component (“financial reports”) with explicit metadata constraints: quarter=Q3, year=2024, region=Europe.

Self-query retrieval uses an LLM to parse these structured constraints out of the natural language query, builds a metadata filter from them programmatically, and then runs semantic search for the remaining content, restricted to documents that pass the filter. The result is usually far more precise than semantic search alone, because documents that violate a constraint are excluded outright rather than merely ranked lower.

How It Works

```
User query: "Find legal contracts from 2024 with less than 50 pages"

Self-query LLM output:
  semantic_query: "legal contracts"
  filters: {
    "document_type": "contract",
    "year": { "$gte": 2024, "$lte": 2024 },
    "page_count": { "$lt": 50 }
  }

Retrieval:
  1. Apply metadata filter to narrow candidate set
  2. Run semantic search within filtered set
  3. Return results that satisfy both criteria
```

The magic is in the structured extraction — the LLM understands “less than 50 pages” means page_count < 50 and “from 2024” means year = 2024.
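The three retrieval steps can be sketched in plain Python. This is a minimal, self-contained sketch: the toy corpus, the bag-of-words `embed` stand-in for a real embedding model, and the helper names `matches` and `self_query_search` are all hypothetical. The operator dict mirrors the `$gte`/`$lte`/`$lt` syntax in the example above.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (text, metadata) pairs; a real system keeps these in a vector DB.
DOCS = [
    ("Master services contract with Acme", {"document_type": "contract", "year": 2024, "page_count": 12}),
    ("Vendor contract renewal terms", {"document_type": "contract", "year": 2024, "page_count": 80}),
    ("Legal contract template archive", {"document_type": "contract", "year": 2022, "page_count": 5}),
    ("Quarterly legal spend report summary", {"document_type": "report", "year": 2024, "page_count": 9}),
]

# Comparison operators in the same Mongo/Chroma-style syntax as the filter above
OPS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
}


def matches(metadata, filters):
    """True if the metadata satisfies every constraint (implicit AND)."""
    for field, cond in filters.items():
        if field not in metadata:
            return False
        value = metadata[field]
        if isinstance(cond, dict):
            if not all(OPS[op](value, target) for op, target in cond.items()):
                return False
        elif value != cond:
            return False
    return True


def embed(text):
    # Stand-in for an embedding model: lowercase bag-of-words counts
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def self_query_search(semantic_query, filters, k=3):
    # Step 1: the metadata filter narrows the candidate set
    candidates = [(text, meta) for text, meta in DOCS if matches(meta, filters)]
    # Step 2: semantic ranking runs only inside the filtered set
    q = embed(semantic_query)
    ranked = sorted(candidates, key=lambda d: cosine(q, embed(d[0])), reverse=True)
    # Step 3: everything returned satisfies both criteria
    return ranked[:k]


results = self_query_search(
    "legal contracts",
    {"document_type": "contract", "year": {"$gte": 2024, "$lte": 2024}, "page_count": {"$lt": 50}},
)
for text, meta in results:
    print(text, meta)
```

Only the 12-page 2024 contract survives the filter. Calling `self_query_search("legal contracts", {})` instead ranks the 2022 template first, which is exactly the mismatch the filter prevents.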

Metadata Schema Definition

The self-query LLM needs to know what attributes are available to filter on. You define this as an attribute description schema:

```python
from langchain.chains.query_constructor.base import AttributeInfo

metadata_field_info = [
    AttributeInfo(
        name="source",
        description="The filename or URL of the source document",
        type="string",
    ),
    AttributeInfo(
        name="document_type",
        description="Type of document: contract, report, policy, email, invoice",
        type="string",
    ),
    AttributeInfo(
        name="department",
        description="The department that created the document: engineering, legal, finance, marketing, HR",
        type="string",
    ),
    AttributeInfo(
        name="year",
        description="Year the document was created (integer, e.g. 2024)",
        type="integer",
    ),
    AttributeInfo(
        name="quarter",
        description="Fiscal quarter: Q1, Q2, Q3, Q4",
        type="string",
    ),
    AttributeInfo(
        name="confidential",
        description="Whether the document is confidential (true/false)",
        type="boolean",
    ),
    AttributeInfo(
        name="page_count",
        description="Number of pages in the document",
        type="integer",
    ),
]

document_content_description = "Company internal documents including contracts, reports, policies, and emails"
```
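To make the idea concrete, here is a hedged sketch of how such a schema might be serialized into the prompt. It is not LangChain's actual template: the `Attribute` dataclass stands in for `AttributeInfo`, and `render_schema` is a hypothetical helper.

```python
import json
from dataclasses import dataclass


@dataclass
class Attribute:
    # Mirrors the name/description/type triple that AttributeInfo carries
    name: str
    description: str
    type: str


def render_schema(content_description, attributes):
    """Serialize the schema into a prompt section (hypothetical format)."""
    fields = {a.name: {"description": a.description, "type": a.type} for a in attributes}
    return (
        f"Data source contents: {content_description}\n"
        "Filterable attributes (use only these names):\n"
        + json.dumps(fields, indent=2)
    )


schema_text = render_schema(
    "Company internal documents including contracts, reports, policies, and emails",
    [
        Attribute("year", "Year the document was created (integer, e.g. 2024)", "integer"),
        Attribute("quarter", "Fiscal quarter: Q1, Q2, Q3, Q4", "string"),
    ],
)
print(schema_text)
```

Whatever the exact format, the descriptions double as instructions: listing the allowed values (as the `quarter` description does) is how the LLM learns which values are valid.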

LangChain SelfQueryRetriever

```python
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# `documents` is a list of Document objects whose metadata uses the fields
# described in metadata_field_info. SelfQueryRetriever also needs the `lark`
# package installed to parse the LLM's structured output.
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents=document_content_description,
    metadata_field_info=metadata_field_info,
    verbose=True,       # logs the generated structured query (INFO level)
    enable_limit=True,  # allow "give me 5 documents" type queries
)

# Test query
result = retriever.invoke(
    "Find non-confidential engineering documents from Q1 2024"
)
# The logged structured query should look roughly like:
#   query:  "engineering documents"
#   filter: AND(department == "engineering", year == 2024,
#               quarter == "Q1", confidential == False)
```

The Query Constructor Chain

Under the hood, SelfQueryRetriever builds a query-constructor chain (prompt, LLM, output parser) that shows the LLM the metadata schema and asks it to split the query into a search string and a filter:

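```python
from langchain.chains.query_constructor.base import (
    StructuredQueryOutputParser,
    get_query_constructor_prompt,
)
from langchain_openai import ChatOpenAI

# Few-shot prompt that embeds the content description and attribute schema
prompt = get_query_constructor_prompt(
    document_content_description,
    metadata_field_info,
)
# Parses the LLM's answer into a StructuredQuery (query string + filter AST)
output_parser = StructuredQueryOutputParser.from_components()
query_constructor = prompt | ChatOpenAI(model="gpt-4o-mini", temperature=0) | output_parser

# Test the constructor directly
structured_query = query_constructor.invoke({
    "query": "legal contracts signed after June 2023"
})
print(structured_query.query)   # e.g. "legal contracts"
print(structured_query.filter)  # e.g. a Comparison on "year" with comparator GT or GTE and value 2023
```

Note that `year` is the finest date field in this schema, so “after June 2023” cannot be expressed exactly: year > 2023 drops July to December 2023, while year >= 2023 also admits January to June. If that precision matters, add a date field to the schema.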

Handling Operator Types

Different vector stores support different filter operators. LangChain’s self-query layer translates the abstract filter AST to database-specific syntax:

```python
# Abstract filter from the LLM:
#   AND(
#       EQ("department", "legal"),
#       GTE("year", 2023),
#       NOT(EQ("confidential", True))
#   )

# Translated to Qdrant:
from qdrant_client.models import FieldCondition, Filter, MatchValue, Range

# LangChain's Qdrant integration stores metadata under a payload key
# ("metadata" by default), so its translator emits keys like "metadata.department".
qdrant_filter = Filter(
    must=[
        FieldCondition(key="department", match=MatchValue(value="legal")),
        FieldCondition(key="year", range=Range(gte=2023)),
    ],
    must_not=[
        FieldCondition(key="confidential", match=MatchValue(value=True)),
    ],
)

# Translated to Chroma (no $not operator, so the negated equality becomes $ne):
chroma_filter = {
    "$and": [
        {"department": {"$eq": "legal"}},
        {"year": {"$gte": 2023}},
        {"confidential": {"$ne": True}},
    ]
}
```

LangChain handles these translations automatically for its supported vector stores, including Chroma, Pinecone, Weaviate, Qdrant, Milvus, and MongoDB Atlas.
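The translation layer itself is easy to sketch. The following is a minimal illustration, not LangChain's translator: the tuple-based AST and the helpers `negate` and `to_chroma` are hypothetical. Because Chroma's `where` syntax has `$and` and `$or` but no `$not`, the sketch pushes each NOT down to the leaves (De Morgan's laws plus comparator inversion) before emitting Chroma operators.

```python
# Hypothetical tuple-based filter AST:
#   ("and", [child, ...]) / ("or", [child, ...]) / ("not", child)
#   (comparator, attribute, value) with comparator in COMPARATORS
COMPARATORS = {"eq", "ne", "gt", "gte", "lt", "lte"}
# Negation of each comparator (assumes the field exists on every document)
NEGATED = {"eq": "ne", "ne": "eq", "gt": "lte", "gte": "lt", "lt": "gte", "lte": "gt"}


def negate(node):
    """Return the logical negation of a node, pushing NOT toward the leaves."""
    kind = node[0]
    if kind in COMPARATORS:
        _, attr, value = node
        return (NEGATED[kind], attr, value)
    if kind == "and":
        return ("or", [negate(child) for child in node[1]])
    if kind == "or":
        return ("and", [negate(child) for child in node[1]])
    if kind == "not":
        return node[1]  # NOT(NOT x) == x
    raise ValueError(f"unknown node type: {kind!r}")


def to_chroma(node):
    """Translate the AST into a Chroma `where` dict."""
    kind = node[0]
    if kind in COMPARATORS:
        _, attr, value = node
        return {attr: {f"${kind}": value}}
    if kind in ("and", "or"):
        children = [to_chroma(child) for child in node[1]]
        if not children:
            raise ValueError(f"empty {kind!r} node")
        # Chroma rejects $and/$or with fewer than two operands, so unwrap singletons
        if len(children) == 1:
            return children[0]
        return {f"${kind}": children}
    if kind == "not":
        # No $not in Chroma: rewrite the negated subtree first
        return to_chroma(negate(node[1]))
    raise ValueError(f"unknown node type: {kind!r}")


llm_filter = ("and", [
    ("eq", "department", "legal"),
    ("gte", "year", 2023),
    ("not", ("eq", "confidential", True)),
])
print(to_chroma(llm_filter))
```

For the filter above this reproduces the hand-written `chroma_filter`. A Qdrant backend would instead keep NOT nodes and route them into `must_not`, which is why each store needs its own translator.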

Building a Custom Self-Query System

For more control, build the structured extraction yourself:

```python
from typing import Optional

import anthropic
from pydantic import BaseModel, Field


class DocumentFilter(BaseModel):
    semantic_query: str = Field(description="The semantic content to search for")
    document_type: Optional[str] = Field(default=None, description="Type of document: contract, report, policy, email, invoice")
    department: Optional[str] = Field(default=None, description="Department: engineering, legal, finance, marketing, hr")
    year_min: Optional[int] = Field(default=None, description="Minimum year (inclusive)")
    year_max: Optional[int] = Field(default=None, description="Maximum year (inclusive)")
    confidential: Optional[bool] = Field(default=None, description="Whether the document is confidential")


def parse_self_query(natural_language_query: str) -> DocumentFilter:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=300,
        tools=[{
            "name": "extract_filters",
            "description": "Extract structured filters from a document search query",
            "input_schema": DocumentFilter.model_json_schema(),
        }],
        # Force the tool call so the response always contains a tool_use block
        tool_choice={"type": "tool", "name": "extract_filters"},
        messages=[{"role": "user", "content": natural_language_query}],
    )
    tool_use = next(b for b in response.content if b.type == "tool_use")
    # Validate the model's arguments against the schema
    return DocumentFilter.model_validate(tool_use.input)
```
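The extracted fields still have to become a vector-store filter. A minimal sketch for Chroma's `where` syntax, assuming the fields of `DocumentFilter` map one-to-one onto stored metadata keys (with `year_min` and `year_max` both targeting a `year` key); `build_chroma_where` is a hypothetical helper that takes a plain dict such as the output of `DocumentFilter.model_dump()`:

```python
def build_chroma_where(extracted):
    """Map extracted filter fields to a Chroma `where` clause, or None if empty."""
    conditions = []
    for field in ("document_type", "department", "confidential"):
        value = extracted.get(field)
        if value is not None:  # keep False for `confidential`
            conditions.append({field: {"$eq": value}})
    if extracted.get("year_min") is not None:
        conditions.append({"year": {"$gte": extracted["year_min"]}})
    if extracted.get("year_max") is not None:
        conditions.append({"year": {"$lte": extracted["year_max"]}})
    if not conditions:
        return None  # nothing extracted: plain semantic search
    if len(conditions) == 1:
        return conditions[0]  # Chroma wants at least two operands inside $and
    return {"$and": conditions}


where = build_chroma_where({"semantic_query": "contracts", "department": "legal", "year_min": 2023})
print(where)
```

With LangChain's Chroma wrapper the result could then be passed as `vectorstore.similarity_search(f.semantic_query, k=10, filter=where)`.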

Common Self-Query Failure Modes

Hallucinated attribute values: the LLM invents filter values that do not exist in your data, such as a department missing from the schema. Validate every extracted value against an allowlist:

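```python
VALID_DEPARTMENTS = {"engineering", "legal", "finance", "marketing", "hr"}


def validate_filter(filter_dict: dict) -> dict:
    """Drop an invalid department filter instead of failing the query."""
    department = filter_dict.get("department")
    if department is not None:
        # The schema description says "HR" while the allowlist is lowercase,
        # so compare case-insensitively (assumes stored values are lowercase).
        normalized = str(department).lower()
        if normalized in VALID_DEPARTMENTS:
            filter_dict["department"] = normalized
        else:
            del filter_dict["department"]  # drop invalid filter, don't fail
    return filter_dict
```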
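The same check generalizes to the whole schema. A hedged sketch, assuming the values listed in the attribute descriptions form the complete vocabulary and that their canonical spellings match the stored metadata; `ALLOWED_VALUES`, `FIELD_TYPES`, and `validate_filters` are hypothetical names:

```python
# Hypothetical allowlists; canonical spellings are assumed to match stored metadata.
ALLOWED_VALUES = {
    "document_type": {"contract", "report", "policy", "email", "invoice"},
    "department": {"engineering", "legal", "finance", "marketing", "hr"},
    "quarter": {"Q1", "Q2", "Q3", "Q4"},
}
# Expected Python types for fields without an enumerated vocabulary
FIELD_TYPES = {"year": int, "page_count": int, "confidential": bool}


def _type_ok(value, expected):
    # bool is a subclass of int, so reject True/False for integer fields
    if expected is int and isinstance(value, bool):
        return False
    return isinstance(value, expected)


def validate_filters(filters: dict) -> dict:
    """Return a cleaned copy, dropping unknown fields and invalid values."""
    clean = {}
    for field, value in filters.items():
        if field in ALLOWED_VALUES:
            if isinstance(value, str):
                # Case-insensitive match, mapped back to the canonical spelling
                lookup = {v.lower(): v for v in ALLOWED_VALUES[field]}
                canonical = lookup.get(value.strip().lower())
                if canonical is not None:
                    clean[field] = canonical
        elif field in FIELD_TYPES:
            # Accept a plain value or an operator dict such as {"$gte": 2023}
            values = value.values() if isinstance(value, dict) else [value]
            if all(_type_ok(v, FIELD_TYPES[field]) for v in values):
                clean[field] = value
        # Fields outside the schema are silently dropped
    return clean


print(validate_filters({"department": "HR", "quarter": "q3", "region": "EU"}))
```

Dropping rather than raising keeps the query answerable: a bad value degrades to a broader search instead of an error.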

Over-filtering: the LLM extracts too many constraints and recall drops to zero. Add a fallback: if a filtered query returns fewer than 3 results, drop the most selective filters and retry. The simplest version skips straight to unfiltered semantic search:

```python
def safe_self_query(query, retriever, vectorstore, min_results=3):
    results = retriever.invoke(query)
    if len(results) < min_results:
        # Fall back to pure semantic search over the whole collection
        return vectorstore.similarity_search(query, k=10)
    return results
```
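A finer-grained fallback relaxes filters one at a time instead of discarding them all. This sketch assumes you have a per-field selectivity estimate (the fraction of the corpus each filter keeps, for example from metadata counts); `relaxed_search`, `search_fn`, and the toy `fake_search` are hypothetical stand-ins for your filtered vector search:

```python
from typing import Callable


def relaxed_search(
    query: str,
    filters: dict,
    search_fn: Callable[[str, dict], list],
    selectivity: dict,
    min_results: int = 3,
):
    """Drop the most selective filters one at a time until enough results return.

    `selectivity` maps each field to the fraction of the corpus its filter keeps
    (lower = more selective); fields without an estimate are dropped first.
    """
    active = dict(filters)
    drop_order = sorted(active, key=lambda field: selectivity.get(field, 0.0))
    results = search_fn(query, active)
    for field in drop_order:
        if len(results) >= min_results:
            break
        del active[field]
        results = search_fn(query, active)
    return results, active  # `active` records which filters survived


# Toy usage with an in-memory stand-in for filtered vector search
DOCS = [
    {"department": "legal", "year": 2024},
    {"department": "legal", "year": 2023},
    {"department": "legal", "year": 2022},
    {"department": "finance", "year": 2024},
]


def fake_search(query, filters):
    return [d for d in DOCS if all(d.get(k) == v for k, v in filters.items())]


results, kept = relaxed_search(
    "contracts", {"department": "legal", "year": 2024}, fake_search,
    selectivity={"year": 0.5, "department": 0.75},
)
print(len(results), kept)
```

Returning the surviving filters lets the application tell users which constraints were relaxed.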

2025 Trend: Schema-Free Self-Query

Emerging approaches use LLMs to infer the metadata schema from a sample of existing documents rather than requiring a hand-written schema. The LLM reads the sample, infers which structured attributes are available, and generates its own attribute descriptions. This can substantially cut setup time for a new document corpus, although the inferred schema is still worth reviewing before it drives production filters.
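When documents already carry metadata, part of the inference can be approximated without an LLM. The sketch below is a rule-based stand-in for the LLM-driven approach described above, not an implementation of it: `infer_schema` is a hypothetical helper that guesses a type per field and lists allowed values for low-cardinality string fields.

```python
from collections import defaultdict


def infer_schema(metadatas, max_enum=10):
    """Infer name/type/description entries from sample metadata dicts."""
    values = defaultdict(list)
    for metadata in metadatas:
        for key, value in metadata.items():
            values[key].append(value)

    schema = []
    for name, vals in sorted(values.items()):
        kinds = {type(v) for v in vals}
        if kinds == {bool}:
            type_name = "boolean"
        elif kinds == {int}:
            type_name = "integer"
        elif kinds <= {int, float}:
            type_name = "float"
        else:
            type_name = "string"
        distinct = sorted({str(v) for v in vals})
        if type_name == "string" and len(distinct) <= max_enum:
            # Low-cardinality strings become an explicit list of allowed values
            description = "One of: " + ", ".join(distinct)
        else:
            description = f"{type_name} value, e.g. {vals[0]}"
        schema.append({"name": name, "type": type_name, "description": description})
    return schema


sample = [
    {"department": "legal", "year": 2024, "confidential": True},
    {"department": "finance", "year": 2023, "confidential": False},
]
for attribute in infer_schema(sample):
    print(attribute)
```

Each entry has the `name`, `type`, and `description` keys that `AttributeInfo` takes, so `AttributeInfo(**attribute)` could feed it to `SelfQueryRetriever`; an LLM pass could then rewrite the descriptions into richer prose.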

Self-query retrieval is the bridge between natural language interfaces and structured data. When users naturally express constraints (“from last year”, “under 10 pages”, “by the legal team”), self-query translates them into precise metadata filters automatically, with no hand-written query parsing; the validation and fallback safeguards above keep those filters trustworthy.