Self-Query Retrieval: Let the LLM Build the Filter
“Show me Q3 2024 financial reports from the Europe team.”
A standard semantic search treats this as a single query and tries to find similar documents. But the sentence contains both a semantic component (“financial reports”) and structured components, namely explicit metadata constraints: quarter=Q3, year=2024, department=Europe.
Self-query retrieval uses an LLM to parse these structured constraints out of the natural language query, constructs a metadata filter programmatically, and then runs the filter together with the remaining semantic search. The result is typically more precise than semantic search alone, because documents that violate an explicit constraint are excluded before ranking.
How It Works
User query: "Find legal contracts from 2024 with less than 50 pages"
Self-query LLM output:

semantic_query: "legal contracts"
filters: {
  "document_type": "contract",
  "year": { "$gte": 2024, "$lte": 2024 },
  "page_count": { "$lt": 50 }
}

Retrieval:
1. Apply metadata filter to narrow candidate set
2. Run semantic search within filtered set
3. Return results that satisfy both criteria

The magic is in the structured extraction: the LLM understands that “less than 50 pages” means page_count < 50 and “from 2024” means year = 2024.
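The three retrieval steps can be sketched in plain Python to make the order of operations concrete. This is a minimal illustration, not how any particular vector store implements it: the helper names (matches, bow_cosine, filtered_search) are hypothetical, the documents are made up, and a toy bag-of-words cosine stands in for real embedding similarity.

```python
import math
from collections import Counter

# Comparison operators for a Mongo-style filter like the one shown above
OPS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
}

def matches(metadata: dict, filters: dict) -> bool:
    """Return True if the metadata satisfies every condition in the filter."""
    for field, condition in filters.items():
        if field not in metadata:
            return False
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # a bare value means equality
        value = metadata[field]
        if not all(OPS[op](value, operand) for op, operand in condition.items()):
            return False
    return True

def bow_cosine(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity (stand-in for embedding similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0

def filtered_search(docs: list, semantic_query: str, filters: dict, k: int = 3) -> list:
    # Step 1: the metadata filter narrows the candidate set
    candidates = [d for d in docs if matches(d["metadata"], filters)]
    # Step 2: semantic ranking runs only within the filtered set
    ranked = sorted(
        candidates,
        key=lambda d: bow_cosine(semantic_query, d["text"]),
        reverse=True,
    )
    # Step 3: everything returned satisfies both criteria
    return ranked[:k]

docs = [
    {"text": "legal contracts for supplier services",
     "metadata": {"document_type": "contract", "year": 2024, "page_count": 12}},
    {"text": "legal contracts master agreement",
     "metadata": {"document_type": "contract", "year": 2024, "page_count": 80}},
    {"text": "legal contracts archive",
     "metadata": {"document_type": "contract", "year": 2022, "page_count": 10}},
    {"text": "quarterly finance report",
     "metadata": {"document_type": "report", "year": 2024, "page_count": 20}},
]
filters = {
    "document_type": "contract",
    "year": {"$gte": 2024, "$lte": 2024},
    "page_count": {"$lt": 50},
}
results = filtered_search(docs, "legal contracts", filters)
print([d["text"] for d in results])  # only the 12-page 2024 contract survives
```

Note that the 80-page and 2022 contracts are semantically just as close to “legal contracts” as the surviving document; only the metadata filter separates them.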
Metadata Schema Definition
The self-query LLM needs to know what attributes are available to filter on. You define this as an attribute description schema:
from langchain.chains.query_constructor.base import AttributeInfo

metadata_field_info = [
    AttributeInfo(
        name="source",
        description="The filename or URL of the source document",
        type="string",
    ),
    AttributeInfo(
        name="document_type",
        description="Type of document: contract, report, policy, email, invoice",
        type="string",
    ),
    AttributeInfo(
        name="department",
        description="The department that created the document: engineering, legal, finance, marketing, HR",
        type="string",
    ),
    AttributeInfo(
        name="year",
        description="Year the document was created (integer, e.g. 2024)",
        type="integer",
    ),
    AttributeInfo(
        name="quarter",
        description="Fiscal quarter: Q1, Q2, Q3, Q4",
        type="string",
    ),
    AttributeInfo(
        name="confidential",
        description="Whether the document is confidential (true/false)",
        type="boolean",
    ),
    AttributeInfo(
        name="page_count",
        description="Number of pages in the document",
        type="integer",
    ),
]

document_content_description = "Company internal documents including contracts, reports, policies, and emails"

LangChain SelfQueryRetriever
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# documents: a list of LangChain Document objects whose metadata uses the fields defined above
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents=document_content_description,
    metadata_field_info=metadata_field_info,
    verbose=True,       # shows the generated query and filters
    enable_limit=True,  # allow "give me 5 documents" type queries
)

# Test queries
result = retriever.invoke(
    "Find non-confidential engineering documents from Q1 2024"
)

# Verbose output shows, roughly:
# semantic_query: "engineering documents"
# filter: {"department": "engineering", "year": 2024, "quarter": "Q1", "confidential": False}

The Query Constructor Chain
Under the hood, SelfQueryRetriever uses a query constructor chain that prompts the LLM with the metadata schema and asks it to decompose the query into a semantic query string and a structured filter:
from langchain.chains.query_constructor.base import (
    StructuredQueryOutputParser,
    get_query_constructor_prompt,
)
from langchain_openai import ChatOpenAI

prompt = get_query_constructor_prompt(
    document_content_description,
    metadata_field_info,
)

output_parser = StructuredQueryOutputParser.from_components()
query_constructor = prompt | ChatOpenAI(temperature=0) | output_parser

# Test the constructor directly
structured_query = query_constructor.invoke({
    "query": "legal contracts signed after June 2023"
})

print(structured_query.query)   # e.g. "legal contracts"
print(structured_query.filter)  # e.g. Comparison(attribute="year", comparator=GTE, value=2023)
# The schema only stores the year, so "after June" cannot be expressed exactly.
# GTE keeps late-2023 documents in the candidate set; GT would exclude all of 2023.

Handling Operator Types
Different vector stores support different filter operators. LangChain’s self-query layer translates the abstract filter AST to database-specific syntax:
# Abstract filter from LLM:
# AND(
#     EQ("department", "legal"),
#     GTE("year", 2023),
#     NOT(EQ("confidential", True))
# )

# Translated to Qdrant:
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

qdrant_filter = Filter(
    must=[
        FieldCondition(key="department", match=MatchValue(value="legal")),
        FieldCondition(key="year", range=Range(gte=2023)),
    ],
    must_not=[
        FieldCondition(key="confidential", match=MatchValue(value=True)),
    ],
)

# Translated to Chroma:
chroma_filter = {
    "$and": [
        {"department": {"$eq": "legal"}},
        {"year": {"$gte": 2023}},
        {"confidential": {"$ne": True}},
    ]
}

LangChain handles these translations automatically for supported vector stores, including Chroma, Pinecone, Weaviate, Qdrant, Milvus, and MongoDB Atlas.
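To make the translation step less abstract, here is a small sketch of how an abstract filter tree could be turned into the Mongo-style dictionary shown for Chroma. It is an illustration only and does not reproduce LangChain's translator classes: Cmp, BoolOp, and to_mongo_style are hypothetical names, and NOT is handled only around a single comparison by flipping the operator.

```python
from dataclasses import dataclass
from typing import Any, Union

@dataclass
class Cmp:
    """A single comparison such as EQ("department", "legal")."""
    op: str          # one of: eq, ne, gt, gte, lt, lte
    attribute: str
    value: Any

@dataclass
class BoolOp:
    """A boolean combination of child nodes: and, or, not."""
    op: str
    arguments: list

Node = Union[Cmp, BoolOp]

# Operator to use when a comparison is wrapped in NOT
# (equivalent for documents that actually carry the attribute)
NEGATED = {"eq": "ne", "ne": "eq", "gt": "lte", "gte": "lt", "lt": "gte", "lte": "gt"}

def to_mongo_style(node: Node) -> dict:
    """Recursively translate the abstract tree into a Mongo-style filter dict."""
    if isinstance(node, Cmp):
        return {node.attribute: {f"${node.op}": node.value}}
    if node.op in ("and", "or"):
        return {f"${node.op}": [to_mongo_style(arg) for arg in node.arguments]}
    if node.op == "not":
        (inner,) = node.arguments  # NOT takes exactly one argument
        if not isinstance(inner, Cmp):
            raise ValueError("this sketch only negates single comparisons")
        return to_mongo_style(Cmp(NEGATED[inner.op], inner.attribute, inner.value))
    raise ValueError(f"unsupported operator: {node.op}")

ast = BoolOp("and", [
    Cmp("eq", "department", "legal"),
    Cmp("gte", "year", 2023),
    BoolOp("not", [Cmp("eq", "confidential", True)]),
])
translated = to_mongo_style(ast)
print(translated)
```

For the example tree, the output has the same shape as the Chroma filter above. A store with a native negation clause, such as Qdrant's must_not, would need its own translation function rather than the operator flip used here.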
Building a Custom Self-Query System
For more control, build the structured extraction yourself:
import anthropic
from pydantic import BaseModel, Field
from typing import Optional

class DocumentFilter(BaseModel):
    semantic_query: str = Field(description="The semantic content to search for")
    document_type: Optional[str] = Field(default=None, description="Type of document")
    department: Optional[str] = Field(default=None, description="Department")
    year_min: Optional[int] = Field(default=None, description="Minimum year")
    year_max: Optional[int] = Field(default=None, description="Maximum year")
    confidential: Optional[bool] = Field(default=None)

def parse_self_query(natural_language_query: str) -> DocumentFilter:
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=300,
        tools=[{
            "name": "extract_filters",
            "description": "Extract structured filters from a document search query",
            "input_schema": DocumentFilter.model_json_schema(),
        }],
        # Force the tool call so the response always contains a tool_use block
        tool_choice={"type": "tool", "name": "extract_filters"},
        messages=[{"role": "user", "content": natural_language_query}],
    )

    # The tool_use block carries the extracted fields as a dict
    tool_use = next(b for b in response.content if b.type == "tool_use")
    return DocumentFilter(**tool_use.input)

Common Self-Query Failure Modes
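Hallucinated attribute values: The LLM invents filter values that are not in your schema. Validate all extracted values against an allowlist:

VALID_DEPARTMENTS = {"engineering", "legal", "finance", "marketing", "hr"}

def validate_filter(filter_dict: dict) -> dict:
    if "department" in filter_dict:
        # Normalize case so "HR" matches the allowlist entry "hr"
        # (assumes metadata is stored in lowercase, as in the allowlist)
        department = str(filter_dict["department"]).lower()
        if department in VALID_DEPARTMENTS:
            filter_dict["department"] = department
        else:
            del filter_dict["department"]  # drop invalid filter, don't fail
    return filter_dict

Over-filtering: The LLM extracts too many constraints, reducing recall to zero. Implement a fallback: if fewer than 3 filtered results come back, drop the most selective filters and retry, or, most simply, fall back to pure semantic search.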
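The “drop the most selective filters and retry” variant could be sketched as below. This is a sketch under stated assumptions rather than a LangChain feature: relaxed_search is a hypothetical helper, search is any callable that takes a filter dict and returns results, and drop_order is supplied by the caller because which filter counts as most selective depends on the corpus.

```python
from typing import Callable

def relaxed_search(
    search: Callable[[dict], list],
    filters: dict,
    drop_order: list,
    min_results: int = 3,
) -> tuple:
    """Retry `search` with progressively fewer filters until enough results return.

    Returns the results together with the filters that were actually applied.
    """
    active = dict(filters)  # copy so the caller's filters are not mutated
    results = search(active)
    for field in drop_order:
        if len(results) >= min_results:
            break
        if field in active:
            del active[field]  # relax one constraint at a time
            results = search(active)
    return results, active

# Illustrative in-memory corpus and equality-only search function (made-up data)
corpus = [
    {"department": "legal", "year": 2024, "quarter": "Q1"},
    {"department": "legal", "year": 2024, "quarter": "Q2"},
    {"department": "legal", "year": 2023, "quarter": "Q1"},
    {"department": "finance", "year": 2024, "quarter": "Q1"},
]

def search(active_filters: dict) -> list:
    return [d for d in corpus if all(d.get(k) == v for k, v in active_filters.items())]

results, used = relaxed_search(
    search,
    {"department": "legal", "year": 2024, "quarter": "Q1"},
    drop_order=["quarter", "year"],  # assumed order, most selective first
    min_results=3,
)
print(len(results), used)  # 3 {'department': 'legal'}
```

Returning the filters that were actually applied lets the caller tell the user which constraints were relaxed. The simpler fallback that follows skips relaxation and goes straight to pure semantic search.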
def safe_self_query(query, retriever, min_results=3):
    results = retriever.invoke(query)
    if len(results) < min_results:
        # Simplest fallback: pure semantic search with no metadata filter
        # (uses the module-level vectorstore created earlier)
        return vectorstore.similarity_search(query, k=10)
    return results

2025 Trend: Schema-Free Self-Query
Emerging approaches use LLMs to infer the metadata schema from examples of existing documents rather than requiring manual schema definition. The LLM reads a sample of documents, infers what structured attributes are available, and generates its own attribute descriptions. This can substantially reduce setup time for new document corpora.
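A rough sketch of that idea follows, assuming the model is asked to reply with a JSON array of attribute specs. Everything here is hypothetical: build_schema_prompt and infer_schema are illustrative names, the prompt wording is a guess, and llm is any callable that maps a prompt string to a response string, so the example runs with a stub instead of a real model.

```python
import json
from typing import Callable

ALLOWED_TYPES = {"string", "integer", "float", "boolean"}

def build_schema_prompt(sample_docs: list) -> str:
    """Assemble a prompt asking the model to propose filterable attributes."""
    joined = "\n---\n".join(sample_docs)
    return (
        "Below are sample documents from one corpus.\n"
        "List the structured attributes a user might want to filter on.\n"
        'Reply with only a JSON array of objects with the keys "name", '
        '"description", and "type" (one of: string, integer, float, boolean).\n\n'
        f"{joined}"
    )

def infer_schema(sample_docs: list, llm: Callable[[str], str]) -> list:
    """Ask the model for a schema and keep only well-formed attribute specs."""
    raw = llm(build_schema_prompt(sample_docs))
    schema = []
    for attr in json.loads(raw):
        if not isinstance(attr, dict):
            continue
        # Discard attributes with a missing name or an unsupported type
        if attr.get("name") and attr.get("type") in ALLOWED_TYPES:
            schema.append({
                "name": attr["name"],
                "description": attr.get("description", ""),
                "type": attr["type"],
            })
    return schema

# Stub model for demonstration; a real call would go to an LLM API
def fake_llm(prompt: str) -> str:
    return json.dumps([
        {"name": "year", "description": "Year the document was created", "type": "integer"},
        {"name": "mood", "description": "Unsupported type, gets dropped", "type": "emoji"},
    ])

schema = infer_schema(["Services contract signed in 2024 by the legal team."], fake_llm)
print(schema)
```

Each surviving entry carries the same three fields as an AttributeInfo (name, description, type), so an inferred schema could be reviewed by hand and then passed into the retriever setup shown earlier.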
Self-query retrieval is the bridge between natural language interfaces and structured data. When your users naturally express constraints (“from last year”, “under 10 pages”, “by the legal team”), self-query automatically translates those constraints into precise filters — no query parsing code required.