Self-Query Retrieval: Let the LLM Build the Filter

“Show me Q3 2024 financial reports from the Europe team.”

A standard semantic search treats this as a single query and tries to find similar documents. But this sentence contains both a semantic component (“financial reports”) and structural components — explicit metadata constraints: quarter=Q3, year=2024, department=Europe.

Self-query retrieval uses an LLM to parse these structured constraints out of the natural-language query, builds a metadata filter from them, and then runs the filter together with a semantic search over what remains of the query. Because documents that violate an explicit constraint are excluded outright rather than merely ranked lower, the results are typically more precise than semantic search alone.

How It Works

User query: "Find legal contracts from 2024 with less than 50 pages"

Self-query LLM output:
  semantic_query: "legal contracts"
  filters: {
    "document_type": "contract",
    "year": { "$gte": 2024, "$lte": 2024 },
    "page_count": { "$lt": 50 }
  }

Retrieval:
  1. Apply metadata filter to narrow candidate set
  2. Run semantic search within filtered set
  3. Return results that satisfy both criteria

The magic is in the structured extraction — the LLM understands “less than 50 pages” means page_count < 50 and “from 2024” means year = 2024.
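
The three retrieval steps above can be sketched in plain Python. This is a toy illustration, not how any vector store implements filtering: `keyword_score` is a crude stand-in for embedding similarity, and `matches`, `self_query_search`, and the sample documents are hypothetical names invented for this example.

```python
from typing import Any

# Comparison operators in the Mongo-style notation used in the example above.
OPS = {
    "$eq": lambda a, b: a == b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
}

def matches(metadata: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Return True if the metadata satisfies every condition in filters."""
    for field, cond in filters.items():
        if field not in metadata:
            return False
        # A bare value is treated as shorthand for equality.
        conds = cond if isinstance(cond, dict) else {"$eq": cond}
        if not all(OPS[op](metadata[field], target) for op, target in conds.items()):
            return False
    return True

def keyword_score(text: str, query: str) -> int:
    """Toy stand-in for embedding similarity: number of shared lowercase words."""
    return len(set(text.lower().split()) & set(query.lower().split()))

def self_query_search(docs: list[dict], semantic_query: str, filters: dict, k: int = 5) -> list[dict]:
    # Step 1: apply the metadata filter to narrow the candidate set.
    candidates = [d for d in docs if matches(d["metadata"], filters)]
    # Step 2: rank the surviving candidates by (toy) semantic similarity.
    ranked = sorted(candidates, key=lambda d: keyword_score(d["text"], semantic_query), reverse=True)
    # Step 3: return the top results, all of which satisfy the filter.
    return ranked[:k]

docs = [
    {"text": "Supplier legal contracts for logistics",
     "metadata": {"document_type": "contract", "year": 2024, "page_count": 12}},
    {"text": "Master services legal contracts",
     "metadata": {"document_type": "contract", "year": 2024, "page_count": 80}},
    {"text": "Legal contracts archive",
     "metadata": {"document_type": "contract", "year": 2022, "page_count": 9}},
]
filters = {
    "document_type": "contract",
    "year": {"$gte": 2024, "$lte": 2024},
    "page_count": {"$lt": 50},
}
results = self_query_search(docs, "legal contracts", filters)
```

All three sample documents are about legal contracts, so semantic similarity alone could not separate them; only the 12-page 2024 contract survives the filter.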

Metadata Schema Definition

The self-query LLM needs to know what attributes are available to filter on. You define this as an attribute description schema:

from langchain.chains.query_constructor.base import AttributeInfo

metadata_field_info = [
    AttributeInfo(
        name="source",
        description="The filename or URL of the source document",
        type="string",
    ),
    AttributeInfo(
        name="document_type",
        description="Type of document: contract, report, policy, email, invoice",
        type="string",
    ),
    AttributeInfo(
        name="department",
        description="The department that created the document: engineering, legal, finance, marketing, HR",
        type="string",
    ),
    AttributeInfo(
        name="year",
        description="Year the document was created (integer, e.g. 2024)",
        type="integer",
    ),
    AttributeInfo(
        name="quarter",
        description="Fiscal quarter: Q1, Q2, Q3, Q4",
        type="string",
    ),
    AttributeInfo(
        name="confidential",
        description="Whether the document is confidential (true/false)",
        type="boolean",
    ),
    AttributeInfo(
        name="page_count",
        description="Number of pages in the document",
        type="integer",
    ),
]

document_content_description = "Company internal documents including contracts, reports, policies, and emails"
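
A declared schema only helps if the stored metadata actually matches it: a range filter on `year` will typically not match documents whose year was ingested as the string "2024". The following is a minimal sketch of a pre-ingestion check, assuming metadata is a plain dict. `SCHEMA`, `PY_TYPES`, and `find_type_mismatches` are hypothetical names for this example and are not part of LangChain.

```python
# Map the schema's type strings to Python types.
PY_TYPES = {"string": str, "integer": int, "boolean": bool}

# Field name -> declared type, mirroring part of the attribute list above.
SCHEMA = {
    "document_type": "string",
    "department": "string",
    "year": "integer",
    "quarter": "string",
    "confidential": "boolean",
    "page_count": "integer",
}

def find_type_mismatches(metadata: dict, schema: dict) -> list[str]:
    """Return one message per field whose value does not match its declared type."""
    problems = []
    for name, declared in schema.items():
        if name not in metadata:
            continue  # missing fields are tolerated; they just cannot be filtered on
        value = metadata[name]
        expected = PY_TYPES[declared]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        ok = isinstance(value, expected) and not (expected is int and isinstance(value, bool))
        if not ok:
            problems.append(f"{name}: expected {declared}, got {type(value).__name__}")
    return problems

problems = find_type_mismatches({"year": "2024", "page_count": 12}, SCHEMA)
```

Running a check like this before building the vector store surfaces mismatches early, instead of as silently empty result sets at query time.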

LangChain SelfQueryRetriever

from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

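embeddings = OpenAIEmbeddings()
# `documents` is assumed to be a list of LangChain Document objects whose
# metadata carries the fields declared in metadata_field_info.
vectorstore = Chroma.from_documents(documents, embeddings)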

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents=document_content_description,
    metadata_field_info=metadata_field_info,
    verbose=True,  # shows the generated query and filters
    enable_limit=True,  # allow "give me 5 documents" type queries
)

# Test queries
result = retriever.invoke(
    "Find non-confidential engineering documents from Q1 2024"
)

# Verbose output shows the generated query and filter, roughly equivalent to:
# semantic_query: "engineering documents"
# filter: {"department": "engineering", "year": 2024, "quarter": "Q1", "confidential": False}

The Query Constructor Chain

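Under the hood, SelfQueryRetriever uses a query constructor chain that prompts the LLM with the metadata schema and asks it to decompose the query into a semantic query string and a structured filter. You can build and run that chain on its own: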

from langchain.chains.query_constructor.base import (
    StructuredQueryOutputParser,
    get_query_constructor_prompt,
)
from langchain_openai import ChatOpenAI

prompt = get_query_constructor_prompt(
    document_content_description,
    metadata_field_info,
)

output_parser = StructuredQueryOutputParser.from_components()
query_constructor = prompt | ChatOpenAI(temperature=0) | output_parser

# Test the constructor directly
structured_query = query_constructor.invoke({
    "query": "legal contracts signed after June 2023"
})

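print(structured_query.query)   # e.g. "legal contracts"
# The schema only has a `year` field, so "after June 2023" cannot be expressed
# exactly. Expect a year comparison such as GT or GTE 2023; note that GT would
# silently drop documents from the second half of 2023.
print(structured_query.filter)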

Handling Operator Types

Different vector stores support different filter operators. LangChain’s self-query layer translates the abstract filter AST to database-specific syntax:

# Abstract filter from LLM:
# AND(
#   EQ("department", "legal"),
#   GTE("year", 2023),
#   NOT(EQ("confidential", True))
# )

# Translated to Qdrant:
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

qdrant_filter = Filter(
    must=[
        FieldCondition(key="department", match=MatchValue(value="legal")),
        FieldCondition(key="year", range=Range(gte=2023)),
    ],
    must_not=[
        FieldCondition(key="confidential", match=MatchValue(value=True)),
    ]
)

# Translated to Chroma:
chroma_filter = {
    "$and": [
        {"department": {"$eq": "legal"}},
        {"year": {"$gte": 2023}},
        {"confidential": {"$ne": True}},
    ]
}

LangChain handles these translations automatically for supported vector stores, including Chroma, Pinecone, Weaviate, Qdrant, Milvus, and MongoDB Atlas.
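
To make the translation step concrete, here is a minimal sketch of a translator from an abstract filter tree to the Chroma-style dict shown above. The `Comparison` and `Operation` dataclasses are simplified stand-ins written for this example, not LangChain's own classes, and `to_chroma` is a hypothetical function. Since the target syntax in this sketch has no negation operator, NOT is rewritten into the inverse comparator.

```python
from dataclasses import dataclass
from typing import Any, Union

@dataclass
class Comparison:
    comparator: str  # one of "eq", "ne", "gt", "gte", "lt", "lte"
    attribute: str
    value: Any

@dataclass
class Operation:
    operator: str  # one of "and", "or", "not"
    arguments: list

Node = Union[Comparison, Operation]

# Inverse of each comparator, used to eliminate NOT around a single comparison.
NEGATED = {"eq": "ne", "ne": "eq", "gt": "lte", "gte": "lt", "lt": "gte", "lte": "gt"}

def to_chroma(node: Node) -> dict:
    """Recursively translate the abstract tree into a Chroma-style where dict."""
    if isinstance(node, Comparison):
        return {node.attribute: {f"${node.comparator}": node.value}}
    if node.operator == "not":
        (inner,) = node.arguments  # NOT takes exactly one argument
        if not isinstance(inner, Comparison):
            raise ValueError("this sketch only negates single comparisons")
        return to_chroma(Comparison(NEGATED[inner.comparator], inner.attribute, inner.value))
    if node.operator not in ("and", "or"):
        raise ValueError(f"unsupported operator: {node.operator}")
    return {f"${node.operator}": [to_chroma(arg) for arg in node.arguments]}

tree = Operation("and", [
    Comparison("eq", "department", "legal"),
    Comparison("gte", "year", 2023),
    Operation("not", [Comparison("eq", "confidential", True)]),
])
chroma_filter = to_chroma(tree)
```

A translator for another store would walk the same tree and emit that store's filter objects instead, which is why the abstract representation is kept separate from any one database syntax.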

Building a Custom Self-Query System

For more control, build the structured extraction yourself:

from pydantic import BaseModel, Field
from typing import Optional

class DocumentFilter(BaseModel):
    semantic_query: str = Field(description="The semantic content to search for")
    document_type: Optional[str] = Field(default=None, description="Type of document")
    department: Optional[str] = Field(default=None, description="Department")
    year_min: Optional[int] = Field(default=None, description="Minimum year")
    year_max: Optional[int] = Field(default=None, description="Maximum year")
    confidential: Optional[bool] = Field(default=None)

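def parse_self_query(natural_language_query: str) -> DocumentFilter:
    import anthropic  # reads ANTHROPIC_API_KEY from the environment

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=300,
        tools=[{
            "name": "extract_filters",
            "description": "Extract structured filters from a document search query",
            "input_schema": DocumentFilter.model_json_schema(),
        }],
        # Force the tool call; otherwise the model may answer in plain text
        # and the next() below would raise StopIteration.
        tool_choice={"type": "tool", "name": "extract_filters"},
        messages=[{"role": "user", "content": natural_language_query}],
    )

    # The forced tool call arrives as a tool_use content block.
    tool_use = next(b for b in response.content if b.type == "tool_use")
    # Pydantic validates the field types and raises ValidationError on bad output.
    return DocumentFilter(**tool_use.input)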
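
The extracted object still has to become a filter the vector store understands. Below is a minimal sketch that turns the parsed fields into a Chroma-style where clause. To keep it runnable without extra dependencies it uses a plain dataclass, `ParsedFilter`, as a stand-in with the same fields as `DocumentFilter`; both `ParsedFilter` and `to_where_clause` are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParsedFilter:
    """Plain-dataclass stand-in with the same fields as DocumentFilter."""
    semantic_query: str
    document_type: Optional[str] = None
    department: Optional[str] = None
    year_min: Optional[int] = None
    year_max: Optional[int] = None
    confidential: Optional[bool] = None

def to_where_clause(f: ParsedFilter) -> Optional[dict]:
    """Build a Chroma-style where dict from whichever fields were extracted."""
    conditions = []
    if f.document_type is not None:
        conditions.append({"document_type": {"$eq": f.document_type}})
    if f.department is not None:
        conditions.append({"department": {"$eq": f.department}})
    if f.year_min is not None:
        conditions.append({"year": {"$gte": f.year_min}})
    if f.year_max is not None:
        conditions.append({"year": {"$lte": f.year_max}})
    if f.confidential is not None:  # False is a valid filter value, so test against None
        conditions.append({"confidential": {"$eq": f.confidential}})

    if not conditions:
        return None  # nothing extracted: run a pure semantic search
    if len(conditions) == 1:
        return conditions[0]  # avoid wrapping a single condition in $and
    return {"$and": conditions}

where = to_where_clause(ParsedFilter("contracts", department="legal", year_min=2023))
```

The result can then be passed alongside `semantic_query` to the store's search call. Returning a bare condition for the single-filter case is deliberate, since Chroma is understood to reject a `$and` list with fewer than two entries.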

Common Self-Query Failure Modes

Hallucinated attribute values: The LLM invents filter values that do not exist in your data. Validate every extracted value against an allowlist:

VALID_DEPARTMENTS = {"engineering", "legal", "finance", "marketing", "hr"}

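def validate_filter(filter_dict: dict) -> dict:
    validated = dict(filter_dict)  # copy, so the caller's dict is not mutated
    department = validated.get("department")
    # Compare case-insensitively: the schema lists "HR", the allowlist "hr".
    if department is not None and str(department).lower() not in VALID_DEPARTMENTS:
        del validated["department"]  # drop invalid filter, don't fail
    return validated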

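Over-filtering: The LLM extracts too many constraints, or constraints that no document satisfies, and recall drops to zero. Implement a fallback: if the filtered search returns fewer than 3 results, either drop the most selective filters and retry or, as in the simplest version here, fall back to pure semantic search.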

def safe_self_query(query, retriever, vectorstore, min_results=3):
    results = retriever.invoke(query)
    if len(results) < min_results:
        # Too few hits: fall back to pure semantic search, ignoring all filters
        return vectorstore.similarity_search(query, k=10)
    return results
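
The "drop the most selective filters and retry" strategy can be sketched as a loop that relaxes one filter at a time. This is an illustration under stated assumptions: `search` is any callable that takes a filter dict and returns a list of results, `drop_order` is a caller-supplied guess at which fields are most selective, and `relaxed_search` and the in-memory `corpus` are hypothetical.

```python
from typing import Callable

def relaxed_search(
    search: Callable[[dict], list],
    filters: dict,
    drop_order: list[str],
    min_results: int = 3,
) -> tuple[list, dict]:
    """Drop filters one by one, in drop_order, until enough results come back."""
    active = dict(filters)  # copy, so the caller's filters are left untouched
    results = search(active)
    for field in drop_order:
        if len(results) >= min_results:
            break
        if field in active:
            del active[field]
            results = search(active)
    # Return the surviving filters too, so the caller can report what was relaxed.
    return results, active

# Tiny in-memory corpus with equality-only filtering, for demonstration.
corpus = [
    {"department": "legal", "year": 2024, "quarter": "Q1"},
    {"department": "legal", "year": 2024, "quarter": "Q2"},
    {"department": "legal", "year": 2024, "quarter": "Q3"},
    {"department": "legal", "year": 2023, "quarter": "Q1"},
    {"department": "finance", "year": 2024, "quarter": "Q1"},
]

def equality_search(filters: dict) -> list:
    return [doc for doc in corpus if all(doc.get(k) == v for k, v in filters.items())]

results, used_filters = relaxed_search(
    equality_search,
    {"department": "legal", "year": 2024, "quarter": "Q1"},
    drop_order=["quarter", "year"],
)
```

Here the full filter matches only one document, so `quarter` is dropped and the retry returns the three legal documents from 2024. Compared with falling back to pure semantic search, this keeps the constraints the user is most likely to care about.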

2025 Trend: Schema-Free Self-Query

Emerging approaches use LLMs to infer the metadata schema from examples of existing documents rather than requiring manual schema definition. The LLM reads a sample of documents, infers what structured attributes are available, and generates its own attribute descriptions. This can substantially reduce setup time for new document corpora.
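
Part of this inference needs no LLM at all when documents already carry metadata: attribute names, types, and example values can be collected mechanically and then handed to an LLM to write the descriptions. The sketch below covers only that mechanical step and is not the LLM-based approach itself; `infer_schema` and `TYPE_NAMES` are hypothetical names for this example.

```python
from collections import defaultdict

TYPE_NAMES = {bool: "boolean", int: "integer", float: "float", str: "string"}

def infer_schema(metadata_samples: list[dict], max_examples: int = 5) -> list[dict]:
    """Summarize observed metadata keys with an inferred type and example values."""
    seen_types = defaultdict(set)
    seen_values = defaultdict(list)
    for metadata in metadata_samples:
        for key, value in metadata.items():
            # Unknown Python types are reported as "string" in this sketch.
            seen_types[key].add(TYPE_NAMES.get(type(value), "string"))
            if value not in seen_values[key] and len(seen_values[key]) < max_examples:
                seen_values[key].append(value)

    schema = []
    for key in sorted(seen_types):
        types = seen_types[key]
        schema.append({
            "name": key,
            # Mixed types fall back to "string", the least restrictive choice here.
            "type": next(iter(types)) if len(types) == 1 else "string",
            "examples": seen_values[key],
        })
    return schema

schema = infer_schema([
    {"year": 2024, "department": "legal"},
    {"year": 2023, "department": "finance"},
])
```

Each entry has the same shape of information as a hand-written attribute definition minus the description, which is the part an LLM would be asked to fill in from the example values.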

Self-query retrieval is the bridge between natural language interfaces and structured data. When your users naturally express constraints (“from last year”, “under 10 pages”, “by the legal team”), self-query translates those constraints into precise filters without hand-written query-parsing code.