Parent-Child Chunking: Retrieve Small, Answer Big

Here’s the tension at the heart of RAG chunking: small chunks give you precise retrieval (the embedding of 100 tokens is sharper, more focused), but they give your LLM too little context to generate a good answer. Large chunks give the LLM plenty of context but embed poorly — the embedding becomes a diffuse average of multiple topics and matches fewer queries accurately.

Parent-child chunking resolves this tension by separating the retrieval representation from the generation context. You index small “child” chunks for high-precision retrieval, but when a child chunk is retrieved, you look up its “parent” — a larger, richer passage — and send that to the LLM instead.

The Architecture in Plain Terms

Document
│
├── Parent Chunk A (1,000 tokens)
│     ├── Child A1 (150 tokens)  ←── embedded + indexed
│     ├── Child A2 (150 tokens)  ←── embedded + indexed
│     ├── Child A3 (150 tokens)  ←── embedded + indexed
│     └── Child A4 (150 tokens)  ←── embedded + indexed
│
└── Parent Chunk B (1,000 tokens)
      ├── Child B1 (150 tokens)  ←── embedded + indexed
      ├── Child B2 (150 tokens)  ←── embedded + indexed
      └── Child B3 (150 tokens)  ←── embedded + indexed

Query arrives → vector search finds Child A2 (best match)
               → retrieve Parent A (the full 1,000-token section)
               → send Parent A to LLM for generation

The child chunks live in your vector store. The parent chunks live in a document store (a simple key-value store, PostgreSQL, or an in-memory dict during prototyping). Metadata in each child chunk stores a pointer back to its parent.
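To make the pointer mechanics concrete, here is a minimal, dependency-free sketch. It assumes whitespace-separated words as a rough stand-in for tokens and a toy word-overlap score as a stand-in for embedding similarity. In a real system the children would be embedded and searched in a vector store. All names (Chunk, build_index, retrieve_parents and so on) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    id: str
    text: str
    metadata: dict = field(default_factory=dict)


def split_words(text: str, size: int) -> list[str]:
    """Split text into pieces of at most `size` whitespace-separated words (rough token proxy)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(doc_id: str, text: str, parent_size: int, child_size: int):
    """Return (docstore, children): parents keyed by ID, children carrying a parent_id pointer."""
    docstore: dict[str, Chunk] = {}
    children: list[Chunk] = []
    for p_idx, parent_text in enumerate(split_words(text, parent_size)):
        parent_id = f"{doc_id}-p{p_idx}"
        docstore[parent_id] = Chunk(parent_id, parent_text, {"doc_id": doc_id})
        for c_idx, child_text in enumerate(split_words(parent_text, child_size)):
            children.append(Chunk(f"{parent_id}-c{c_idx}", child_text,
                                  {"doc_id": doc_id, "parent_id": parent_id}))
    return docstore, children


def score(query: str, text: str) -> int:
    """Toy relevance: number of shared lowercase words (stand-in for vector similarity)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_parents(query: str, docstore: dict, children: list, k: int = 3) -> list[Chunk]:
    """Rank children, then map the top-k hits to their parents, dropping duplicate parents."""
    scored = [(score(query, c.text), c) for c in children]
    ranked = [c for s, c in sorted(scored, key=lambda sc: sc[0], reverse=True) if s > 0][:k]
    parents, seen = [], set()
    for child in ranked:
        pid = child.metadata["parent_id"]
        if pid not in seen:  # several children of one parent -> send that parent once
            seen.add(pid)
            parents.append(docstore[pid])
    return parents


doc = ("refund requests are accepted within thirty days of purchase "
       "shipping usually takes five business days to arrive")
docstore, children = build_index("doc1", doc, parent_size=8, child_size=4)
hits = retrieve_parents("refund requests within thirty days", docstore, children)
print([p.id for p in hits])  # ['doc1-p0', 'doc1-p1']
```

Two children of doc1-p0 match, but that parent is returned only once, which keeps duplicate context out of the LLM prompt.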

Implementation with LangChain

LangChain provides ParentDocumentRetriever, which handles this pattern out of the box:

from langchain.retrievers import ParentDocumentRetriever
from langchain_community.vectorstores import Chroma
from langchain_core.stores import InMemoryStore
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Child splitter: small chunks for embedding.
# from_tiktoken_encoder measures chunk_size in tokens; the plain constructor counts characters.
child_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",
    chunk_size=200,
    chunk_overlap=20,
)

# Parent splitter: larger chunks for LLM context
parent_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",
    chunk_size=1000,
    chunk_overlap=100,
)

# Vector store holds the embedded child chunks
vectorstore = Chroma(
    collection_name="child_chunks",
    embedding_function=OpenAIEmbeddings(),
)
# Document store holds the parent chunks (as Document objects), keyed by ID
docstore = InMemoryStore()

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# `documents` is a list of Document objects loaded elsewhere (e.g. by a document loader).
# Adding them creates both parent and child chunks automatically.
retriever.add_documents(documents)

# At retrieval time, the search runs over children but parent chunks are returned
results = retriever.invoke("What is the refund policy?")

The retriever resolves the child-to-parent pointers for you: the search runs over child chunks, but the results are parent chunks. If several children of the same parent match, that parent is returned once.

Multi-Level Hierarchies

Nothing limits this pattern to two levels. Complex corpora often benefit from three levels:

Level 0: Full document (5,000+ tokens)
           ↓ split
Level 1: Section chunks (800–1,200 tokens) — sent to LLM
           ↓ split
Level 2: Paragraph chunks (150–250 tokens) — embedded and indexed

Query → find best Level 2 chunk → look up Level 1 parent → send to LLM

For very large documents (books, lengthy legal contracts), you might also generate a summary of the full Level 0 document and pass it to the LLM alongside the retrieved section, so it has overall document context.
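A three-level hierarchy is the same pointer lookup applied repeatedly. This sketch assumes each node stores a hypothetical parent_id field (None at the top level) and walks upward until it reaches the level you want to send to the LLM; the Node type and resolve_to_level function are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    id: str
    level: int                       # 0 = full document, 1 = section, 2 = paragraph
    text: str
    parent_id: Optional[str] = None  # pointer one level up; None for the document itself


def resolve_to_level(node_id: str, nodes: dict[str, Node], target_level: int) -> Node:
    """Follow parent pointers upward from node_id until reaching target_level."""
    node = nodes[node_id]
    while node.level > target_level:
        if node.parent_id is None:
            raise ValueError(f"{node.id} has no ancestor at level {target_level}")
        node = nodes[node.parent_id]
    return node


nodes = {
    "doc": Node("doc", 0, "<full contract text>"),
    "sec-3": Node("sec-3", 1, "<section 3: termination>", parent_id="doc"),
    "sec-3-p2": Node("sec-3-p2", 2, "<paragraph on notice periods>", parent_id="sec-3"),
}

# Vector search matched the Level 2 paragraph; send its Level 1 section to the LLM.
section = resolve_to_level("sec-3-p2", nodes, target_level=1)
print(section.id)  # sec-3

# The Level 0 node is where a document-wide summary could be attached.
print(resolve_to_level("sec-3-p2", nodes, target_level=0).id)  # doc
```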

Choosing Parent and Child Sizes

The child chunk size should be small enough that its embedding is sharply focused. As a rough starting point:

Child: 100–200 tokens for precise factual queries
Child: 200–400 tokens when queries are more open-ended

The parent chunk size depends on your LLM’s context window and how many parents you’ll concatenate:

LLM context: 8,000 tokens
Retrieve top 3 parents: 3 × 1,000 = 3,000 tokens for retrieved context
Remaining: 5,000 tokens for system prompt + query + response

LLM context: 32,000 tokens
Retrieve top 5 parents: 5 × 2,000 = 10,000 tokens for retrieved context
Remaining: 22,000 tokens for everything else

Match your parent chunk size to your retrieval count and LLM context budget.
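The budget arithmetic above fits in a couple of helper functions. This is a sketch: the function names are hypothetical, the reserved allowance for system prompt, query, and response is a value you choose per application, and real token counts should come from your model's tokenizer.

```python
def remaining_tokens(context_window: int, top_k: int, parent_tokens: int) -> int:
    """Tokens left for system prompt, query, and response after top_k parents."""
    return context_window - top_k * parent_tokens


def max_parent_tokens(context_window: int, top_k: int, reserved: int) -> int:
    """Largest parent size that still leaves `reserved` tokens for everything else."""
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    return max(0, (context_window - reserved) // top_k)


print(remaining_tokens(8_000, 3, 1_000))     # 5000
print(remaining_tokens(32_000, 5, 2_000))    # 22000
print(max_parent_tokens(32_000, 5, 22_000))  # 2000
```

Working backwards from a reserved budget (max_parent_tokens) is often the more useful direction: fix how much room the prompt and answer need, then size parents to fit.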

Document-Level Parents: The “Full Document” Variant

A common variation uses the full original document as the parent and only creates child chunks for indexing. This works well when:

Documents are short (< 2,000 tokens) and can fit in context
You want the LLM to have complete context for every retrieved document
Your use case involves synthesizing across an entire document rather than extracting a specific fact

# Full document as parent (no parent_splitter)
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    # parent_splitter omitted → full document is the parent
)
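The "short documents" criterion above can be expressed as a simple routing rule. This sketch uses a whitespace word count as a rough stand-in for tokens and treats the 2,000-token figure as a configurable threshold; the choose_parents function and its fallback splitting are illustrative, not part of any library.

```python
def choose_parents(doc_id: str, text: str, max_full_doc_tokens: int = 2_000,
                   parent_size: int = 1_000) -> dict[str, str]:
    """Use the whole document as the single parent when it is short enough,
    otherwise fall back to fixed-size parent chunks. Returns {parent_id: text}."""
    words = text.split()  # rough token proxy; use a real tokenizer in practice
    if len(words) <= max_full_doc_tokens:
        return {doc_id: text}
    return {
        f"{doc_id}-p{i // parent_size}": " ".join(words[i:i + parent_size])
        for i in range(0, len(words), parent_size)
    }


short_doc = choose_parents("faq", "Refunds are issued within thirty days.")
long_doc = choose_parents("manual", "word " * 2_500)
print(list(short_doc))  # ['faq']
print(list(long_doc))   # ['manual-p0', 'manual-p1', 'manual-p2']
```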

Metadata Propagation

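Child chunks should inherit metadata from their parent document; this is critical for downstream filtering and citation. ParentDocumentRetriever does this for you (its child splitter copies each parent's metadata onto the children, and it adds a doc_id key pointing back to the parent). If you build the hierarchy yourself, propagate metadata explicitly: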

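from langchain_core.documents import Document

def create_child_with_metadata(parent_doc: Document, child_text: str, child_index: int) -> Document:
    """Create a child chunk that carries its parent's metadata plus a pointer back to it."""
    return Document(
        page_content=child_text,
        metadata={
            **parent_doc.metadata,                       # inherit source, date, author
            "parent_id": parent_doc.metadata["doc_id"],  # assumes every parent has a "doc_id"
            "child_index": child_index,                  # position within the parent
        },
    )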

When you retrieve a child and look up its parent, the parent document will also carry this metadata, making it easy to surface citations like “From document: Annual Report 2024, Section 3.”
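Turning that metadata into a citation string is then a small formatting step. The metadata keys used here (title, section, source) and the format_citation function are hypothetical; use whatever fields your loaders actually attach.

```python
def format_citation(metadata: dict) -> str:
    """Build a human-readable citation from a parent chunk's metadata."""
    title = metadata.get("title") or metadata.get("source", "unknown source")
    section = metadata.get("section")
    if section is not None:  # section 0 is still a valid section
        return f"From document: {title}, Section {section}"
    return f"From document: {title}"


parent_metadata = {"doc_id": "ar-2024", "title": "Annual Report 2024", "section": 3}
print(format_citation(parent_metadata))         # From document: Annual Report 2024, Section 3
print(format_citation({"source": "policy.pdf"}))  # From document: policy.pdf
```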

2025 Trend: Contextual Chunk Headers

An emerging enhancement is to prepend a short, LLM-generated description of the parent context to each child chunk before embedding. This addresses the out-of-context problem, where a child chunk is ambiguous without knowing which section it came from:

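import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY; create once and reuse

def add_context_header(parent_text: str, child_text: str) -> str:
    """Prepend an LLM-written sentence situating child_text within parent_text."""
    prompt = (
        f"<document>\n{parent_text}\n</document>\n\n"
        f"Here is a passage from that document:\n<passage>\n{child_text}\n</passage>\n\n"
        "In one sentence, describe the context of this passage within the document "
        "to improve search retrieval. Answer with only that sentence."
    )
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=100,
        messages=[{"role": "user", "content": prompt}],
    )
    summary = response.content[0].text
    return f"[Context: {summary}]\n\n{child_text}"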

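This technique is a simplified form of Anthropic's Contextual Retrieval, which situates each chunk within its whole source document before embedding. In Anthropic's published benchmarks, contextual embeddings alone reduced the top-20 retrieval failure rate by about 35%; gains on your own corpus will vary.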
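At indexing time, the header only needs to influence what gets embedded. This sketch assumes a contextualize(parent_text, child_text) callable such as add_context_header, and swaps in a hypothetical offline stub (fake_contextualize) so it runs without an API key. Embedding the headed text while keeping the raw child text and parent pointer alongside it is one reasonable design, not the only one.

```python
from typing import Callable


def prepare_children_for_embedding(
    parents: dict[str, str],
    children: list[dict],
    contextualize: Callable[[str, str], str],
) -> list[dict]:
    """Return one record per child: text to embed (header + child), raw child text, parent pointer."""
    records = []
    for child in children:
        parent_text = parents[child["parent_id"]]
        records.append({
            "embed_text": contextualize(parent_text, child["text"]),  # what the embedder sees
            "text": child["text"],                                    # raw text, e.g. for display
            "parent_id": child["parent_id"],                          # pointer used at retrieval time
        })
    return records


def fake_contextualize(parent_text: str, child_text: str) -> str:
    # Offline stand-in for an LLM call: use the parent's first line as the "context".
    return f"[Context: {parent_text.splitlines()[0]}]\n\n{child_text}"


parents = {"p0": "Section 4: Refund policy\nRefunds are issued within 30 days of purchase."}
children = [{"parent_id": "p0", "text": "This applies only to annual plans."}]
records = prepare_children_for_embedding(parents, children, fake_contextualize)
print(records[0]["embed_text"])
```

On its own, "This applies only to annual plans." says nothing about refunds; with the header, its embedding can match a refund query.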

When Parent-Child Chunking Pays Off

The overhead is worth it when:

Your queries require context that spans multiple small chunks
You’re seeing “correct retrieval, bad generation” failures — the right chunk is found but the answer is incomplete
Your documents have clear hierarchical structure (articles with sections, contracts with clauses)
You have the storage budget for both a vector store and a document store

Skip it when:

Documents are already short enough to use as-is
Retrieval precision is your bottleneck (not generation quality)
You need to minimize operational complexity

Parent-child chunking is the architecture that makes RAG feel like it actually understands your documents rather than just pattern-matching against fragments.