Parent-Child Chunking: Precision Retrieval with Full Context

Learn parent-child chunking for RAG — retrieve with small child chunks for precision, return large parent chunks for rich LLM context generation.

Parent-Child Chunking: Retrieve Small, Answer Big

Here’s the tension at the heart of RAG chunking: small chunks give you precise retrieval (the embedding of 100 tokens is sharper, more focused), but they give your LLM too little context to generate a good answer. Large chunks give the LLM plenty of context but embed poorly — the embedding becomes a diffuse average of multiple topics and matches fewer queries accurately.

Parent-child chunking resolves this tension by separating the retrieval representation from the generation context. You index small “child” chunks for high-precision retrieval, but when a child chunk is retrieved, you look up its “parent” — a larger, richer passage — and send that to the LLM instead.

The Architecture in Plain Terms

Document
├── Parent Chunk A (1,000 tokens)
│   ├── Child A1 (150 tokens) ←── embedded + indexed
│   ├── Child A2 (150 tokens) ←── embedded + indexed
│   ├── Child A3 (150 tokens) ←── embedded + indexed
│   └── Child A4 (150 tokens) ←── embedded + indexed
└── Parent Chunk B (1,000 tokens)
    ├── Child B1 (150 tokens) ←── embedded + indexed
    ├── Child B2 (150 tokens) ←── embedded + indexed
    └── Child B3 (150 tokens) ←── embedded + indexed

Query arrives → vector search finds Child A2 (best match)
              → retrieve Parent A (the full 1,000-token section)
              → send Parent A to LLM for generation

The child chunks live in your vector store. The parent chunks live in a document store (a simple key-value store, PostgreSQL, or an in-memory dict during prototyping). Metadata in each child chunk stores a pointer back to its parent.
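To make the two stores and the pointer concrete, here is a minimal, library-free sketch. Everything in it is an illustrative assumption: the word-based splitter, the bag-of-words `embed` function (a toy stand-in for a real embedding model), and the names `build_index` and `retrieve_parent`. It shows the shape of the pattern, not a production implementation.

```python
import re
from collections import Counter
from math import sqrt


def split_words(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(document: str, parent_size: int, child_size: int):
    """Return (docstore, child_index); each child carries a parent_id pointer."""
    docstore: dict[str, str] = {}   # plays the role of the document store
    child_index: list[dict] = []    # plays the role of the vector store
    for p, parent_text in enumerate(split_words(document, parent_size)):
        parent_id = f"parent-{p}"
        docstore[parent_id] = parent_text
        for child_text in split_words(parent_text, child_size):
            child_index.append({
                "vector": embed(child_text),
                "text": child_text,
                "parent_id": parent_id,  # pointer back to the parent
            })
    return docstore, child_index


def retrieve_parent(query: str, docstore: dict, child_index: list) -> str:
    """Search over children, then return the best child's parent text."""
    query_vector = embed(query)
    best = max(child_index, key=lambda c: cosine(query_vector, c["vector"]))
    return docstore[best["parent_id"]]


document = (
    "Refunds are issued within 30 days of purchase. "
    "Contact support to start a refund. "
    "Shipping takes five business days. "
    "Express shipping is available at checkout."
)
docstore, child_index = build_index(document, parent_size=14, child_size=7)
print(retrieve_parent("express shipping", docstore, child_index))
```

The query matches a single small child, but the text that comes back is the whole parent, including the sentence about checkout that the matching child does not contain.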

Implementation with LangChain

LangChain provides ParentDocumentRetriever, which handles this pattern out of the box:

from langchain.storage import InMemoryStore
from langchain.retrievers import ParentDocumentRetriever
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Child splitter: small chunks for embedding.
# Note: chunk_size is measured in characters by default, not tokens.
child_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=20,
)

# Parent splitter: larger chunks for LLM context
parent_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
)

# Vector store: holds the embedded child chunks
vectorstore = Chroma(
    embedding_function=OpenAIEmbeddings(),
)

# Document store: holds the parent chunks, keyed by ID
docstore = InMemoryStore()

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# Add documents: automatically creates both parent and child chunks.
# `documents` is a list of LangChain Document objects loaded elsewhere.
retriever.add_documents(documents)

# At retrieval time: automatically returns parent chunks
results = retriever.invoke("What is the refund policy?")

The retriever handles the pointer resolution transparently: the search runs over child chunks, and you get parent chunks back.
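One detail of pointer resolution is worth spelling out: several top-ranked children often point to the same parent, and you only want to send that parent to the LLM once. The sketch below shows one way to do that step by hand. It is not LangChain's code; the `resolve_parents` name and the shape of `child_hits` (dicts with a `parent_id` key, ordered best match first) are assumptions made for illustration.

```python
def resolve_parents(child_hits: list[dict], docstore: dict[str, str]) -> list[str]:
    """Map ranked child hits to unique parents, keeping best-rank-first order."""
    seen: set[str] = set()
    parents: list[str] = []
    for hit in child_hits:
        parent_id = hit["parent_id"]
        # Skip parents already emitted and pointers with no stored parent.
        if parent_id in seen or parent_id not in docstore:
            continue
        seen.add(parent_id)
        parents.append(docstore[parent_id])
    return parents


docstore = {"A": "Parent A text", "B": "Parent B text"}
hits = [
    {"child_id": "A2", "parent_id": "A"},
    {"child_id": "A3", "parent_id": "A"},  # same parent as the first hit
    {"child_id": "B1", "parent_id": "B"},
]
parents = resolve_parents(hits, docstore)
```

Here three child hits collapse to two parents, so the number of passages the LLM receives can be smaller than the number of children the vector search returned.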

Multi-Level Hierarchies

Nothing limits this pattern to two levels. Complex corpora often benefit from three levels:

Level 0: Full document (5,000+ tokens)
    ↓ split
Level 1: Section chunks (800–1,200 tokens), sent to LLM
    ↓ split
Level 2: Paragraph chunks (150–250 tokens), embedded and indexed

Query → find best Level 2 chunk → look up Level 1 parent → send to LLM

For very large documents (books, lengthy legal contracts), you might also attach a short summary of the Level 0 document that the LLM can reference for overall document context.
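A multi-level hierarchy needs nothing more than a parent pointer on every node and a loop that walks upward until it reaches the level you want to send to the LLM. The following is a minimal sketch under that assumption; the `Node` class and `ancestor_at_level` function are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: str
    level: int                      # 0 = document, 1 = section, 2 = paragraph
    text: str
    parent_id: Optional[str] = None


def ancestor_at_level(node_id: str, target_level: int, nodes: dict[str, Node]) -> Node:
    """Walk parent pointers upward from a node until reaching target_level."""
    node = nodes[node_id]
    if node.level < target_level:
        raise ValueError(f"{node_id} is already above level {target_level}")
    while node.level > target_level:
        if node.parent_id is None:
            raise KeyError(f"{node.node_id} has no parent pointer")
        node = nodes[node.parent_id]
    return node


nodes = {
    "doc": Node("doc", 0, "Full contract text ..."),
    "sec-3": Node("sec-3", 1, "Section 3: Termination ...", parent_id="doc"),
    "par-3-2": Node("par-3-2", 2, "Either party may terminate ...", parent_id="sec-3"),
}

# Vector search matched the paragraph; send its Level 1 section to the LLM.
section = ancestor_at_level("par-3-2", target_level=1, nodes=nodes)
```

Because the target level is a parameter, the same index can serve a section for focused questions and the full document for synthesis questions.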

Choosing Parent and Child Sizes

The child chunk size should be small enough that its embedding is sharply focused. A good guideline:

  • Child: 100–200 tokens for precise factual queries
  • Child: 200–400 tokens when queries are more open-ended

The parent chunk size depends on your LLM’s context window and how many parents you’ll concatenate:

LLM context: 8,000 tokens
Retrieve top 3 parents: 3 × 1,000 = 3,000 tokens for retrieved context
Remaining: 5,000 tokens for system prompt + query + response

LLM context: 32,000 tokens
Retrieve top 5 parents: 5 × 2,000 = 10,000 tokens for retrieved context
Remaining: 22,000 tokens for everything else

Match your parent chunk size to your retrieval count and LLM context budget.
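The same arithmetic can be run in reverse: start from the context window, subtract what you need to reserve for the system prompt, query, and response, and divide the remainder by the number of parents you plan to retrieve. A small sketch, where the function name `max_parent_tokens` and its parameters are illustrative assumptions:

```python
def max_parent_tokens(context_window: int, reserved_tokens: int, top_k: int) -> int:
    """Largest parent size (in tokens) that fits top_k parents into the budget.

    reserved_tokens covers the system prompt, the query, and the response.
    """
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    available = context_window - reserved_tokens
    if available <= 0:
        raise ValueError("reserved_tokens leaves no room for retrieved context")
    return available // top_k


# The two budgets from the examples above:
small = max_parent_tokens(context_window=8_000, reserved_tokens=5_000, top_k=3)
large = max_parent_tokens(context_window=32_000, reserved_tokens=22_000, top_k=5)
```

With these inputs the function gives 1,000 and 2,000 tokens per parent, matching the two worked budgets. Keep in mind that a character-based splitter will not hit a token budget exactly, so leave some headroom.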

Document-Level Parents: The “Full Document” Variant

A common variation uses the full original document as the parent and only creates child chunks for indexing. This works well when:

  • Documents are short (< 2,000 tokens) and can fit in context
  • You want the LLM to have complete context for every retrieved document
  • Your use case involves synthesizing across an entire document rather than extracting a specific fact

# Full document as parent (no parent_splitter)
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    # parent_splitter omitted → full document is the parent
)

Metadata Propagation

Child chunks should inherit metadata from their parent document. This is critical for downstream filtering and citation:

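from langchain_core.documents import Document


def create_child_with_metadata(parent_doc, child_text, child_index):
    """Build a child Document that inherits its parent's metadata.

    Assumes parent_doc.metadata already contains a "doc_id" key.
    """
    return Document(
        page_content=child_text,
        metadata={
            **parent_doc.metadata,  # inherit source, date, author
            "parent_id": parent_doc.metadata["doc_id"],  # pointer to parent
            "child_index": child_index,  # position of the child within the parent
        },
    )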

When you retrieve a child and look up its parent, the parent document will also carry this metadata, making it easy to surface citations like “From document: Annual Report 2024, Section 3.”
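Turning that metadata into a citation string is then a small formatting step. The sketch below assumes the metadata carries hypothetical `title` and `section` keys; your own schema will likely use different names.

```python
def format_citation(metadata: dict) -> str:
    """Build a human-readable citation from document metadata.

    The "title" and "section" keys are assumptions for this example.
    """
    title = metadata.get("title", "unknown document")
    citation = f"From document: {title}"
    section = metadata.get("section")
    if section is not None:
        citation += f", Section {section}"
    return citation


citation = format_citation({"title": "Annual Report 2024", "section": 3})
```

Falling back to a placeholder title rather than raising keeps citation rendering from failing on documents that were ingested with incomplete metadata.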

2025 Trend: Contextual Chunk Headers

An emerging enhancement is adding a generated summary of the parent context as a header to each child chunk before embedding. This solves the out-of-context problem where child chunks are ambiguous without knowing what section they came from:

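import anthropic

# Create the client once; it reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()


def add_context_header(parent_text: str, child_text: str) -> str:
    """Prepend a one-sentence summary of the parent to the child text."""
    summary = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=100,
        messages=[{
            "role": "user",
            # Only the first 500 characters of the parent are summarized.
            "content": f"Summarize the context of this passage in one sentence:\n\n{parent_text[:500]}"
        }]
    ).content[0].text
    # Embed the returned string in place of the bare child text.
    return f"[Context: {summary}]\n\n{child_text}"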

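This technique is closely related to Anthropic’s contextual retrieval research, which reported that prepending chunk-specific context before embedding reduced the top-20 retrieval failure rate by 35%, and by 49% when combined with contextual BM25. Actual gains depend on your corpus and queries, so measure on your own evaluation set.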

When Parent-Child Chunking Pays Off

The overhead is worth it when:

  • Your queries require context that spans multiple small chunks
  • You’re seeing “correct retrieval, bad generation” failures — the right chunk is found but the answer is incomplete
  • Your documents have clear hierarchical structure (articles with sections, contracts with clauses)
  • You have the storage budget for both a vector store and a document store

Skip it when:

  • Documents are already short enough to use as-is
  • Retrieval precision is your bottleneck (not generation quality): if the wrong chunks are being matched, returning a larger parent will not fix it
  • You need to minimize operational complexity

Parent-child chunking is the architecture that makes RAG feel like it actually understands your documents rather than just pattern-matching against fragments.