Parent-Child Chunking: Retrieve Small, Answer Big
Here’s the tension at the heart of RAG chunking: small chunks give you precise retrieval (the embedding of 100 tokens is sharper, more focused), but they give your LLM too little context to generate a good answer. Large chunks give the LLM plenty of context but embed poorly — the embedding becomes a diffuse average of multiple topics and matches fewer queries accurately.
Parent-child chunking resolves this tension by separating the retrieval representation from the generation context. You index small “child” chunks for high-precision retrieval, but when a child chunk is retrieved, you look up its “parent” — a larger, richer passage — and send that to the LLM instead.
The Architecture in Plain Terms
```
Document
│
├── Parent Chunk A (1,000 tokens)
│   ├── Child A1 (150 tokens) ←── embedded + indexed
│   ├── Child A2 (150 tokens) ←── embedded + indexed
│   ├── Child A3 (150 tokens) ←── embedded + indexed
│   └── Child A4 (150 tokens) ←── embedded + indexed
│
└── Parent Chunk B (1,000 tokens)
    ├── Child B1 (150 tokens) ←── embedded + indexed
    ├── Child B2 (150 tokens) ←── embedded + indexed
    └── Child B3 (150 tokens) ←── embedded + indexed

Query arrives
  → vector search finds Child A2 (best match)
  → retrieve Parent A (the full 1,000-token section)
  → send Parent A to LLM for generation
```

The child chunks live in your vector store. The parent chunks live in a document store (a simple key-value store, PostgreSQL, or an in-memory dict during prototyping). Metadata in each child chunk stores a pointer back to its parent.
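To make the pointer mechanics concrete, here is a minimal sketch of the flow in plain Python. It is an illustration under loud assumptions, not a production design: word overlap stands in for vector similarity, chunk sizes are counted in words rather than tokens, and the names `Child`, `build_index`, and `retrieve_parents` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Child:
    text: str
    parent_id: str  # pointer back to the parent chunk


def split_words(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score (assumption): distinct query words found in text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def build_index(doc: str, parent_size: int, child_size: int):
    """Return (docstore, children): parents keyed by id, children pointing at them."""
    docstore: dict[str, str] = {}
    children: list[Child] = []
    for p_idx, parent in enumerate(split_words(doc, parent_size)):
        parent_id = f"parent-{p_idx}"
        docstore[parent_id] = parent
        for piece in split_words(parent, child_size):
            children.append(Child(text=piece, parent_id=parent_id))
    return docstore, children


def retrieve_parents(query: str, docstore: dict, children: list, k: int = 2) -> list[str]:
    """Rank children, then return their parents, de-duplicated in rank order."""
    ranked = sorted(children, key=lambda c: overlap_score(query, c.text), reverse=True)
    parent_ids: list[str] = []
    for child in ranked[:k]:
        if child.parent_id not in parent_ids:
            parent_ids.append(child.parent_id)
    return [docstore[pid] for pid in parent_ids]


doc = (
    "refunds are issued within thirty days of purchase "
    "shipping takes five business days for domestic orders"
)
docstore, children = build_index(doc, parent_size=8, child_size=4)
parents = retrieve_parents("when are refunds issued", docstore, children, k=2)
```

Note the de-duplication step: when several top-ranked children share a parent, that parent should reach the LLM only once, so the number of parents returned can be smaller than `k`.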
Implementation with LangChain
LangChain provides ParentDocumentRetriever, which handles this pattern out of the box:
```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Child splitter: small chunks for embedding.
# Note: chunk_size counts characters by default, not tokens.
child_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=20,
)

# Parent splitter: larger chunks for LLM context
parent_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
)

vectorstore = Chroma(
    embedding_function=OpenAIEmbeddings(),
)
# The docstore holds Document objects, so use InMemoryStore
# rather than a byte store.
docstore = InMemoryStore()

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# Add documents: automatically creates both parent and child chunks.
# `documents` is a list of LangChain Document objects loaded earlier.
retriever.add_documents(documents)

# At retrieval time: automatically returns parent chunks
results = retriever.invoke("What is the refund policy?")
```

The retriever resolves the pointers for you: the search runs over child chunks, and the matching parent chunks come back.
Multi-Level Hierarchies
Nothing limits this pattern to two levels. Complex corpora often benefit from three levels:
```
Level 0: Full document (5,000+ tokens)
    ↓ split
Level 1: Section chunks (800–1,200 tokens), sent to LLM
    ↓ split
Level 2: Paragraph chunks (150–250 tokens), embedded and indexed

Query → find best Level 2 chunk → look up Level 1 parent → send to LLM
```

For very large documents (books, lengthy legal contracts), you might also generate a short summary of the Level 0 document and pass it to the LLM alongside the Level 1 parent for overall document context.
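One way to represent the three levels is a flat dictionary of nodes, each holding a pointer to its parent, so that a hit at Level 2 can be walked up to whichever level you want to send to the LLM. The sketch below assumes the document is already split into sections and paragraphs; `Node`, `build_hierarchy`, and `ancestor_at_level` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: str
    level: int                 # 0 = document, 1 = section, 2 = paragraph
    text: str
    parent_id: Optional[str]   # None only for the Level 0 document


def build_hierarchy(doc_id: str, sections: list[list[str]]) -> dict[str, Node]:
    """Build a three-level tree from a document given as sections of paragraphs."""
    nodes: dict[str, Node] = {}
    section_texts = ["\n".join(paragraphs) for paragraphs in sections]
    nodes[doc_id] = Node(doc_id, 0, "\n\n".join(section_texts), None)
    for s_idx, paragraphs in enumerate(sections):
        section_id = f"{doc_id}/s{s_idx}"
        nodes[section_id] = Node(section_id, 1, section_texts[s_idx], doc_id)
        for p_idx, paragraph in enumerate(paragraphs):
            para_id = f"{section_id}/p{p_idx}"
            nodes[para_id] = Node(para_id, 2, paragraph, section_id)
    return nodes


def ancestor_at_level(nodes: dict[str, Node], node_id: str, level: int) -> Node:
    """Follow parent pointers upward until reaching the requested level."""
    node = nodes[node_id]
    while node.level > level:
        node = nodes[node.parent_id]
    return node


nodes = build_hierarchy(
    "contract",
    [
        ["Payment is due in 30 days.", "Late fees are 2% per month."],
        ["Either party may terminate with notice."],
    ],
)
# Suppose vector search matched the second paragraph of the first section.
hit_id = "contract/s0/p1"
section = ancestor_at_level(nodes, hit_id, level=1)   # what the LLM receives
document = ancestor_at_level(nodes, hit_id, level=0)  # available for a summary
```

Only the Level 2 nodes would be embedded; the other levels are looked up by id, so adding a level costs storage but no extra vector search.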
Choosing Parent and Child Sizes
The child chunk size should be small enough that its embedding is sharply focused. A good guideline:
- Child: 100–200 tokens for precise factual queries
- Child: 200–400 tokens when queries are more open-ended
The parent chunk size depends on your LLM’s context window and how many parents you’ll concatenate:
```
LLM context: 8,000 tokens
Retrieve top 3 parents: 3 × 1,000 = 3,000 tokens for retrieved context
Remaining: 5,000 tokens for system prompt + query + response

LLM context: 32,000 tokens
Retrieve top 5 parents: 5 × 2,000 = 10,000 tokens for retrieved context
Remaining: 22,000 tokens for everything else
```

Match your parent chunk size to your retrieval count and LLM context budget.
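The same arithmetic can be run in either direction: from a chosen parent size to the leftover budget, or from a reserved budget back to the largest parent size that fits. A small sketch, with hypothetical helper names and the simplifying assumption that every parent uses its full size:

```python
def remaining_after_retrieval(context_window: int, parent_tokens: int, top_k: int) -> int:
    """Tokens left for system prompt, query, and response after top_k parents."""
    return context_window - parent_tokens * top_k


def max_parent_tokens(context_window: int, reserved_tokens: int, top_k: int) -> int:
    """Largest parent size (in tokens) such that top_k parents fit the budget."""
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    available = context_window - reserved_tokens
    if available <= 0:
        raise ValueError("reserved_tokens leaves no room for retrieved context")
    return available // top_k


# The two budgets worked through above:
small = remaining_after_retrieval(8_000, parent_tokens=1_000, top_k=3)
large = remaining_after_retrieval(32_000, parent_tokens=2_000, top_k=5)

# Working backwards: reserve 5,000 tokens and retrieve 3 parents.
limit = max_parent_tokens(8_000, reserved_tokens=5_000, top_k=3)
```

Here `small` is 5,000, `large` is 22,000, and `limit` is 1,000, matching the figures above.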
Document-Level Parents: The “Full Document” Variant
A common variation uses the full original document as the parent and only creates child chunks for indexing. This works well when:
- Documents are short (< 2,000 tokens) and can fit in context
- You want the LLM to have complete context for every retrieved document
- Your use case involves synthesizing across an entire document rather than extracting a specific fact
```python
# Full document as parent (no parent_splitter)
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    # parent_splitter omitted → full document is the parent
)
```

Metadata Propagation
Child chunks should inherit metadata from their parent document. This is critical for downstream filtering and citation:
```python
from langchain_core.documents import Document


def create_child_with_metadata(parent_doc, child_text, child_index):
    # Assumes the parent's metadata already contains a "doc_id" key.
    return Document(
        page_content=child_text,
        metadata={
            **parent_doc.metadata,  # inherit source, date, author
            "parent_id": parent_doc.metadata["doc_id"],
            "child_index": child_index,
        },
    )
```

When you retrieve a child and look up its parent, the parent carries the same source metadata, making it easy to surface citations like “From document: Annual Report 2024, Section 3.”
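As a rough illustration of how inherited metadata turns into a citation, the sketch below uses plain dicts instead of Document objects so it runs without LangChain. The `title` and `section` keys and both helper names are assumptions made for this example; use whatever fields your own loader provides.

```python
def child_metadata(parent_metadata: dict, child_index: int) -> dict:
    """Copy the parent's metadata and add child-specific pointer fields."""
    return {
        **parent_metadata,  # inherited fields (title, section, ...)
        "parent_id": parent_metadata["doc_id"],
        "child_index": child_index,
    }


def format_citation(metadata: dict) -> str:
    """Build a citation string from (hypothetical) title and section fields."""
    title = metadata.get("title", "unknown document")
    section = metadata.get("section")
    if section is None:
        return f"From document: {title}"
    return f"From document: {title}, Section {section}"


parent = {"doc_id": "ar-2024-s3", "title": "Annual Report 2024", "section": 3}
child = child_metadata(parent, child_index=0)
citation = format_citation(child)
```

Because the child copies the parent's fields rather than sharing the dict, adding `parent_id` and `child_index` leaves the parent's metadata untouched, and the citation can be built from either one.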
2025 Trend: Contextual Chunk Headers
An emerging enhancement is adding a generated summary of the parent context as a header to each child chunk before embedding. This solves the out-of-context problem where child chunks are ambiguous without knowing what section they came from:
```python
import anthropic


def add_context_header(parent_text: str, child_text: str) -> str:
    # Reads the API key from the ANTHROPIC_API_KEY environment variable.
    client = anthropic.Anthropic()
    summary = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=100,
        messages=[{
            "role": "user",
            # Only the first 500 characters of the parent are summarized.
            "content": f"Summarize the context of this passage in one sentence:\n\n{parent_text[:500]}",
        }],
    ).content[0].text
    return f"[Context: {summary}]\n\n{child_text}"
```

This technique was popularized by Anthropic’s contextual retrieval research, which reported that contextual embeddings cut the top-20 retrieval failure rate by about 35%. Expect the size of the gain to vary with your corpus and queries.
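Since the summary in the function above depends only on the parent text, it can be computed once per parent and shared by all of that parent’s children instead of calling the model for every child. The sketch below shows that shape with the summarizer passed in as a function; `add_headers` and `fake_summarize` are hypothetical names, and the fake stands in for a real LLM call so the example runs offline.

```python
from typing import Callable


def add_headers(
    parent_text: str,
    child_texts: list[str],
    summarize: Callable[[str], str],
) -> list[str]:
    """Prefix every child with one shared summary of its parent."""
    summary = summarize(parent_text)  # one call per parent, not per child
    return [f"[Context: {summary}]\n\n{child}" for child in child_texts]


calls: list[str] = []


def fake_summarize(text: str) -> str:
    """Stand-in for an LLM call (assumption); records each invocation."""
    calls.append(text)
    return "Refund policy for online orders."


embed_inputs = add_headers(
    "Full refund policy section ...",
    ["Refunds are issued within 30 days.", "Store credit is offered after that."],
    fake_summarize,
)
```

The strings in `embed_inputs` are what you would embed; whether the header also belongs in the text stored for display is a separate design choice.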
When Parent-Child Chunking Pays Off
The overhead is worth it when:
- Your queries require context that spans multiple small chunks
- You’re seeing “correct retrieval, bad generation” failures — the right chunk is found but the answer is incomplete
- Your documents have clear hierarchical structure (articles with sections, contracts with clauses)
- You have the storage budget for both a vector store and a document store
Skip it when:
- Documents are already short enough to use as-is
- Retrieval precision is your bottleneck (not generation quality)
- You need to minimize operational complexity
Parent-child chunking is the architecture that makes RAG feel like it actually understands your documents rather than just pattern-matching against fragments.