Recursive Chunking: The Smart Way to Split Documents
There’s a moment in every RAG project where fixed-size chunking starts feeling like a blunt instrument. You’re splitting paragraphs mid-idea, cutting code functions in half, and getting retrieval results that feel technically correct but contextually hollow. That’s when recursive chunking starts making sense.
The core insight of recursive chunking is that documents have natural structure — paragraphs, sentences, words — and you should try to split along those natural boundaries before forcing arbitrary cuts.
The Recursive Splitting Algorithm
Recursive chunking works through a priority-ordered list of separators. It tries the first (largest-granularity) separator. If the resulting chunks are still too large, it tries the next separator on those pieces. It keeps recursing until chunks are small enough.
Separator cascade (coarsest to finest):
1. "\n\n" → paragraph breaks
2. "\n" → line breaks
3. ". " → sentence endings
4. ", " → clause breaks
5. " " → word breaks
6. "" → character-level (last resort)
Example document: "Introduction to RAG\n\nRAG combines retrieval with generation. It works by embedding documents.\n\nFirst, index your corpus..."
Step 1: Split on "\n\n" → 3 pieces; adjacent pieces are merged back together while they still fit the limit → 2 chunks
Chunk A: "Introduction to RAG\n\nRAG combines retrieval with generation. It works by embedding documents."
Chunk B: "First, index your corpus..."
Step 2: Chunks A and B both fit within the limit → keep them
The result preserves paragraph integrity when possible. You only get finer-grained splits when content density demands it.
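To make the cascade concrete, here is a minimal pure-Python sketch of the idea. It is a simplified illustration, not LangChain's implementation: the function name `recursive_split`, the 100-character limit, and the choice to drop separators at chunk boundaries are all assumptions made for this example, and overlap is left out entirely.

```python
SEPARATORS = ["\n\n", "\n", ". ", ", ", " ", ""]


def recursive_split(text, separators, max_len):
    """Split text into chunks of at most max_len characters, coarsest separator first."""
    if len(text) <= max_len:
        return [text]
    if not separators or separators[0] == "":
        # Last resort: hard character-level cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        if not part:
            continue
        # Greedily merge adjacent parts while the result still fits.
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_len:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= max_len:
            current = part
        else:
            # This part alone is too large: retry it with the finer separators.
            chunks.extend(recursive_split(part, finer, max_len))
    if current:
        chunks.append(current)
    return chunks


doc = (
    "Introduction to RAG\n\n"
    "RAG combines retrieval with generation. It works by embedding documents.\n\n"
    "First, index your corpus..."
)
chunks = recursive_split(doc, SEPARATORS, max_len=100)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

With a 100-character limit the title and first paragraph are merged into one chunk, matching the walkthrough above. Lowering `max_len` forces the sketch down to the sentence separator for the longer paragraph.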
Why This Matters for Retrieval Quality
Consider what happens when you embed a chunk that was split mid-paragraph in a fixed-size scheme:
BAD (fixed-size split at 500 chars):
Chunk 1: "The transformer architecture relies on attention mechanisms that allow tokens to attend to every other token in the sequence. This enables long-range dependencies. The feed-forward layers then process each position"
Chunk 2: "independently, applying the same transformation to each token. This design choice, combined with positional encodings, is why..."
The embedding for Chunk 2 is semantically incomplete because it starts mid-thought. A query about “transformer feed-forward layers” might not retrieve either chunk confidently.
Recursive chunking would keep the full explanation in one chunk, as long as the paragraph fits within the chunk size, producing a stronger, more query-matchable embedding.
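The failure mode is easy to reproduce. The sketch below applies a naive fixed-size cut and flags chunks that end mid-sentence. The helper names, the shortened sample text, and the 50-character size are illustrative assumptions, and the punctuation check is only a rough heuristic.

```python
def fixed_size_chunks(text, size):
    """Cut text every `size` characters, ignoring any structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def ends_mid_sentence(chunk):
    """Heuristic: a chunk that does not end in terminal punctuation was cut mid-sentence."""
    return not chunk.rstrip().endswith((".", "!", "?"))


text = (
    "The feed-forward layers then process each position independently. "
    "This design relies on positional encodings."
)
pieces = fixed_size_chunks(text, size=50)
for piece in pieces:
    print(ends_mid_sentence(piece), repr(piece))
```

Running a check like this over an existing index gives a quick estimate of how many chunks start or end mid-thought.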
Implementing with LangChain
LangChain’s RecursiveCharacterTextSplitter is one of the most widely used implementations:
# Recent LangChain releases ship the splitters in the langchain-text-splitters package;
# older releases import from langchain.text_splitter instead.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # target size in characters
    chunk_overlap=200,     # overlap between chunks
    length_function=len,   # or use a token counter
    separators=["\n\n", "\n", ". ", ", ", " ", ""],
    is_separator_regex=False,
)

docs = splitter.create_documents([document_text])

For token-aware splitting (preferred for production):
import tiktoken
from langchain_text_splitters import RecursiveCharacterTextSplitter

enc = tiktoken.get_encoding("cl100k_base")

def token_len(text):
    # Measure length in tokens instead of characters
    return len(enc.encode(text))

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,        # in tokens
    chunk_overlap=64,
    length_function=token_len,
    separators=["\n\n", "\n", ". ", " ", ""],
)

Language-Specific Separator Configurations
Different languages and content types need different separator hierarchies. LangChain exposes language-specific presets:
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# For Python code
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=500,
    chunk_overlap=50,
)

# For Markdown documents
md_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.MARKDOWN,
    chunk_size=800,
    chunk_overlap=80,
)

The Python preset uses class and function definitions as primary split points, which keeps logical code units together. The Markdown preset treats headers such as ## and ### as natural boundaries.
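To see why definition-level separators keep code units intact, consider this simplified stand-in. It is not how LangChain implements the preset: the function `split_on_definitions` and the sample source are hypothetical, and it only handles top-level `class` and `def` lines.

```python
import re


def split_on_definitions(source):
    """Split Python source before each top-level class/def line."""
    # Zero-width lookahead keeps the "class"/"def" keyword with the body that follows it.
    parts = re.split(r"(?=\n(?:class|def) )", source)
    return [p.strip("\n") for p in parts if p.strip()]


source = (
    "import os\n"
    "\n"
    "def load(path):\n"
    "    return open(path).read()\n"
    "\n"
    "class Index:\n"
    "    def add(self, doc):\n"
    "        pass\n"
)
units = split_on_definitions(source)
for unit in units:
    print(repr(unit))
```

Because the pattern only matches unindented definitions, the `add` method stays inside the `Index` chunk instead of being split off on its own.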
The Overlap Question in Recursive Chunking
Overlap behaves slightly differently here than in fixed-size chunking. Because recursive splits preserve sentence boundaries when possible, the overlap tends to consist of complete sentences rather than arbitrary character spans. This makes the overlap semantically richer.
Practical guidance: set overlap to about one sentence length (roughly 30–80 tokens for most prose). This ensures the “bridge sentence” that ties two thoughts together appears in both the tail of the previous chunk and the head of the next.
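A rough sketch of the bridge-sentence idea follows. It packs whole sentences into chunks and carries the last sentence of each chunk into the next one. The function names are hypothetical, whitespace-separated words stand in for tokens, and the regex sentence splitter is deliberately naive, so treat it as an illustration rather than a production splitter.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_count(sentences):
    return sum(len(s.split()) for s in sentences)


def chunk_with_sentence_overlap(sentences, max_words, overlap_sentences=1):
    """Pack whole sentences into chunks, repeating the trailing sentence(s) as overlap."""
    chunks, current = [], []
    for sentence in sentences:
        if current and word_count(current + [sentence]) > max_words:
            chunks.append(" ".join(current))
            carry = current[-overlap_sentences:] if overlap_sentences > 0 else []
            # Drop the overlap if it would not leave room for the next sentence.
            if word_count(carry + [sentence]) > max_words:
                carry = []
            current = carry
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = (
    "RAG combines retrieval with generation. It works by embedding documents. "
    "First, index your corpus. Then, embed each chunk."
)
overlapped = chunk_with_sentence_overlap(split_sentences(text), max_words=10)
for chunk in overlapped:
    print(repr(chunk))
```

Each chunk after the first begins with the sentence that ended the previous one, which is the behavior the guidance above aims for.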
Performance Characteristics
Benchmark: 10,000 document corpus, avg 2,000 tokens/doc

Strategy             | Chunks | Avg Quality Score | Split Time
---------------------|--------|-------------------|-----------
Fixed-size (512 tok) | 40,200 | 0.71              | 12s
Recursive (512 tok)  | 38,600 | 0.78              | 18s
Semantic             | 51,300 | 0.83              | 4m 22s

Quality Score = retrieval recall @ k=5 on 200 test queries

Recursive chunking sits in a comfortable middle ground: meaningfully better than fixed-size, with only 50% more preprocessing time. Semantic chunking wins on quality but at a cost that may not be justified for all use cases.
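For readers who want to compute a comparable quality score on their own corpus, here is one common way to calculate recall at k. The exact evaluation setup behind the benchmark is not specified, so the function names, the chunk IDs, and the per-query averaging below are assumptions.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant chunk IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(results, k=5):
    """Average recall@k over (retrieved, relevant) pairs, one pair per test query."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in results]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical results for two test queries: ranked chunk IDs and the labeled relevant set.
results = [
    (["c1", "c7", "c3", "c9", "c2", "c4"], {"c3", "c4"}),  # c4 is ranked 6th, outside k=5
    (["c5", "c6"], {"c5"}),
]
score = mean_recall_at_k(results, k=5)
print(score)
```

Running the same query set against each chunking strategy yields scores that can be compared directly, provided the relevance labels are mapped to each strategy's own chunk IDs.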
2025 Trend: Custom Separator Hierarchies per Document Type
The direction in production systems is to maintain a registry of separator hierarchies per document type, dynamically selected at ingestion time:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Separator hierarchies keyed by document type, coarsest separator first
SEPARATOR_REGISTRY = {
    "legal_contract": ["\nARTICLE", "\nSection", "\n\n", "\n", ". "],
    "medical_report": ["\n## ", "\n### ", "\n\n", "\n", ". "],
    "api_docs": ["\n---\n", "\n## ", "\n\n", "\n", ". ", " "],
    "default": ["\n\n", "\n", ". ", " ", ""],
}

def get_splitter(doc_type: str, chunk_size: int = 512):
    # Fall back to the default hierarchy for unknown document types
    separators = SEPARATOR_REGISTRY.get(doc_type, SEPARATOR_REGISTRY["default"])
    return RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,  # counted in characters unless a length_function is passed
        chunk_overlap=int(chunk_size * 0.12),
        separators=separators,
    )

When to Choose Recursive Over Fixed-Size
Use recursive chunking when:
- Documents have meaningful paragraph or section structure
- Content type varies within your corpus
- You’re seeing retrieval misses that trace back to mid-paragraph splits
- You have mixed content (prose + code + lists) in the same documents
Stick with fixed-size when:
- You need maximum throughput during ingestion
- Documents are short and uniform (tweets, product titles, FAQs)
- You’re in early prototype phase and want minimal moving parts
Recursive chunking is the pragmatic upgrade from fixed-size, and many teams make this switch early in a production RAG deployment.