Fixed-Size Chunking: The Bedrock Strategy That Still Delivers
When people first start building RAG pipelines, fixed-size chunking is usually the first strategy they reach for. And honestly? There’s nothing wrong with that. It’s predictable, fast, and for a surprising number of real-world use cases, it performs just as well as anything fancier.
The idea is simple: divide your documents into chunks of a predetermined size — measured either in characters or tokens — with an optional overlap between adjacent chunks to prevent context from getting cut off at the seams.
How It Works
At its core, fixed-size chunking is a sliding window over your text:
Document text (2,000 tokens)
┌─────────────────────────────────────────────┐
│                Full Document                │
└─────────────────────────────────────────────┘

After chunking (chunk_size=500, overlap=50):
┌──────────┐
│ Chunk 1  │  tokens 0–500
        ┌──────────┐
        │ Chunk 2  │  tokens 450–950
                ┌──────────┐
                │ Chunk 3  │  tokens 900–1400
                        ┌──────────┐
                        │ Chunk 4  │  tokens 1350–1850
                                ┌──────────┐
                                │ Chunk 5  │  tokens 1800–2000 (short final chunk)

The overlap region means the same text appears in two consecutive chunks. This redundancy prevents a query from missing relevant content that happens to straddle a chunk boundary.
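To make the sliding window concrete, here is a minimal sketch in plain Python. It assumes the document is already tokenized into a list; the function names `window_bounds` and `chunk_tokens` are made up for this illustration and do not come from any library.

```python
from typing import List, Sequence, Tuple


def window_bounds(n_tokens: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets for a fixed-size sliding window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap  # how far the window advances each step
    bounds = []
    start = 0
    while start < n_tokens:
        end = min(start + chunk_size, n_tokens)
        bounds.append((start, end))
        if end == n_tokens:  # the tail is covered, stop here
            break
        start += stride
    return bounds


def chunk_tokens(tokens: Sequence[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Slice a token sequence into overlapping fixed-size chunks."""
    return [list(tokens[s:e]) for s, e in window_bounds(len(tokens), chunk_size, overlap)]


# 2,000 tokens, chunk_size=500, overlap=50
print(window_bounds(2000, 500, 50))
# -> [(0, 500), (450, 950), (900, 1400), (1350, 1850), (1800, 2000)]
```

Note the last window: it is shorter than the rest because it only has to cover the tail of the document. Whether to keep, pad, or merge such a short final chunk is a design choice this sketch leaves open.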
Token vs Character Chunking
This distinction matters more than most guides let on.
Character-based chunking splits at a fixed character count (e.g., 2,000 characters). It’s fast because it requires no tokenization step. The problem is that character counts don’t map cleanly to what your embedding model actually sees: at the usual rule of thumb of about four characters per token, a 2,000-character chunk of plain English prose comes to roughly 450–500 tokens, while the same number of characters of dense technical content full of rare terms, identifiers, and symbols can run to 600 tokens or more.
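A small experiment shows the mismatch. The sketch below splits by characters and then counts tokens per chunk. The `rough_token_count` helper is a crude stand-in invented for this example (one token per word or punctuation mark); it is not how a real BPE tokenizer behaves, so swap in your embedding model's tokenizer before trusting any numbers.

```python
import re
from typing import List


def chunk_by_chars(text: str, chunk_chars: int, overlap_chars: int = 0) -> List[str]:
    """Split text into fixed-size character windows with optional overlap."""
    if not 0 <= overlap_chars < chunk_chars:
        raise ValueError("overlap_chars must satisfy 0 <= overlap_chars < chunk_chars")
    stride = chunk_chars - overlap_chars
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):  # tail covered
            break
    return chunks


def rough_token_count(text: str) -> int:
    """Stand-in tokenizer (assumption): each word or punctuation mark is one token."""
    return len(re.findall(r"\w+|[^\w\s]", text))


prose = "the cat sat on the mat and looked at the dog " * 10
dense = "f(x)=a*x^2+b*x+c; g(y)=[y[i]-m for i in r]; " * 10

for label, text in (("prose", prose), ("dense", dense)):
    first = chunk_by_chars(text, chunk_chars=200)[0]
    # Same character budget, very different token counts.
    print(label, len(first), rough_token_count(first))
```

Both chunks are exactly 200 characters, yet the symbol-heavy one produces several times as many stand-in tokens. Real tokenizers show a smaller gap, but the direction is the same.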
Token-based chunking splits at a fixed token count aligned to your embedding model’s tokenizer. This is more accurate but requires running tokenization twice: once to count, once in the embedding model. For OpenAI’s text-embedding-3-small, with an 8,192-token limit, you might still target 512 tokens per chunk. The limit is not the constraint here; smaller chunks keep each embedding focused on one topic and let several retrieved chunks fit in the generator’s prompt.
    from langchain.text_splitter import TokenTextSplitter

    splitter = TokenTextSplitter(
        encoding_name="cl100k_base",  # same as text-embedding-3-small
        chunk_size=512,
        chunk_overlap=64,
    )
    chunks = splitter.split_text(document_text)

Choosing Chunk Size and Overlap
There’s no universal answer here, but some principles hold across most use cases:
Chunk size guidance:
- 256–512 tokens: Works well when queries are short and factual (“What’s the refund policy?”)
- 512–1024 tokens: Better for nuanced questions that need surrounding context
- 1024+ tokens: Useful when your generator needs long passages for synthesis tasks
Overlap guidance:
- 0% overlap: Minimal storage, but edge cases get split badly
- 10–15% overlap: Sweet spot for most pipelines — negligible cost, real benefit
- 25%+ overlap: Only worth it if you see retrievals consistently missing boundary content
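The cost side of these overlap numbers is easy to estimate before embedding anything. This is a back-of-the-envelope sketch; `chunk_count` and `embedded_tokens` are hypothetical helper names, and the 1,000,000-token corpus is an arbitrary example figure.

```python
import math


def chunk_count(n_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding-window chunks needed to cover n_tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if n_tokens <= 0:
        return 0
    if n_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap
    return 1 + math.ceil((n_tokens - chunk_size) / stride)


def embedded_tokens(n_tokens: int, chunk_size: int, overlap: int) -> int:
    """Total tokens sent to the embedding model, counting overlapped text twice."""
    count = chunk_count(n_tokens, chunk_size, overlap)
    if count == 0:
        return 0
    # Every boundary between two consecutive chunks repeats `overlap` tokens.
    return n_tokens + (count - 1) * overlap


corpus_tokens = 1_000_000  # example corpus size (assumption)
for overlap in (0, 64, 128):
    total = embedded_tokens(corpus_tokens, 512, overlap)
    overhead = total / corpus_tokens - 1
    print(f"overlap={overlap:>3}  chunks={chunk_count(corpus_tokens, 512, overlap):>5}  overhead={overhead:.1%}")
```

The overhead grows faster than the overlap percentage suggests, because a larger overlap also shrinks the stride and so increases the number of chunks.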
A quick empirical test: create 50–100 question-answer pairs from your documents, then measure retrieval recall at each candidate chunk size and overlap. One configuration usually stands out for your specific content type.
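A recall check like this needs very little machinery. The sketch below is one possible harness, not a standard metric implementation: it counts a question as a hit when the expected answer text appears in any of the top-k chunks. The `keyword_retriever` is a toy stand-in so the example runs without a vector store; in practice you would pass a function that queries your real index.

```python
from typing import Callable, List, Sequence, Tuple

# A retriever takes (question, chunks, k) and returns the top-k chunks.
Retriever = Callable[[str, Sequence[str], int], List[str]]


def keyword_retriever(question: str, chunks: Sequence[str], k: int) -> List[str]:
    """Toy stand-in (assumption): rank chunks by number of shared lowercase words."""
    q_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(
    gold: Sequence[Tuple[str, str]],
    chunks: Sequence[str],
    retrieve: Retriever,
    k: int = 3,
) -> float:
    """Fraction of questions whose answer text appears in a top-k chunk."""
    if not gold:
        return 0.0
    hits = 0
    for question, answer in gold:
        top = retrieve(question, chunks, k)
        if any(answer.lower() in chunk.lower() for chunk in top):
            hits += 1
    return hits / len(gold)


chunks = [
    "Refunds are issued within 30 days of purchase.",
    "Shipping takes five business days.",
    "Support is available by email.",
]
gold = [
    ("How many days do refunds take?", "30 days"),
    ("How long does shipping take?", "five business days"),
]
print(recall_at_k(gold, chunks, keyword_retriever, k=1))
```

Run the same gold set against the chunks produced by each candidate configuration and compare the scores. Substring matching is strict, so answers that are paraphrased in the source will need a looser hit criterion.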
Where Fixed-Size Chunking Shines
It’s tempting to jump straight to semantic chunking, but fixed-size has legitimate advantages:
- Predictable embedding costs: Every chunk takes roughly the same compute. Budget forecasting is trivial.
- Uniform vector storage: Chunk count scales predictably with document length, which makes index sizing and management easier.
- Great for uniform content: FAQs, product catalogs, news articles — content without complex hierarchical structure benefits little from semantic splitting.
- Fast iteration: When you’re prototyping, you want to get chunks into a vector store in minutes, not hours.
Pitfalls to Watch For
Mid-sentence splits are the most common complaint. If chunk N ends mid-sentence, the embedding model will produce a weaker representation because the semantic unit is incomplete. Using sentence-boundary-aware splitting (still fixed-size but only splitting at sentence boundaries within a token range) mitigates this.
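One way to sketch sentence-boundary-aware splitting is to pack whole sentences into a chunk until a token budget would be exceeded. The version below makes two loud assumptions: sentences are found with a naive regex (it will trip on abbreviations like "e.g."), and tokens are approximated by whitespace-separated words unless you pass your own `count_tokens` function. Overlap is left out to keep the idea visible.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive rule (assumption): a sentence ends at ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def pack_sentences(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens."""
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = count_tokens(sentence)
        # Close the current chunk if this sentence would push it over budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "One two three. Four five. Six seven eight nine. Ten."
print(pack_sentences(text, max_tokens=5))
# -> ['One two three. Four five.', 'Six seven eight nine. Ten.']
```

A single sentence longer than `max_tokens` ends up as its own oversized chunk here. A production version would fall back to a hard token split for that case.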
Tables and code blocks break badly with fixed-size chunking. A code function split across two chunks loses its meaning entirely. Consider detecting these structures and treating them as atomic units — don’t split inside them.
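For Markdown-style sources, treating code as atomic can be as simple as separating fenced blocks from prose before any fixed-size splitting happens. The following is a rough sketch under that assumption; `split_atomic` is a hypothetical helper, it only recognizes triple-backtick fences, and it does not handle tables.

```python
from typing import List, Tuple

FENCE = "`" * 3  # triple backtick, built this way to keep the example readable


def split_atomic(text: str) -> List[Tuple[str, str]]:
    """Split text into ("text", ...) and ("code", ...) segments.

    Fenced code blocks are returned whole so a later splitter never cuts them.
    """
    segments: List[Tuple[str, str]] = []
    buffer: List[str] = []
    in_code = False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            if in_code:
                # Closing fence: emit the whole block as one segment.
                buffer.append(line)
                segments.append(("code", "\n".join(buffer)))
                buffer, in_code = [], False
            else:
                # Opening fence: flush any prose collected so far.
                if buffer:
                    segments.append(("text", "\n".join(buffer)))
                buffer, in_code = [line], True
        else:
            buffer.append(line)
    if buffer:
        # An unclosed fence is still kept as a single code segment.
        segments.append(("code" if in_code else "text", "\n".join(buffer)))
    return segments


doc = "Intro line\n" + FENCE + "python\nprint(1)\n" + FENCE + "\nOutro"
for kind, body in split_atomic(doc):
    print(kind, repr(body))
```

Only the "text" segments would then go through the fixed-size splitter, while each "code" segment is emitted as a single chunk, even if it exceeds the usual size.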
Identical overlap regions inflate your vector index. If you have 20% overlap and 1 million chunks, roughly 20% of what you embed and store is duplicated text, the equivalent of about 200,000 chunks. This adds cost and can create retrieval noise where the top-K results return the same content from adjacent overlapping chunks.
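One lightweight way to reduce that retrieval noise is to filter results at query time: keep the best-scoring hit and skip its immediate neighbors from the same document. This sketch assumes each hit carries a document id and a chunk index; the `Hit` structure and `drop_adjacent_hits` function are invented for illustration, not part of any vector store's API.

```python
from typing import List, NamedTuple, Sequence


class Hit(NamedTuple):
    doc_id: str        # which source document the chunk came from
    chunk_index: int   # position of the chunk within that document
    score: float       # similarity score, higher is better


def drop_adjacent_hits(hits: Sequence[Hit], k: int) -> List[Hit]:
    """Return up to k hits, skipping chunks adjacent to an already kept hit."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if len(kept) >= k:
            break
        is_neighbor = any(
            hit.doc_id == other.doc_id
            and abs(hit.chunk_index - other.chunk_index) <= 1
            for other in kept
        )
        if not is_neighbor:
            kept.append(hit)
    return kept


hits = [
    Hit("a", 3, 0.90),
    Hit("a", 4, 0.88),  # overlaps with chunk 3 of the same document
    Hit("b", 0, 0.70),
    Hit("a", 7, 0.60),
]
print(drop_adjacent_hits(hits, k=3))
```

Over-fetch from the index (say, 2–3× K) before filtering so you still end up with K results. An alternative design is to merge adjacent hits into one longer passage instead of dropping them.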
Deduplication After Chunking
One practical improvement: after generating chunks with overlap, run a post-processing step to detect near-duplicate chunks and remove them. FAISS’s range search over the chunk embeddings, or simple MinHash over token shingles, can flag chunks that are more than about 80% similar.
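For small corpora you can prototype this step without FAISS or MinHash at all. The sketch below uses exact Jaccard similarity over lowercase word sets, which is a simpler technique than either: it compares every chunk against every kept chunk, so it is quadratic and only suitable for a few thousand chunks. The 0.8 threshold is one reading of the ">80%" figure above, and the function names are made up for this example.

```python
from typing import List, Sequence, Set


def token_set(chunk: str) -> Set[str]:
    """Lowercase whitespace tokens (a stand-in for real tokenizer output)."""
    return set(chunk.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(chunks: Sequence[str], threshold: float = 0.8) -> List[str]:
    """Keep the first occurrence of each group of near-duplicate chunks."""
    kept: List[str] = []
    kept_sets: List[Set[str]] = []
    for chunk in chunks:
        tokens = token_set(chunk)
        if any(jaccard(tokens, seen) > threshold for seen in kept_sets):
            continue  # too similar to something already kept
        kept.append(chunk)
        kept_sets.append(tokens)
    return kept


chunks = [
    "alpha beta gamma delta epsilon zeta",
    "alpha beta gamma delta epsilon zeta eta",  # 6 of 7 tokens shared
    "eta theta iota",
]
print(drop_near_duplicates(chunks))
# -> ['alpha beta gamma delta epsilon zeta', 'eta theta iota']
```

Once the chunk count grows, MinHash with locality-sensitive hashing approximates the same Jaccard comparison without the all-pairs cost.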
2025 Trend: Adaptive Fixed-Size
Many production systems no longer use a single fixed size globally. Instead, they set size by content type:
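    from langchain.text_splitter import TokenTextSplitter

    # Sizes and overlaps are measured in tokens.
    CHUNK_CONFIGS = {
        "narrative": {"size": 512, "overlap": 64},
        "technical_doc": {"size": 768, "overlap": 100},
        "faq": {"size": 256, "overlap": 0},
        # No overlap for code; size alone does not prevent mid-function splits.
        "code": {"size": 400, "overlap": 0},
    }

    def chunk_document(text, doc_type):
        config = CHUNK_CONFIGS.get(doc_type, {"size": 512, "overlap": 64})
        # TokenTextSplitter expects chunk_size / chunk_overlap, so map the keys explicitly.
        splitter = TokenTextSplitter(
            chunk_size=config["size"],
            chunk_overlap=config["overlap"],
        )
        return splitter.split_text(text)

This hybrid approach keeps the simplicity of fixed-size chunking while acknowledging that a single setting rarely generalizes across content types.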
Implementation Checklist
- Decide on token vs character splitting based on your embedding model
- Set overlap to 10–15% of chunk size as a starting point
- Handle code blocks and tables as atomic chunks
- Benchmark recall on a gold set of 50+ Q&A pairs
- Monitor for duplicate top-K results caused by overlapping chunks
- Consider per-content-type size configs once the baseline is working
Fixed-size chunking is the right starting point for almost every RAG project. Get this working first, measure, then decide if more complex strategies are worth the added maintenance overhead.