Fixed-Size Chunking for RAG: When Simple Works Best

Learn fixed-size chunking for RAG pipelines — chunk sizing, overlap strategies, token vs character splits, and when to use this approach.

Fixed-Size Chunking: The Bedrock Strategy That Still Delivers

When people first start building RAG pipelines, fixed-size chunking is usually the first strategy they reach for. And honestly? There’s nothing wrong with that. It’s predictable, fast, and for a surprising number of real-world use cases, it performs just as well as anything fancier.

The idea is simple: divide your documents into chunks of a predetermined size — measured either in characters or tokens — with an optional overlap between adjacent chunks to prevent context from getting cut off at the seams.

How It Works

At its core, fixed-size chunking is a sliding window over your text:

Document text (2,000 tokens)
┌─────────────────────────────────────────────┐
│                Full Document                │
└─────────────────────────────────────────────┘

After chunking (chunk_size=500, overlap=50):
┌──────────┐
│ Chunk 1  │ tokens 0–500
└──────────┘
┌──────────┐
│ Chunk 2  │ tokens 450–950
└──────────┘
┌──────────┐
│ Chunk 3  │ tokens 900–1400
└──────────┘
┌──────────┐
│ Chunk 4  │ tokens 1350–1850
└──────────┘
┌──────────┐
│ Chunk 5  │ tokens 1800–2000 (shorter final chunk)
└──────────┘

The overlap region means the same text appears in two consecutive chunks. This redundancy reduces the chance that a query misses relevant content that happens to straddle a chunk boundary.
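The sliding window can be sketched in a few lines of plain Python. This is a minimal illustration, not any library's implementation: the function names `chunk_spans` and `chunk_tokens` are made up here, and the input is assumed to be an already-tokenized sequence.

```python
from typing import List, Sequence, Tuple


def chunk_spans(n_tokens: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets for each chunk; end is exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:  # this window reached the end of the document
            break
        start += step
    return spans


def chunk_tokens(tokens: Sequence[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Slice a token sequence into overlapping fixed-size chunks."""
    return [list(tokens[s:e]) for s, e in chunk_spans(len(tokens), chunk_size, overlap)]


print(chunk_spans(2000, chunk_size=500, overlap=50))
# [(0, 500), (450, 950), (900, 1400), (1350, 1850), (1800, 2000)]
```

Note that the window advances by `chunk_size - overlap`, so a 2,000-token document yields five windows, with the last one shorter than the rest.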

Token vs Character Chunking

This distinction matters more than most guides let on.

Character-based chunking splits at a fixed character count (e.g., 2,000 characters). It’s fast because it requires no tokenization step. The problem is that character counts don’t map cleanly to what your embedding model actually sees. English prose averages roughly four characters per token, so a 2,000-character chunk is about 500 tokens, but the same character budget can run well past that for code, symbol-heavy technical text, or non-English content.

Token-based chunking splits at a fixed token count aligned to your embedding model’s tokenizer. This is more accurate but requires running tokenization twice: once to count during chunking, and once again inside the embedding model. OpenAI’s text-embedding-3-small accepts up to 8,192 tokens per input, but that limit is a ceiling rather than a target. A chunk size around 512 tokens keeps each embedding focused on one topic and leaves room in the generator’s prompt for several retrieved chunks.

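# Older LangChain releases expose this class as langchain.text_splitter
from langchain_text_splitters import TokenTextSplitter

splitter = TokenTextSplitter(
    encoding_name="cl100k_base",  # encoding used by text-embedding-3-small
    chunk_size=512,
    chunk_overlap=64,
)
chunks = splitter.split_text(document_text)  # document_text: your raw document string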
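To get a feel for how far character counts can drift from token counts, a small experiment helps. The sketch below uses a crude regex tokenizer (`rough_token_count`, a hypothetical stand-in that counts words and punctuation marks) rather than a real BPE tokenizer, so the absolute numbers are illustrative only; swap in your embedding model's tokenizer for real measurements.

```python
import re
from typing import List

# Words or single punctuation marks; NOT a real BPE tokenizer.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def rough_token_count(text: str) -> int:
    """Count word and punctuation tokens (a crude stand-in for a real tokenizer)."""
    return len(TOKEN_RE.findall(text))


def split_by_chars(text: str, chunk_chars: int) -> List[str]:
    """Character-based fixed-size split with no overlap."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


# Two made-up samples with very different symbol density.
prose = "The cat sat on the mat and looked at the dog. " * 10
code = "x[i]+=f(a,b)*(c-d)/e; " * 10

for label, sample in (("prose", prose), ("code", code)):
    chunk = split_by_chars(sample, 200)[0]
    print(label, len(chunk), "chars ->", rough_token_count(chunk), "rough tokens")
```

Both chunks are exactly 200 characters, yet the symbol-heavy sample produces several times as many rough tokens as the prose sample.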

Choosing Chunk Size and Overlap

There’s no universal answer here, but some principles hold across most use cases:

Chunk size guidance:

  • 256–512 tokens: Works well when queries are short and factual (“What’s the refund policy?”)
  • 512–1024 tokens: Better for nuanced questions that need surrounding context
  • 1024+ tokens: Useful when your generator needs long passages for synthesis tasks

Overlap guidance:

  • 0% overlap: Minimal storage, but edge cases get split badly
  • 10–15% overlap: Sweet spot for most pipelines — negligible cost, real benefit
  • 25%+ overlap: Only worth it if you see retrievals consistently missing boundary content

A quick empirical test: create 50–100 question-answer pairs from your documents, then measure retrieval recall at each candidate chunk size. The best setting for your specific content type usually stands out.
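A recall sweep along these lines might look like the sketch below. Everything in it is an assumption for illustration: the retriever is a toy word-overlap ranker standing in for your vector store, chunks are sized in whitespace-separated words rather than model tokens, and a question counts as a hit when any top-k chunk contains the gold answer string.

```python
import re
from typing import Dict, List, Tuple


def words(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def chunk_words(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Fixed-size chunks over whitespace-separated words (stand-in for tokens)."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    toks = text.split()
    if not toks:
        return []
    return [" ".join(toks[i:i + chunk_size])
            for i in range(0, max(len(toks) - overlap, 1), step)]


def retrieve(query: str, chunks: List[str], k: int) -> List[str]:
    """Toy lexical retriever: rank chunks by how many query words they contain."""
    q = set(words(query))
    ranked = sorted(chunks, key=lambda c: len(q & set(words(c))), reverse=True)
    return ranked[:k]


def recall_at_k(gold: List[Tuple[str, str]], chunks: List[str], k: int) -> float:
    """Fraction of questions whose answer string appears in a top-k chunk."""
    hits = sum(
        any(answer.lower() in c.lower() for c in retrieve(question, chunks, k))
        for question, answer in gold
    )
    return hits / len(gold)


def sweep(text: str, gold: List[Tuple[str, str]], sizes: List[int],
          k: int = 3, overlap_ratio: float = 0.1) -> Dict[int, float]:
    """Recall@k for each candidate chunk size."""
    return {
        size: recall_at_k(gold, chunk_words(text, size, int(size * overlap_ratio)), k)
        for size in sizes
    }


document = ("Refunds are accepted within 30 days of purchase. "
            "Shipping takes five business days. "
            "Support is available by email on weekdays.")
gold = [("How long are refunds accepted?", "within 30 days"),
        ("When is support available?", "by email on weekdays")]
print(sweep(document, gold, sizes=[8, 16, 64], k=1))
```

In a real pipeline, replace `retrieve` with a call to your vector store and keep the gold set fixed across runs so the numbers stay comparable.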

Where Fixed-Size Chunking Shines

It’s tempting to jump straight to semantic chunking, but fixed-size has legitimate advantages:

  • Predictable embedding costs: Every chunk takes roughly the same compute. Budget forecasting is trivial.
  • Uniform vector storage: Chunk counts scale predictably with document length, which makes index sizing and management easier.
  • Great for uniform content: FAQs, product catalogs, news articles — content without complex hierarchical structure benefits little from semantic splitting.
  • Fast iteration: When you’re prototyping, you want to get chunks into a vector store in minutes, not hours.

Pitfalls to Watch For

Mid-sentence splits are the most common complaint. If chunk N ends mid-sentence, the embedding model tends to produce a weaker representation because the semantic unit is incomplete. Sentence-boundary-aware splitting mitigates this: chunks stay close to the target size, but splits happen only at sentence boundaries within a token range.
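One way to approximate sentence-boundary-aware splitting is to pack whole sentences greedily until a size budget is reached. The sketch below is a simplification under stated assumptions: the sentence splitter is a naive regex (it will mis-split abbreviations such as "e.g."), and word counts stand in for token counts. The names `split_sentences` and `pack_sentences` are illustrative.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def pack_sentences(text: str, max_tokens: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens words.

    A single sentence longer than max_tokens becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = len(sentence.split())  # word count as a stand-in for token count
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "One two three. Four five. Six seven eight nine. Ten."
print(pack_sentences(text, max_tokens=5))
# ['One two three. Four five.', 'Six seven eight nine. Ten.']
```

Chunk sizes are no longer exactly uniform, but they stay under the budget whenever individual sentences do.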

Tables and code blocks break badly with fixed-size chunking. A code function split across two chunks loses its meaning entirely. Consider detecting these structures and treating them as atomic units — don’t split inside them.
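A minimal sketch of the atomic-unit idea, assuming the source documents are Markdown and code appears in triple-backtick fences (tables would need their own detector). Prose is split into fixed-size word chunks, while each fenced block is emitted whole, even if it exceeds the chunk size. The helper names are hypothetical.

```python
import re
from typing import List, Tuple

FENCE = "`" * 3  # Markdown code fence, built this way to avoid a literal fence here
CODE_BLOCK_RE = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)


def split_atomic(text: str) -> List[Tuple[str, str]]:
    """Split text into ("prose", ...) and ("code", ...) segments, in order."""
    segments = []
    pos = 0
    for match in CODE_BLOCK_RE.finditer(text):
        if match.start() > pos:
            segments.append(("prose", text[pos:match.start()]))
        segments.append(("code", match.group(0)))
        pos = match.end()
    if pos < len(text):
        segments.append(("prose", text[pos:]))
    return segments


def chunk_with_atomic_code(text: str, chunk_size: int) -> List[str]:
    """Fixed-size word chunks for prose; each code block is kept as one chunk."""
    chunks = []
    for kind, segment in split_atomic(text):
        if kind == "code":
            chunks.append(segment)  # never split inside a fenced block
            continue
        tokens = segment.split()
        for i in range(0, len(tokens), chunk_size):
            chunks.append(" ".join(tokens[i:i + chunk_size]))
    return chunks


doc = ("Intro text here.\n" + FENCE + "python\ndef f():\n    return 1\n"
       + FENCE + "\nAfter the code.")
for chunk in chunk_with_atomic_code(doc, chunk_size=2):
    print(repr(chunk))
```

An oversized code block may still exceed your embedding model's input limit, so very long blocks need a fallback such as splitting at function boundaries.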

Identical overlap regions inflate your vector index. With 20% overlap and 1 million chunks, roughly 200,000 chunks’ worth of the text you embed and store is duplicated. This adds cost and can create retrieval noise, where the top-K results return the same content from adjacent overlapping chunks.
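The storage cost of overlap can be estimated with a little arithmetic. The sketch below assumes a plain sliding window with step `chunk_size - overlap`, where every boundary between neighboring chunks duplicates exactly `overlap` tokens; the function names are illustrative.

```python
import math


def chunk_count(n_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding-window chunks needed to cover n_tokens."""
    if n_tokens <= 0:
        return 0
    if n_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap
    return 1 + math.ceil((n_tokens - chunk_size) / step)


def duplicate_tokens(n_tokens: int, chunk_size: int, overlap: int) -> int:
    """Tokens stored more than once: one overlap region per chunk boundary."""
    return max(chunk_count(n_tokens, chunk_size, overlap) - 1, 0) * overlap


def duplicate_share(n_tokens: int, chunk_size: int, overlap: int) -> float:
    """Duplicated tokens as a fraction of all tokens embedded and stored."""
    dup = duplicate_tokens(n_tokens, chunk_size, overlap)
    stored = n_tokens + dup
    return dup / stored if stored else 0.0


# A 1,000,000-token corpus with 500-token chunks and 20% (100-token) overlap.
print(chunk_count(1_000_000, 500, 100))                # 2500
print(duplicate_tokens(1_000_000, 500, 100))           # 249900
print(round(duplicate_share(1_000_000, 500, 100), 3))  # 0.2
```

So an overlap of 20% of the chunk size means about 20% of what you store is duplicated, which is about 25% extra relative to the original text.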

Deduplication After Chunking

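One practical improvement: after generating chunks with overlap, run a post-processing step to detect near-duplicate chunks and remove them. MinHash estimates token-set (Jaccard) overlap directly, so a threshold such as >80% is easy to apply. FAISS’s range search works on embedding similarity instead, so it needs a distance threshold tuned to your model. Neighboring chunks from the sliding window share only the overlap fraction, so this step mainly removes repeated boilerplate and re-ingested content.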
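For small corpora, the same idea can be sketched with exact Jaccard similarity over word shingles instead of MinHash (MinHash approximates this same quantity and scales better). This is an assumption-laden illustration: the comparison is O(n²) in the number of chunks, shingles are built from lowercased whitespace-separated words, and the function names are made up.

```python
from typing import List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, n: int = 3) -> Set[Shingle]:
    """Set of n-word shingles; texts shorter than n words yield one shingle."""
    toks = text.lower().split()
    if len(toks) < n:
        return {tuple(toks)} if toks else set()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(chunks: List[str], threshold: float = 0.8) -> List[str]:
    """Keep the first occurrence of each chunk; drop later ones above the threshold."""
    kept: List[str] = []
    kept_sets: List[Set[Shingle]] = []
    for chunk in chunks:
        s = shingles(chunk)
        if any(jaccard(s, k) > threshold for k in kept_sets):
            continue  # near-duplicate of something already kept
        kept.append(chunk)
        kept_sets.append(s)
    return kept


chunks = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy dog",
    "a completely different sentence about chunking strategies",
]
print(drop_near_duplicates(chunks))
```

Only the first and third chunks survive here, since the second is an exact repeat of the first.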

2025 Trend: Adaptive Fixed-Size

Many production systems no longer use a single fixed size globally. Instead, they set size by content type:

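# Older LangChain releases expose this class as langchain.text_splitter
from langchain_text_splitters import TokenTextSplitter

# Keys match TokenTextSplitter's keyword arguments so they can be unpacked directly
CHUNK_CONFIGS = {
    "narrative": {"chunk_size": 512, "chunk_overlap": 64},
    "technical_doc": {"chunk_size": 768, "chunk_overlap": 100},
    "faq": {"chunk_size": 256, "chunk_overlap": 0},
    # Size alone cannot prevent mid-function splits; pair with function-aware splitting
    "code": {"chunk_size": 400, "chunk_overlap": 0},
}
DEFAULT_CONFIG = {"chunk_size": 512, "chunk_overlap": 64}

def chunk_document(text, doc_type):
    config = CHUNK_CONFIGS.get(doc_type, DEFAULT_CONFIG)
    return TokenTextSplitter(encoding_name="cl100k_base", **config).split_text(text)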

This hybrid approach keeps the simplicity of fixed-size while acknowledging that a single setting rarely generalizes well across content types.

Implementation Checklist

  • Decide on token vs character splitting based on your embedding model
  • Set overlap to 10–15% of chunk size as a starting point
  • Handle code blocks and tables as atomic chunks
  • Benchmark recall on a gold set of 50+ Q&A pairs
  • Monitor for duplicate top-K results caused by overlapping chunks
  • Consider per-content-type size configs once the baseline is working

Fixed-size chunking is the right starting point for almost every RAG project. Get this working first, measure, then decide if more complex strategies are worth the added maintenance overhead.