Fixed-Size Chunking: The Bedrock Strategy That Still Delivers

When people first start building RAG pipelines, fixed-size chunking is usually the first strategy they reach for. And honestly? There’s nothing wrong with that. It’s predictable, fast, and for a surprising number of real-world use cases, it performs just as well as anything fancier.

The idea is simple: divide your documents into chunks of a predetermined size — measured either in characters or tokens — with an optional overlap between adjacent chunks to prevent context from getting cut off at the seams.

How It Works

At its core, fixed-size chunking is a sliding window over your text:

Document text (2,000 tokens)
┌─────────────────────────────────────────────┐
│                  Full Document               │
└─────────────────────────────────────────────┘

After chunking (chunk_size=500, overlap=50):
┌──────────┐
│  Chunk 1 │  tokens 0–500
  ┌──────────┐
  │  Chunk 2 │  tokens 450–950
    ┌──────────┐
    │  Chunk 3 │  tokens 900–1400
      ┌──────────┐
      │  Chunk 4 │  tokens 1350–1850
        ┌──────────┐
        │  Chunk 5 │  tokens 1800–2000 (shorter final chunk)

The overlap region means the same text appears in two consecutive chunks. This redundancy reduces the chance that a query misses relevant content that happens to straddle a chunk boundary.
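The sliding window can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production splitter: the function names window_spans and chunk_tokens are made up for this example, and spans are half-open (the end index is excluded). Run on the 2,000-token example, it also shows that a short final chunk is needed to cover the tail of the document.

```python
from typing import List, Sequence, Tuple


def window_spans(n_tokens: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans for a sliding window over n_tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap  # how far the window advances each step
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:  # this window reached the end of the document
            break
        start += stride
    return spans


def chunk_tokens(tokens: Sequence, chunk_size: int, overlap: int) -> List[Sequence]:
    """Slice a token list (or a string, for character chunking) into chunks."""
    return [tokens[s:e] for s, e in window_spans(len(tokens), chunk_size, overlap)]


spans = window_spans(2000, chunk_size=500, overlap=50)
print(spans)
```

Because the helper only relies on slicing, the same code works for a list of token IDs or for a raw string when you want character-based chunks.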

Token vs Character Chunking

This distinction matters more than most guides let on.

Character-based chunking splits at a fixed character count (e.g., 2,000 characters). It’s fast because it requires no tokenization step. The problem is that character counts don’t map cleanly to what your embedding model actually sees: at the common rule of thumb of about four characters per token, a 2,000-character chunk is roughly 500 tokens of plain English prose, but the same character count can be noticeably more tokens for code, symbol-heavy technical content, or non-English text.

Token-based chunking splits at a fixed token count aligned to your embedding model’s tokenizer. This is more accurate but means the text is tokenized twice: once to count and split, and once inside the embedding model. OpenAI’s text-embedding-3-small accepts roughly 8K tokens of input, but you might still target 512 tokens per chunk: smaller chunks keep each embedding focused on one topic and leave room to fit several retrieved chunks into the generator’s prompt.

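from langchain_text_splitters import TokenTextSplitter  # older releases: langchain.text_splitter

# Requires the tiktoken package; cl100k_base is the encoding used by text-embedding-3-small.
splitter = TokenTextSplitter(
    encoding_name="cl100k_base",
    chunk_size=512,     # maximum tokens per chunk
    chunk_overlap=64,   # tokens shared between consecutive chunks
)
chunks = splitter.split_text(document_text)  # document_text: the raw string to split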

Choosing Chunk Size and Overlap

There’s no universal answer here, but some principles hold across most use cases:

Chunk size guidance:

256–512 tokens: Works well when queries are short and factual (“What’s the refund policy?”)
512–1024 tokens: Better for nuanced questions that need surrounding context
1024+ tokens: Useful when your generator needs long passages for synthesis tasks

Overlap guidance:

0% overlap: Minimal storage, but any passage that straddles a boundary is cut in two
10–15% overlap: Sweet spot for most pipelines — negligible cost, real benefit
25%+ overlap: Only worth it if you see retrievals consistently missing boundary content
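To put rough numbers on the cost side of the overlap guidance, here is a small arithmetic sketch. It assumes the plain sliding-window scheme described earlier; chunk_count and overlap_overhead are illustrative names, and the 1,000,000-token corpus is an arbitrary example figure.

```python
def chunk_count(n_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding-window chunks needed to cover n_tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if n_tokens <= 0:
        return 0
    if n_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap
    # One full first chunk, then one more chunk per stride (rounded up).
    return 1 + (n_tokens - chunk_size + stride - 1) // stride


def overlap_overhead(n_tokens: int, chunk_size: int, overlap_fraction: float):
    """Return (chunk count, ratio versus the zero-overlap chunk count)."""
    overlap = int(chunk_size * overlap_fraction)
    with_overlap = chunk_count(n_tokens, chunk_size, overlap)
    baseline = chunk_count(n_tokens, chunk_size, 0)
    return with_overlap, with_overlap / baseline


for fraction in (0.0, 0.10, 0.25):
    count, ratio = overlap_overhead(1_000_000, 512, fraction)
    print(f"overlap {fraction:.0%}: {count} chunks, {ratio:.2f}x the zero-overlap count")
```

The ratio grows roughly as 1 / (1 - overlap fraction), which is why 10–15% overlap is cheap while 25% and above starts to show up in embedding and storage bills.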

A quick empirical test: create 50–100 question-answer pairs from your documents, then measure retrieval recall at each chunk size. In most cases one setting will come out ahead for your specific content type.
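One way to run that test is a small recall@k harness. The sketch below is deliberately simplified: it counts a question as a hit when the expected answer string appears verbatim in one of the top-k chunks, and it uses a toy keyword-overlap retriever (make_keyword_retriever, a stand-in invented for this example) where your real vector search would go.

```python
import re
from typing import Callable, List, Sequence, Tuple


def recall_at_k(
    qa_pairs: Sequence[Tuple[str, str]],
    retrieve: Callable[[str, int], List[str]],
    k: int = 5,
) -> float:
    """Fraction of questions whose answer text appears in a top-k chunk."""
    if not qa_pairs:
        return 0.0
    hits = 0
    for question, answer in qa_pairs:
        top_chunks = retrieve(question, k)
        if any(answer.lower() in chunk.lower() for chunk in top_chunks):
            hits += 1
    return hits / len(qa_pairs)


def make_keyword_retriever(chunks: List[str]) -> Callable[[str, int], List[str]]:
    """Toy retriever: rank chunks by the number of words shared with the query."""
    def words(text: str) -> set:
        return set(re.findall(r"\w+", text.lower()))

    def retrieve(query: str, k: int) -> List[str]:
        query_words = words(query)
        ranked = sorted(chunks, key=lambda c: len(query_words & words(c)), reverse=True)
        return ranked[:k]

    return retrieve


chunks = [
    "Refunds are issued within 30 days of purchase.",
    "Shipping takes 5 business days.",
    "Support is available by email.",
]
qa_pairs = [
    ("How many days do refunds take?", "30 days"),
    ("How long does shipping take?", "5 business days"),
]
retrieve = make_keyword_retriever(chunks)
recall = recall_at_k(qa_pairs, retrieve, k=1)
print(f"recall@1 = {recall:.2f}")
```

To compare chunk sizes, re-chunk the corpus at each candidate size, rebuild the retriever, and run the same gold set through recall_at_k.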

Where Fixed-Size Chunking Shines

It’s tempting to jump straight to semantic chunking, but fixed-size has legitimate advantages:

Predictable embedding costs: Every chunk takes roughly the same compute. Budget forecasting is trivial.
Uniform vector storage: Chunk count scales linearly with document length, so index size is easy to estimate and manage.
Great for uniform content: FAQs, product catalogs, news articles — content without complex hierarchical structure benefits little from semantic splitting.
Fast iteration: When you’re prototyping, you want to get chunks into a vector store in minutes, not hours.

Pitfalls to Watch For

Mid-sentence splits are the most common complaint. If chunk N ends mid-sentence, the embedding model may produce a weaker representation because the semantic unit is incomplete. Sentence-boundary-aware splitting mitigates this: chunks stay close to a fixed size, but splits only happen at sentence boundaries within a token budget.
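Sentence-boundary-aware splitting can be sketched as greedy packing: keep adding whole sentences to the current chunk until the next one would exceed the token budget. The version below makes two loud assumptions: sentences are detected with a naive regex on ., ! and ?, and tokens are approximated by whitespace-separated words (count_tokens_whitespace). In a real pipeline you would pass your embedding model’s tokenizer as count_tokens.

```python
import re
from typing import Callable, List


def count_tokens_whitespace(text: str) -> int:
    """Stand-in token counter: whitespace-separated words, not real tokens."""
    return len(text.split())


def pack_sentences(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = count_tokens_whitespace,
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Flush the current chunk if this sentence would push it over budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        # A single sentence longer than max_tokens becomes its own oversized chunk.
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


example = "One two three. Four five. Six seven eight nine. Ten."
packed = pack_sentences(example, max_tokens=5)
print(packed)
```

Chunks are no longer exactly the same size, but they never exceed the budget unless a single sentence does, which is usually an acceptable trade for cleaner boundaries.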

Tables and code blocks break badly with fixed-size chunking. A code function split across two chunks loses its meaning entirely. Consider detecting these structures and treating them as atomic units — don’t split inside them.
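As a rough illustration of the atomic-unit idea, the sketch below separates Markdown fenced code blocks from the surrounding prose so that only the prose is passed to the fixed-size splitter. It assumes Markdown input with well-formed fences and does not handle tables; split_atomic is a hypothetical helper name.

```python
import re
from typing import List, Tuple

FENCE = "`" * 3  # Markdown code fence marker

# A fenced block: an opening fence at line start through the next closing fence.
FENCED_BLOCK = re.compile(
    rf"^{FENCE}.*?^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def split_atomic(text: str) -> List[Tuple[str, str]]:
    """Split text into ("prose", ...) and ("code", ...) segments, in order."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in FENCED_BLOCK.finditer(text):
        prose = text[pos:match.start()].strip()
        if prose:
            segments.append(("prose", prose))
        segments.append(("code", match.group(0)))  # kept whole, never split
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("prose", tail))
    return segments


doc = (
    "Intro text.\n"
    + FENCE + "python\n"
    + "def f():\n"
    + "    return 1\n"
    + FENCE + "\n"
    + "Closing text."
)
segments = split_atomic(doc)
for kind, body in segments:
    print(kind, repr(body))
```

Prose segments can then go through the normal fixed-size splitter, while each code segment is embedded as a single chunk (or handled separately if it exceeds the model’s input limit).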

Overlap regions inflate your vector index. With 20% overlap, roughly one fifth of every chunk repeats text already stored in its neighbor, so across 1 million chunks you’re embedding and storing the equivalent of about 200,000 chunks of duplicated text (around 100 million tokens at 512 tokens per chunk). This adds cost and can create retrieval noise when the top-K results return the same content from adjacent overlapping chunks.
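One way to cut down on that retrieval noise is to filter results at query time: walk the hits in score order and skip any chunk that sits right next to one already kept from the same document. The sketch assumes each hit carries a doc_id and a chunk_index in its metadata; both field names and the Hit type are inventions for this example, so adapt them to whatever your vector store returns.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    doc_id: str
    chunk_index: int  # position of the chunk within its source document
    score: float      # similarity score, higher is better


def drop_adjacent_hits(hits: List[Hit], top_k: int) -> List[Hit]:
    """Keep the best-scoring hits, skipping neighbors of hits already kept."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        is_neighbor = any(
            hit.doc_id == other.doc_id
            and abs(hit.chunk_index - other.chunk_index) <= 1
            for other in kept
        )
        if not is_neighbor:
            kept.append(hit)
        if len(kept) == top_k:
            break
    return kept


hits = [
    Hit("a", 3, 0.91),
    Hit("a", 4, 0.90),  # overlaps with chunk 3 of the same document
    Hit("b", 0, 0.80),
    Hit("a", 7, 0.75),
]
filtered = drop_adjacent_hits(hits, top_k=3)
print(filtered)
```

Fetch more candidates than you need (for example 2 to 3 times top_k) before filtering, so that dropping neighbors does not leave you short of results.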

Deduplication After Chunking

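One practical improvement: after generating chunks with overlap, run a post-processing step to detect near-duplicate chunks and remove them. These usually come from boilerplate repeated across documents (headers, footers, legal notices) rather than from the overlap itself, since adjacent chunks with 10–15% overlap share far less than 80% of their tokens. MinHash can flag chunks with more than 80% token overlap, and a FAISS range search over the embeddings can flag chunks whose vectors are nearly identical.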
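For small corpora you can skip MinHash and compute the quantity it approximates, Jaccard similarity, directly. The sketch below does exactly that with word shingles and a pairwise comparison. It is a swapped-in stand-in, not MinHash itself: the cost grows quadratically with the number of chunks, so treat it as a way to validate thresholds before moving to MinHash with LSH or an embedding-based search. All function names are illustrative.

```python
from itertools import combinations
from typing import List, Set, Tuple


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Set of n-word shingles from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    if len(tokens) < n:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(chunks: List[str], threshold: float = 0.8, n: int = 3) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose shingle sets exceed the threshold."""
    sets = [shingles(chunk, n) for chunk in chunks]
    return [
        (i, j)
        for i, j in combinations(range(len(chunks)), 2)
        if jaccard(sets[i], sets[j]) > threshold
    ]


chunks = [
    "the quick brown fox jumps over the lazy dog today",
    "the quick brown fox jumps over the lazy dog today",
    "completely different text about vector databases and indexes",
]
pairs = near_duplicates(chunks)
print(pairs)
```

When a pair is flagged, keep one chunk and drop the other, or keep both and record the duplicate in metadata so it can be filtered at query time.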

2025 Trend: Adaptive Fixed-Size

Many production systems no longer use a single fixed size globally. Instead, they set size by content type:

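from langchain_text_splitters import TokenTextSplitter  # older releases: langchain.text_splitter

# Keys match TokenTextSplitter's keyword arguments so they can be unpacked directly.
CHUNK_CONFIGS = {
    "narrative": {"chunk_size": 512, "chunk_overlap": 64},
    "technical_doc": {"chunk_size": 768, "chunk_overlap": 100},
    "faq": {"chunk_size": 256, "chunk_overlap": 0},
    # A token splitter can still cut mid-function; pair this with atomic code-block handling.
    "code": {"chunk_size": 400, "chunk_overlap": 0},
}
DEFAULT_CONFIG = {"chunk_size": 512, "chunk_overlap": 64}

def chunk_document(text, doc_type):
    # Unknown document types fall back to the default configuration.
    config = CHUNK_CONFIGS.get(doc_type, DEFAULT_CONFIG)
    return TokenTextSplitter(encoding_name="cl100k_base", **config).split_text(text)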

This hybrid approach keeps the simplicity of fixed-size while acknowledging that a single setting rarely generalizes across every content type.

Implementation Checklist

Decide on token vs character splitting based on your embedding model
Set overlap to 10–15% of chunk size as a starting point
Handle code blocks and tables as atomic chunks
Benchmark recall on a gold set of 50+ Q&A pairs
Monitor for duplicate top-K results caused by overlapping chunks
Consider per-content-type size configs once the baseline is working

Fixed-size chunking is the right starting point for almost every RAG project. Get this working first, measure, then decide if more complex strategies are worth the added maintenance overhead.