Long-Document Processing
Chunking, map-reduce, and hierarchical summarization patterns.
Learning Objectives
- Design chunking strategies for large documents
- Implement map-reduce processing patterns
- Build hierarchical summarization pipelines
Long-Document Processing
Many real-world applications require Claude to process documents that are far too large to fit in a single context window, or that would consume so much of the window that meaningful analysis becomes impossible. Long-document processing techniques — chunking, map-reduce, and hierarchical summarization — allow you to break large documents into manageable pieces, process each piece independently, and then combine the results.
When Do You Need Chunking?
Even though Claude's 200K-token context window can hold roughly 150,000 words (about 500 pages of text), there are several reasons to chunk documents rather than sending them whole:
- The document exceeds the context window. Legal contracts, codebases, book manuscripts, and research paper collections can easily exceed 200K tokens.
- You need room for instructions and output. Sending a 180K-token document leaves only 20K tokens for the system prompt, tools, and model output.
- Accuracy degrades with very long inputs. Research shows that model attention is strongest at the beginning and end of the context window — the "lost in the middle" effect. Chunking helps ensure every part of the document receives adequate attention.
- You need parallel processing. Processing 10 chunks simultaneously can be close to 10x faster than processing one massive document sequentially.
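All of these criteria hinge on knowing roughly how many tokens a document contains. A minimal sketch of the common heuristic (about 4 characters per token for English text; `estimate_tokens` is an illustrative helper, not an SDK function):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return len(text) // 4

# A 1M-character manuscript is roughly 250K tokens,
# well beyond a 200K-token context window.
manuscript = "x" * 1_000_000
print(estimate_tokens(manuscript))  # 250000
```

For billing-accurate counts you would use a real tokenizer or the API's token-counting support; the heuristic is only for quick go/no-go decisions.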
Chunking Strategies
How you split a document matters enormously. Naive splitting (every N characters) can break sentences, paragraphs, and semantic units. Here are the main strategies:
Fixed-Size Chunking with Overlap
def chunk_by_tokens(text, chunk_size=4000, overlap=200):
    """
    Split text into chunks of approximately chunk_size tokens
    with overlap to preserve context across boundaries.

    Args:
        text: The full document text
        chunk_size: Target tokens per chunk
        overlap: Number of tokens to overlap between chunks

    Returns:
        List of text chunks
    """
    # Approximate: 1 token ~ 4 characters for English text
    char_chunk = chunk_size * 4
    char_overlap = overlap * 4

    chunks = []
    start = 0
    while start < len(text):
        end = start + char_chunk
        chunk = text[start:end]

        # Try to end at a sentence boundary
        if end < len(text):
            last_period = chunk.rfind(".")
            last_newline = chunk.rfind("\n")
            boundary = max(last_period, last_newline)
            if boundary > len(chunk) * 0.8:  # Only if boundary is near the end
                chunk = chunk[:boundary + 1]
                end = start + boundary + 1

        chunks.append(chunk.strip())
        start = end - char_overlap  # Overlap with previous chunk

    return chunks

Semantic Chunking by Document Structure
import re

def chunk_by_sections(text, max_chunk_tokens=8000):
    """
    Split a document by its natural section boundaries
    (headings, chapters, etc.) while respecting a max size.
    """
    # Split on markdown-style headings or numbered sections
    section_pattern = r"\n(?=#{1,3} |\d+\.\s|Chapter \d|SECTION \d)"
    sections = re.split(section_pattern, text)

    chunks = []
    current_chunk = ""
    char_limit = max_chunk_tokens * 4

    for section in sections:
        if len(current_chunk) + len(section) <= char_limit:
            current_chunk += section
        else:
            if current_chunk.strip():
                chunks.append(current_chunk.strip())
            # If a single section exceeds the limit, split it further
            if len(section) > char_limit:
                sub_chunks = chunk_by_tokens(
                    section, chunk_size=max_chunk_tokens, overlap=200
                )
                chunks.extend(sub_chunks)
                current_chunk = ""
            else:
                current_chunk = section

    if current_chunk.strip():
        chunks.append(current_chunk.strip())
    return chunks

The Map-Reduce Pattern
Map-reduce is the most important pattern for processing large documents. It has two phases: the map phase processes each chunk independently, and the reduce phase combines the results into a final output.
import anthropic
from concurrent.futures import ThreadPoolExecutor, as_completed

client = anthropic.Anthropic()

def map_reduce_summarize(document, chunk_size=6000):
    """
    Summarize a large document using map-reduce.

    Map phase: Summarize each chunk independently (parallelizable)
    Reduce phase: Combine chunk summaries into a final summary
    """
    # Step 1: Chunk the document
    chunks = chunk_by_tokens(document, chunk_size=chunk_size, overlap=200)
    print(f"Document split into {len(chunks)} chunks")

    # Step 2: Map — Summarize each chunk in parallel
    def summarize_chunk(chunk_index, chunk_text):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": (
                    f"You are summarizing part {chunk_index + 1} of "
                    f"{len(chunks)} of a larger document.\n\n"
                    f"Summarize the following section, preserving key facts, "
                    f"names, dates, and conclusions:\n\n{chunk_text}"
                ),
            }],
        )
        return chunk_index, response.content[0].text

    chunk_summaries = [None] * len(chunks)
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = {
            executor.submit(summarize_chunk, i, chunk): i
            for i, chunk in enumerate(chunks)
        }
        for future in as_completed(futures):
            idx, summary = future.result()
            chunk_summaries[idx] = summary
            print(f"  Chunk {idx + 1}/{len(chunks)} summarized")

    # Step 3: Reduce — Combine all chunk summaries
    combined_summaries = "\n\n".join(
        f"--- Section {i+1} ---\n{s}"
        for i, s in enumerate(chunk_summaries)
    )
    final_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": (
                "Below are summaries of consecutive sections of a document. "
                "Synthesize them into a single coherent summary that:\n"
                "1. Captures the main themes and conclusions\n"
                "2. Preserves important details and data points\n"
                "3. Maintains logical flow\n"
                "4. Eliminates redundancy from overlapping sections\n\n"
                f"{combined_summaries}"
            ),
        }],
    )
    return final_response.content[0].text

Map-Reduce for Question Answering
Map-reduce is not limited to summarization. You can use it to answer specific questions about large documents, extract structured data, or perform analysis.
def map_reduce_qa(document, question, chunk_size=6000):
    """
    Answer a question about a large document using map-reduce.

    Map: Extract relevant information from each chunk
    Reduce: Synthesize extractions into a final answer
    """
    chunks = chunk_by_tokens(document, chunk_size=chunk_size)

    # Map phase: Extract relevant info from each chunk
    def extract_from_chunk(chunk_index, chunk_text):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": (
                    f"Given the following question:\n{question}\n\n"
                    f"Extract any information from this text that is relevant "
                    f"to answering the question. If nothing is relevant, "
                    f"respond with 'NO_RELEVANT_INFO'.\n\n"
                    f"Text (section {chunk_index + 1} of {len(chunks)}):\n"
                    f"{chunk_text}"
                ),
            }],
        )
        return chunk_index, response.content[0].text

    extractions = []
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = {
            executor.submit(extract_from_chunk, i, c): i
            for i, c in enumerate(chunks)
        }
        for future in as_completed(futures):
            idx, extraction = future.result()
            if "NO_RELEVANT_INFO" not in extraction:
                extractions.append((idx, extraction))

    # Sort by chunk index to maintain document order
    extractions.sort(key=lambda x: x[0])
    if not extractions:
        return "No relevant information found in the document."

    # Reduce phase: Synthesize into a final answer
    context = "\n\n".join(
        f"[From section {idx + 1}]: {text}"
        for idx, text in extractions
    )
    final_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\n\n"
                f"Based on the following extractions from a large document, "
                f"provide a comprehensive answer:\n\n{context}"
            ),
        }],
    )
    return final_response.content[0].text

Hierarchical Summarization
For very large documents (books, legal filings, research paper collections), a single reduce step may not be sufficient — the combined chunk summaries might themselves exceed the context window. Hierarchical summarization solves this by applying multiple levels of reduction.
def hierarchical_summarize(document, chunk_size=6000, summary_chunk_size=10000):
    """
    Multi-level summarization for very large documents.

    Level 1: Summarize individual chunks
    Level 2: Group chunk summaries and summarize groups
    Level 3: Combine group summaries into final summary
    Repeat until everything fits in one context window.
    """
    # Level 1: Chunk and summarize
    chunks = chunk_by_tokens(document, chunk_size=chunk_size)
    print(f"Level 1: Processing {len(chunks)} chunks")
    level_summaries = []
    for i, chunk in enumerate(chunks):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=512,
            messages=[{
                "role": "user",
                "content": (
                    f"Summarize this section concisely, preserving key "
                    f"information:\n\n{chunk}"
                ),
            }],
        )
        level_summaries.append(response.content[0].text)

    # Continue reducing until summaries fit in one context window
    level = 2
    while True:
        combined = "\n\n".join(level_summaries)
        combined_tokens = len(combined) // 4  # Approximate
        if combined_tokens < summary_chunk_size:
            # Everything fits — do the final synthesis
            print(f"Final synthesis from {len(level_summaries)} summaries")
            response = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=2048,
                messages=[{
                    "role": "user",
                    "content": (
                        "Synthesize these section summaries into a coherent "
                        f"final summary:\n\n{combined}"
                    ),
                }],
            )
            return response.content[0].text

        # Need another level of reduction
        print(f"Level {level}: Reducing {len(level_summaries)} summaries")
        summary_chunks = chunk_by_tokens(
            combined, chunk_size=summary_chunk_size
        )
        new_summaries = []
        for chunk in summary_chunks:
            response = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=[{
                    "role": "user",
                    "content": (
                        "Consolidate these summaries into a single shorter "
                        f"summary:\n\n{chunk}"
                    ),
                }],
            )
            new_summaries.append(response.content[0].text)
        level_summaries = new_summaries
        level += 1

Choosing the Right Chunk Size
Chunk size is a critical parameter that requires balancing multiple concerns:
- Too small (under 1,000 tokens): Chunks lack sufficient context for meaningful analysis. The model cannot understand relationships between ideas that span multiple chunks.
- Too large (over 20,000 tokens): Fewer chunks mean less parallelism, and each chunk takes longer to process. Also, the "lost in the middle" effect becomes more pronounced.
- Sweet spot (2,000-8,000 tokens): Large enough for coherent analysis, small enough for efficient parallel processing.
The overlap between chunks (typically 5-10% of chunk size) is also important. Too little overlap and you lose information at boundaries. Too much overlap and you waste tokens re-processing the same content.
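To see the overlap guideline concretely: a 4,000-token chunk with 200 tokens of overlap (5%) repeats roughly the last 800 characters of each chunk at the start of the next. A standalone sketch (simplified fixed-size splitter with no sentence-boundary snapping; `overlap_chunks` is illustrative):

```python
def overlap_chunks(text, chunk_chars=16_000, overlap_chars=800):
    """Fixed-size character chunks where each chunk repeats the last
    overlap_chars characters of its predecessor."""
    step = chunk_chars - overlap_chars
    return [text[i:i + chunk_chars] for i in range(0, len(text), step)]

text = "".join(f"sentence {i}. " for i in range(3000))
chunks = overlap_chunks(text)

# The boundary region appears in both of two consecutive chunks,
# so a sentence cut at one boundary is seen whole in the next chunk.
assert chunks[0][-800:] == chunks[1][:800]
```

The overlap buys boundary safety at the cost of re-processing about 5% of the document's tokens in the map phase.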
Exam Tip: The exam will test your knowledge of when to use map-reduce versus simply sending the entire document. Key decision factors: (1) Does the document exceed the context window? If yes, chunking is mandatory. (2) Does the document consume more than 75% of the context window? If yes, chunking is recommended to leave room for instructions and output. (3) Do you need to process the document quickly? Parallel map-reduce can be significantly faster than sequential processing.
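The decision factors in this tip can be encoded directly. A hedged sketch (the `chunking_advice` name and the exact thresholds are illustrative, assuming a 200K-token window):

```python
def chunking_advice(doc_tokens: int, window: int = 200_000) -> str:
    """Apply the decision rules: over the window -> mandatory;
    over 75% of the window -> recommended; otherwise optional."""
    if doc_tokens > window:
        return "mandatory"    # Document cannot fit in the context window
    if doc_tokens > window * 0.75:
        return "recommended"  # Too little room for instructions and output
    return "optional"         # Whole document fits comfortably

print(chunking_advice(250_000))  # mandatory
print(chunking_advice(160_000))  # recommended
print(chunking_advice(50_000))   # optional
```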
Exam Tip: A common exam question asks about the "lost in the middle" problem. When Claude processes very long inputs, information in the middle of the context receives less attention than information at the beginning or end. Chunking mitigates this because each chunk is processed independently, and every piece of the document appears at the "beginning" of some chunk.
Key Takeaways
Chunk at semantic boundaries (sections, paragraphs, sentences) rather than at arbitrary character positions. Use overlap to prevent information loss at chunk boundaries.
Map-reduce is the workhorse pattern for large documents. The map phase processes chunks in parallel, and the reduce phase synthesizes results. It works for summarization, Q&A, extraction, and analysis.
Hierarchical summarization handles documents of arbitrary length by applying multiple levels of reduction until the combined output fits in a single context window.
Optimal chunk size is 2,000-8,000 tokens for most use cases, with 5-10% overlap between adjacent chunks to preserve cross-boundary context.