Extended Thinking & Complex Reasoning
Using thinking blocks and budget tokens for complex tasks.
Learning Objectives
- Enable extended thinking for complex reasoning
- Configure budget tokens appropriately
- Know when extended thinking improves results
Extended Thinking and Complex Reasoning
Extended Thinking is an Anthropic-specific feature that gives Claude a dedicated, separate space to reason through complex problems before producing its visible response. Unlike chain-of-thought prompting (where reasoning appears in the output), Extended Thinking uses a special thinking block with its own token budget. This feature is particularly important for the CCA-F exam because it represents a distinct architectural decision with specific trade-offs.
How Extended Thinking Works
When you enable Extended Thinking, Claude's response includes two types of content blocks:
- Thinking blocks: Internal reasoning that Claude uses to work through the problem. These are visible to your application but are not shown to end users by default. Think of them as “scratch paper.”
- Text blocks: The final, polished response that incorporates the insights from the thinking process.
Enabling Extended Thinking
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # Maximum tokens for thinking
    },
    messages=[{"role": "user", "content": (
        "Analyze the following business scenario and recommend a strategy.\n\n"
        "A mid-size SaaS company (ARR $50M) is seeing:\n"
        "- Customer churn increasing from 5% to 8% quarterly\n"
        "- Net revenue retention dropping to 95%\n"
        "- Sales cycle lengthening from 30 to 45 days\n"
        "- Support ticket volume up 40% YoY\n\n"
        "Diagnose the likely root causes and recommend a prioritized "
        "action plan with expected impact for each action."
    )}],
)

# Process the response
for block in response.content:
    if block.type == "thinking":
        print("THINKING:", block.thinking[:200], "...")
    elif block.type == "text":
        print("RESPONSE:", block.text)
```

Budget Tokens
The budget_tokens parameter controls how much reasoning space Claude has. This is a critical architectural decision:
- Minimum: 1,024 tokens. Below this, Extended Thinking cannot be enabled.
- Practical range: 5,000 to 30,000 tokens for most tasks.
- Maximum: must be less than max_tokens. The total response (thinking + visible output) cannot exceed max_tokens.
Budget Sizing Guidelines
- Simple analysis (5,000-8,000): Code review, basic reasoning, straightforward classification with justification.
- Moderate complexity (10,000-20,000): Multi-step math, strategic analysis, architectural decisions, debugging complex issues.
- High complexity (20,000-50,000): Advanced mathematical proofs, multi-factor decision analysis, complex code generation with error handling.
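The sizing and range rules above can be sketched as a small pre-flight helper. The function names and the per-tier defaults here are illustrative choices, not part of the Anthropic SDK:

```python
def validate_thinking_budget(budget_tokens: int, max_tokens: int) -> None:
    """Raise if the budget violates the documented constraints."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1,024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")


def suggest_budget(complexity: str) -> int:
    """Map a rough complexity label to a starting budget.

    The tier values are illustrative midpoints of the guideline ranges above.
    """
    return {"simple": 6000, "moderate": 15000, "high": 30000}[complexity]
```

For example, validate_thinking_budget(10000, 16000) passes silently, while validate_thinking_budget(10000, 8000) raises before you waste an API call on a request the server would reject.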
Keep the constraints in mind: (1) budget_tokens must be at least 1,024, (2) budget_tokens must be less than max_tokens, (3) Extended Thinking is NOT compatible with prefilling (you cannot include an assistant message at the end of the messages array), (4) temperature must be set to 1 when Extended Thinking is enabled (you cannot lower it), and (5) thinking blocks are not guaranteed to appear in every response.
When to Use Extended Thinking
Good Candidates for Extended Thinking
- Multi-step mathematical reasoning: Problems that require carrying intermediate results across several steps.
- Complex code generation: Tasks where Claude needs to consider multiple approaches, edge cases, and interactions before committing to an implementation.
- Strategic analysis: Business decisions with multiple factors, trade-offs, and stakeholder perspectives.
- Debugging: Tracing through code logic to identify the root cause of a subtle bug.
- Architectural planning: Designing system architectures where components interact in complex ways.
Poor Candidates for Extended Thinking
- Simple extraction or classification: Tasks where the answer is directly in the input text. Extended Thinking adds latency and cost without improving accuracy.
- Text generation: Creative writing, summarization, and translation do not benefit from extended reasoning.
- Structured output formatting: If you just need Claude to convert data into JSON, thinking tokens are wasted.
- High-throughput, low-latency pipelines: Extended Thinking adds significant latency. If you are processing thousands of items and speed matters, skip it.
Extended Thinking in Multi-Turn Conversations
In multi-turn conversations, thinking blocks from previous turns are not sent back to the API. Only the visible text content from assistant messages is included in subsequent requests. This means Claude does not remember its earlier reasoning — only its conclusions.
```python
import anthropic

client = anthropic.Anthropic()

ALGORITHM_QUESTION = (
    "Analyze this algorithm for time complexity.\n\n"
    "def mystery(arr):\n"
    "    n = len(arr)\n"
    "    for i in range(n):\n"
    "        for j in range(i, n):\n"
    "            if arr[j] < arr[i]:\n"
    "                arr[i], arr[j] = arr[j], arr[i]\n"
    "    return arr"
)

# Turn 1: Initial analysis with thinking
response1 = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": ALGORITHM_QUESTION}],
)

# Extract only the text content for the next turn
assistant_text = ""
for block in response1.content:
    if block.type == "text":
        assistant_text = block.text

# Turn 2: Follow-up question
# Note: thinking blocks from turn 1 are NOT included
response2 = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[
        {"role": "user", "content": ALGORITHM_QUESTION},
        {"role": "assistant", "content": assistant_text},
        {"role": "user", "content": "Can you optimize this to O(n log n)?"},
    ],
)
```

Extended Thinking vs. Chain-of-Thought
This distinction is critical for the exam:
- Chain-of-Thought (CoT): A prompting technique where you ask Claude to show its reasoning in the visible output. The reasoning is part of the text response. No special API parameters needed. Works with all models and configurations. Reasoning and answer share the same max_tokens budget.
- Extended Thinking: An API feature with a dedicated thinking budget. Reasoning happens in a separate thinking block, not in the visible output. Requires specific API parameters. Has constraints (no prefilling, temperature = 1). Thinking gets its own token budget separate from the response.
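To make the contrast concrete, here is what a CoT-style request looks like (the prompt wording and parameter values are illustrative). Note that there is no thinking parameter, so temperature control and prefilling remain available, and the step-by-step reasoning comes out of the same max_tokens budget as the answer:

```python
# Chain-of-thought via prompting alone: no `thinking` parameter involved.
cot_request = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 2000,      # reasoning and answer share this budget
    "temperature": 0.2,      # allowed here; forbidden with Extended Thinking
    "messages": [{
        "role": "user",
        "content": (
            "Is 1,000,003 prime? Think through it step by step, "
            "then state your final answer on the last line."
        ),
    }],
}
# Send with: client.messages.create(**cot_request)
```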
Decision Matrix
- Use CoT when: you want visible reasoning, need prefilling, need temperature control, or are processing high-volume tasks where latency matters.
- Use Extended Thinking when: the task genuinely requires deep reasoning, you want a clean response without visible working, and you can accept the latency and cost.
- Use neither when: the task is straightforward extraction, classification, or generation that does not benefit from step-by-step reasoning.
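The matrix above can be sketched as a routing helper. The function, its task labels, and the routing thresholds are hypothetical illustrations, not an Anthropic API:

```python
def choose_reasoning_mode(task: str,
                          needs_prefill: bool = False,
                          latency_sensitive: bool = False) -> str:
    """Return 'extended_thinking', 'cot', or 'none' for a task label (illustrative)."""
    deep = {"math_proof", "debugging", "architecture", "strategic_analysis"}
    simple = {"extraction", "classification", "formatting"}
    if task in simple:
        return "none"  # step-by-step reasoning adds cost without accuracy gains
    if needs_prefill or latency_sensitive:
        return "cot"   # prefilling and tight latency rule out Extended Thinking
    return "extended_thinking" if task in deep else "cot"
```

For example, choose_reasoning_mode("debugging") routes to Extended Thinking, but choose_reasoning_mode("debugging", needs_prefill=True) falls back to CoT because prefilling is incompatible with Extended Thinking.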
Streaming with Extended Thinking
Extended Thinking works with streaming. Thinking tokens stream first, followed by the text response. You can display a “thinking” indicator while thinking tokens arrive and then switch to displaying the response.
```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": (
        "Solve this step by step: If a train leaves Chicago at 9am traveling "
        "60mph and another leaves New York at 10am traveling 80mph, when do "
        "they meet? The distance is 790 miles."
    )}],
) as stream:
    current_block_type = None
    for event in stream:
        if event.type == "content_block_start":
            if event.content_block.type == "thinking":
                current_block_type = "thinking"
                print("[Thinking...]")
            elif event.content_block.type == "text":
                current_block_type = "text"
                print("\n[Response]")
        elif event.type == "content_block_delta":
            if current_block_type == "text" and hasattr(event.delta, "text"):
                print(event.delta.text, end="", flush=True)
```