Error Recovery & Graceful Degradation
Handling API errors, timeouts, rate limits, and unexpected outputs.
Learning Objectives
- Design error recovery strategies
- Implement retry logic with backoff
- Build graceful degradation mechanisms
Error Recovery and Graceful Degradation
Production systems built on the Anthropic API must handle a variety of failure modes gracefully. API calls can fail due to rate limits, server errors, network timeouts, context length violations, and content policy blocks. A well-designed system anticipates these failures, retries intelligently, and degrades gracefully rather than crashing. This lesson covers the practical implementation of retry logic, error classification, fallback strategies, and circuit breaker patterns.
Understanding API Error Types
The Anthropic API returns specific error types that require different handling strategies:
- 400 Bad Request (`invalid_request_error`): The request is malformed — bad parameters, an invalid model name, or context too long. Do NOT retry; fix the request.
- 401 Unauthorized (`authentication_error`): Invalid API key. Do NOT retry; fix credentials.
- 403 Forbidden (`permission_error`): The API key does not have permission for this operation. Do NOT retry.
- 429 Too Many Requests (`rate_limit_error`): You have exceeded your rate limit. Retry after the `Retry-After` header duration.
- 500 Internal Server Error (`api_error`): An unexpected server-side error. Retry with exponential backoff.
- 529 Overloaded (`overloaded_error`): The API is under heavy load. Retry with exponential backoff and a longer initial delay.
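This classification can be captured in a small helper that retry logic can consult before deciding what to do. A minimal sketch — the name `is_retryable` and the constants are illustrative, not part of the Anthropic SDK:

```python
# Status codes worth retrying vs. those where the request itself is wrong.
# is_retryable is an illustrative helper, not an SDK function.
RETRYABLE_STATUS_CODES = {429, 500, 529}      # rate limit, server error, overloaded
NON_RETRYABLE_STATUS_CODES = {400, 401, 403}  # bad request, auth, permission


def is_retryable(status_code: int) -> bool:
    """Return True if a request that failed with this status is worth retrying."""
    return status_code in RETRYABLE_STATUS_CODES


print(is_retryable(429))  # rate limit: retryable
print(is_retryable(400))  # bad request: not retryable
```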
Retry Logic with Exponential Backoff
The fundamental retry pattern uses exponential backoff with jitter to avoid thundering herd problems when many clients retry simultaneously.
```python
import anthropic
import time
import random

client = anthropic.Anthropic()


def call_with_retry(
    messages,
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=None,
    max_retries=5,
    initial_delay=1.0,
    max_delay=60.0,
    backoff_factor=2.0,
):
    """
    Make an API call with intelligent retry logic.

    Uses exponential backoff with jitter for retryable errors.
    Immediately raises non-retryable errors.
    """
    retryable_status_codes = {429, 500, 529}
    params = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": messages,
    }
    if system:
        params["system"] = system

    last_exception = None
    delay = initial_delay

    for attempt in range(max_retries + 1):
        try:
            response = client.messages.create(**params)
            return response
        except anthropic.APIStatusError as e:
            last_exception = e
            # Non-retryable errors: raise immediately
            if e.status_code not in retryable_status_codes:
                print(f"Non-retryable error (HTTP {e.status_code}): {e.message}")
                raise
            # Rate limit: use Retry-After header if available
            if e.status_code == 429:
                retry_after = None
                if hasattr(e, "response") and e.response is not None:
                    retry_after = e.response.headers.get("retry-after")
                if retry_after:
                    delay = float(retry_after)
                    print(f"Rate limited. Waiting {delay}s (from Retry-After)")
                else:
                    print(f"Rate limited. Waiting {delay}s (exponential backoff)")
            else:
                print(
                    f"Server error (HTTP {e.status_code}), "
                    f"attempt {attempt + 1}/{max_retries + 1}. "
                    f"Waiting {delay:.1f}s"
                )
            if attempt < max_retries:
                # Add jitter: randomize between 50% and 100% of the delay
                jittered_delay = delay * (0.5 + random.random() * 0.5)
                time.sleep(jittered_delay)
                # Increase delay for next attempt
                delay = min(delay * backoff_factor, max_delay)
        except anthropic.APIConnectionError as e:
            last_exception = e
            print(f"Connection error, attempt {attempt + 1}/{max_retries + 1}: {e}")
            if attempt < max_retries:
                jittered_delay = delay * (0.5 + random.random() * 0.5)
                time.sleep(jittered_delay)
                delay = min(delay * backoff_factor, max_delay)

    # All retries exhausted
    raise last_exception
```
Using the SDK's Built-in Retry
The Anthropic Python SDK includes built-in retry logic that handles the common cases. For many applications, configuring the SDK's retry behavior is sufficient.
```python
import anthropic

# Configure retry behavior at the client level
client = anthropic.Anthropic(
    max_retries=3,  # Number of retries (default: 2)
    timeout=60.0,   # Request timeout in seconds (default: 600)
)

# The SDK automatically retries on:
# - 429 Rate Limit (honoring the Retry-After header)
# - 500 Internal Server Error
# - 529 Overloaded
# - Connection errors
try:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello, Claude!"}],
    )
    print(response.content[0].text)
except anthropic.RateLimitError:
    print("Rate limit exceeded even after retries")
except anthropic.InternalServerError:
    print("Server error persisted after retries")
except anthropic.APIConnectionError:
    print("Could not connect to the API")
except anthropic.BadRequestError as e:
    print(f"Bad request (not retried): {e.message}")
```
Context Length Error Recovery
One of the most common errors in production is exceeding the context window. This requires a specific recovery strategy: reduce the input and retry.
```python
import anthropic

client = anthropic.Anthropic()


def call_with_context_recovery(
    messages,
    system=None,
    tools=None,
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
):
    """
    Attempt an API call and recover from context length errors
    by progressively reducing the conversation history.
    """
    params = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": messages,
    }
    if system:
        params["system"] = system
    if tools:
        params["tools"] = tools

    try:
        return client.messages.create(**params)
    except anthropic.BadRequestError as e:
        if "too long" not in str(e).lower() and "context" not in str(e).lower():
            raise  # Not a context length error
        print("Context too long. Attempting recovery...")

        # Strategy 1: Summarize older messages
        # (summarize_conversation is a helper assumed to be defined elsewhere,
        # e.g. a separate LLM call that condenses older history)
        if len(messages) > 4:
            print("  Trying: Summarize older conversation history")
            condensed = summarize_conversation(messages, keep_recent=2)
            params["messages"] = condensed
            try:
                return client.messages.create(**params)
            except anthropic.BadRequestError:
                pass  # Still too long

        # Strategy 2: Drop older messages entirely
        if len(messages) > 2:
            print("  Trying: Keep only the last 2 exchanges")
            params["messages"] = messages[-4:]  # Last 2 user-assistant pairs
            try:
                return client.messages.create(**params)
            except anthropic.BadRequestError:
                pass  # Still too long

        # Strategy 3: Truncate the current message
        print("  Trying: Truncate current message")
        last_message = messages[-1].copy()
        if isinstance(last_message["content"], str):
            # Keep only the first ~50% of the message
            half_len = len(last_message["content"]) // 2
            last_message["content"] = (
                last_message["content"][:half_len]
                + "\n\n[Content truncated due to length limits]"
            )
        params["messages"] = [last_message]
        return client.messages.create(**params)
```
Timeout Management
Long-running API calls can hang and block your application. Proper timeout management ensures your system remains responsive even when the API is slow.
```python
import anthropic

client = anthropic.Anthropic(timeout=30.0)  # 30-second default timeout


def call_with_timeout(messages, timeout_seconds=30):
    """
    Make an API call with a strict timeout.
    Falls back to a smaller, faster request or a canned response on timeout.
    """
    try:
        # Use a per-request timeout override
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages,
            timeout=timeout_seconds,
        )
        return {"status": "ok", "response": response.content[0].text}
    except anthropic.APITimeoutError:
        print(f"Request timed out after {timeout_seconds}s")
        # Fallback 1: Retry with a smaller output budget and a tighter timeout
        try:
            print("Falling back to a faster request...")
            fallback_response = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=256,  # Shorter response for speed
                messages=messages,
                timeout=15,
            )
            return {
                "status": "fallback",
                "response": fallback_response.content[0].text,
                "note": "Response generated by fallback request",
            }
        except (anthropic.APITimeoutError, anthropic.APIError):
            pass
        # Fallback 2: Return a cached or default response
        return {
            "status": "degraded",
            "response": (
                "I'm experiencing high latency right now. "
                "Please try again in a moment."
            ),
        }
```
Circuit Breaker Pattern
The circuit breaker pattern prevents a failing API from being hammered with requests. After a threshold of consecutive failures, the circuit "opens" and requests are immediately rejected without calling the API, giving the service time to recover.
```python
import time
import anthropic

client = anthropic.Anthropic()


class CircuitBreaker:
    """
    Circuit breaker for API calls.

    States:
    - CLOSED: Normal operation, requests pass through
    - OPEN: Too many failures, requests are rejected immediately
    - HALF_OPEN: Testing if the service has recovered
    """

    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

    def __init__(
        self,
        failure_threshold=5,
        recovery_timeout=60,
        half_open_max_calls=2,
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.half_open_max_calls = half_open_max_calls
        self.state = self.CLOSED
        self.failure_count = 0
        self.last_failure_time = 0
        self.half_open_calls = 0

    def can_execute(self):
        """Check if a request is allowed through."""
        if self.state == self.CLOSED:
            return True
        if self.state == self.OPEN:
            # Check if recovery timeout has elapsed
            if time.time() - self.last_failure_time >= self.recovery_timeout:
                self.state = self.HALF_OPEN
                self.half_open_calls = 0
                print("Circuit breaker: HALF_OPEN (testing recovery)")
                return True
            return False
        if self.state == self.HALF_OPEN:
            return self.half_open_calls < self.half_open_max_calls
        return False

    def record_success(self):
        """Record a successful API call."""
        if self.state == self.HALF_OPEN:
            self.half_open_calls += 1
            if self.half_open_calls >= self.half_open_max_calls:
                self.state = self.CLOSED
                self.failure_count = 0
                print("Circuit breaker: CLOSED (service recovered)")
        else:
            self.failure_count = 0

    def record_failure(self):
        """Record a failed API call."""
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.state == self.HALF_OPEN:
            self.state = self.OPEN
            print("Circuit breaker: OPEN (recovery failed)")
        elif self.failure_count >= self.failure_threshold:
            self.state = self.OPEN
            print(
                f"Circuit breaker: OPEN "
                f"(after {self.failure_count} failures)"
            )


# Usage with the circuit breaker
circuit = CircuitBreaker(failure_threshold=5, recovery_timeout=60)


def resilient_call(messages, system=None):
    """Make an API call protected by a circuit breaker."""
    if not circuit.can_execute():
        return {
            "status": "circuit_open",
            "response": (
                "The service is temporarily unavailable. "
                "Please try again later."
            ),
        }
    try:
        params = {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": messages,
        }
        if system:
            params["system"] = system
        response = client.messages.create(**params)
        circuit.record_success()
        return {"status": "ok", "response": response.content[0].text}
    except (anthropic.APIStatusError, anthropic.APIConnectionError) as e:
        circuit.record_failure()
        return {"status": "error", "response": str(e)}
```
Model Fallback Chain
A robust production system can fall back to alternative models when the primary model is unavailable or overloaded.
```python
import anthropic

client = anthropic.Anthropic()

# Ordered list of configurations to try, from most capable to most available
MODEL_FALLBACK_CHAIN = [
    {"model": "claude-sonnet-4-20250514", "max_tokens": 2048, "timeout": 30},
    {"model": "claude-sonnet-4-20250514", "max_tokens": 1024, "timeout": 20},
    {"model": "claude-sonnet-4-20250514", "max_tokens": 512, "timeout": 15},
]


def call_with_fallback(messages, system=None):
    """
    Try models in order until one succeeds.
    Each fallback may use a smaller max_tokens for speed.
    """
    errors = []
    for config in MODEL_FALLBACK_CHAIN:
        try:
            params = {
                "model": config["model"],
                "max_tokens": config["max_tokens"],
                "messages": messages,
                "timeout": config["timeout"],
            }
            if system:
                params["system"] = system
            response = client.messages.create(**params)
            return {
                "status": "ok",
                "model_used": config["model"],
                "response": response.content[0].text,
            }
        except anthropic.APIError as e:
            errors.append(f"{config['model']}: {e}")
            print(f"Model {config['model']} failed: {e}. Trying next...")
            continue

    # All models failed
    return {
        "status": "all_failed",
        "errors": errors,
        "response": "All models are currently unavailable. Please try again later.",
    }
```
Comprehensive Error Handler for Agent Loops
Agent loops need especially robust error handling because a single unhandled error can terminate a multi-step task. Here is a production-grade error handler that combines all the patterns above.
```python
import anthropic
import time

client = anthropic.Anthropic(max_retries=2)


class AgentErrorHandler:
    """
    Comprehensive error handler for agentic systems.
    Handles all API error types with appropriate strategies.
    """

    def __init__(self):
        # Reuses the CircuitBreaker class defined earlier in this lesson
        self.circuit_breaker = CircuitBreaker(
            failure_threshold=5,
            recovery_timeout=60,
        )
        self.consecutive_errors = 0
        self.max_consecutive_errors = 10

    def execute_with_recovery(self, call_fn, fallback_fn=None):
        """
        Execute an API call with full error recovery.

        Args:
            call_fn: Callable that makes the API call
            fallback_fn: Optional callable for a degraded response

        Returns:
            API response or fallback response
        """
        # Check circuit breaker
        if not self.circuit_breaker.can_execute():
            if fallback_fn:
                return fallback_fn()
            raise RuntimeError("Circuit breaker is open")

        # Check consecutive error limit (prevents infinite agent loops)
        if self.consecutive_errors >= self.max_consecutive_errors:
            raise RuntimeError(
                f"Agent halted: {self.consecutive_errors} consecutive errors"
            )

        try:
            result = call_fn()
            self.circuit_breaker.record_success()
            self.consecutive_errors = 0
            return result
        except anthropic.BadRequestError as e:
            # 400: Fix the request, don't retry
            self.consecutive_errors += 1
            if "too long" in str(e).lower():
                raise ContextTooLongError(str(e)) from e
            raise
        except anthropic.RateLimitError as e:
            # 429: Wait and retry
            self.consecutive_errors += 1
            retry_after = 60  # Default
            if hasattr(e, "response") and e.response is not None:
                header = e.response.headers.get("retry-after")
                if header:
                    retry_after = float(header)
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
            return call_fn()  # One more attempt
        except anthropic.InternalServerError:
            # 500: Record failure, use fallback
            self.consecutive_errors += 1
            self.circuit_breaker.record_failure()
            if fallback_fn:
                return fallback_fn()
            raise
        except anthropic.APIConnectionError:
            # Network error: Record failure, use fallback
            self.consecutive_errors += 1
            self.circuit_breaker.record_failure()
            if fallback_fn:
                return fallback_fn()
            raise


class ContextTooLongError(Exception):
    """Raised when the context exceeds the model's limit."""
    pass
```
Exam Tip: The exam distinguishes between retryable and non-retryable errors. Remember: 429 (rate limit), 500 (server error), and 529 (overloaded) are retryable. 400 (bad request), 401 (auth), and 403 (permission) are NOT retryable — the request itself is wrong and retrying will produce the same error. A common wrong answer retries on a 400 error.
Exam Tip: Exponential backoff with jitter is the standard retry pattern. "Jitter" means adding randomness to the delay so that multiple clients that fail at the same time do not all retry at the same instant (thundering herd problem). The formula is: delay = min(base_delay * 2^attempt * random(0.5, 1.0), max_delay).
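That formula can be written out directly. A minimal sketch — the helper name `backoff_delay` is illustrative, not a standard API:

```python
import random


def backoff_delay(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Exponential backoff with jitter: base * 2^attempt, scaled by a random
    factor in [0.5, 1.0] and capped at max_delay."""
    target = base_delay * (2 ** attempt)
    jittered = target * random.uniform(0.5, 1.0)
    return min(jittered, max_delay)


# Successive attempts roughly double the wait until the cap is hit
for attempt in range(6):
    print(f"attempt {attempt}: wait {backoff_delay(attempt):.2f}s")
```

The jitter factor means two clients that fail simultaneously will almost certainly sleep for different durations, spreading their retries apart.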
Exam Tip: The circuit breaker pattern has three states: CLOSED (normal), OPEN (rejecting all calls), and HALF_OPEN (testing recovery with a few calls). The exam may ask you to identify which state a circuit breaker is in given a scenario, or to explain why a circuit breaker is preferable to simple retry logic for a service experiencing persistent failures.
Key Takeaways
- Classify errors before handling them. Retryable errors (429, 500, 529) should be retried with exponential backoff and jitter. Non-retryable errors (400, 401, 403) should be surfaced immediately.
- The Anthropic SDK has built-in retry logic that handles common cases. Configure max_retries and timeout at the client level for basic resilience without custom code.
- Context length errors require progressive reduction — summarize history, drop old messages, or truncate the current input. This is a recovery strategy, not a retry strategy.
- Circuit breakers prevent cascading failures by stopping requests to a failing service, giving it time to recover. They transition through CLOSED, OPEN, and HALF_OPEN states based on success/failure counts and timeouts.
- Agent loops need a consecutive error limit to prevent infinite retry loops. If an agent encounters the same error repeatedly, it should halt and escalate rather than burn through API credits.