🛡️ Context & Reliability · Lesson 5.5

Error Recovery & Graceful Degradation

Handling API errors, timeouts, rate limits, and unexpected outputs.

20 min

Learning Objectives

  • Design error recovery strategies
  • Implement retry logic with backoff
  • Build graceful degradation mechanisms

Error Recovery and Graceful Degradation

Production systems built on the Anthropic API must handle a variety of failure modes gracefully. API calls can fail due to rate limits, server errors, network timeouts, context length violations, and content policy blocks. A well-designed system anticipates these failures, retries intelligently, and degrades gracefully rather than crashing. This lesson covers the practical implementation of retry logic, error classification, fallback strategies, and circuit breaker patterns.

Understanding API Error Types

The Anthropic API returns specific error types that require different handling strategies:

  • 400 Bad Request (invalid_request_error): The request is malformed — bad parameters, invalid model name, or context too long. Do NOT retry; fix the request.
  • 401 Unauthorized (authentication_error): Invalid API key. Do NOT retry; fix credentials.
  • 403 Forbidden (permission_error): The API key does not have permission for this operation. Do NOT retry.
  • 429 Too Many Requests (rate_limit_error): You have exceeded your rate limit. Retry after the Retry-After header duration.
  • 500 Internal Server Error (api_error): An unexpected server-side error. Retry with exponential backoff.
  • 529 Overloaded (overloaded_error): The API is under heavy load. Retry with exponential backoff and longer initial delay.
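This classification can be captured in a small helper so the retry decision lives in one place — a minimal sketch using the status codes from the list above:

```python
# Status codes that indicate transient failures worth retrying.
RETRYABLE_STATUS_CODES = {429, 500, 529}

def is_retryable(status_code: int) -> bool:
    """Return True if a request that failed with this HTTP status
    should be retried (transient), False if retrying is pointless."""
    return status_code in RETRYABLE_STATUS_CODES
```

Centralizing the decision means retry loops, logging, and alerting all agree on which failures are transient.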

Retry Logic with Exponential Backoff

The fundamental retry pattern uses exponential backoff with jitter to avoid thundering herd problems when many clients retry simultaneously.

import anthropic
import time
import random

client = anthropic.Anthropic()

def call_with_retry(
    messages,
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=None,
    max_retries=5,
    initial_delay=1.0,
    max_delay=60.0,
    backoff_factor=2.0,
):
    """
    Make an API call with intelligent retry logic.

    Uses exponential backoff with jitter for retryable errors.
    Immediately raises non-retryable errors.
    """
    retryable_status_codes = {429, 500, 529}

    params = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": messages,
    }
    if system:
        params["system"] = system

    last_exception = None
    delay = initial_delay

    for attempt in range(max_retries + 1):
        try:
            response = client.messages.create(**params)
            return response

        except anthropic.APIStatusError as e:
            last_exception = e

            # Non-retryable errors: raise immediately
            if e.status_code not in retryable_status_codes:
                print(f"Non-retryable error (HTTP {e.status_code}): {e.message}")
                raise

            # Rate limit: use Retry-After header if available
            if e.status_code == 429:
                retry_after = None
                if hasattr(e, "response") and e.response is not None:
                    retry_after = e.response.headers.get("retry-after")
                if retry_after:
                    delay = float(retry_after)
                    print(f"Rate limited. Waiting {delay}s (from Retry-After)")
                else:
                    print(f"Rate limited. Waiting {delay}s (exponential backoff)")
            else:
                print(
                    f"Server error (HTTP {e.status_code}), "
                    f"attempt {attempt + 1}/{max_retries + 1}. "
                    f"Waiting {delay:.1f}s"
                )

            if attempt < max_retries:
                # Add jitter: randomize between 50% and 100% of the delay
                jittered_delay = delay * (0.5 + random.random() * 0.5)
                time.sleep(jittered_delay)
                # Increase delay for next attempt
                delay = min(delay * backoff_factor, max_delay)

        except anthropic.APIConnectionError as e:
            last_exception = e
            print(
                f"Connection error, attempt {attempt + 1}/{max_retries + 1}: {e}"
            )
            if attempt < max_retries:
                jittered_delay = delay * (0.5 + random.random() * 0.5)
                time.sleep(jittered_delay)
                delay = min(delay * backoff_factor, max_delay)

    # All retries exhausted
    raise last_exception

Using the SDK's Built-in Retry

The Anthropic Python SDK includes built-in retry logic that handles the common cases. For many applications, configuring the SDK's retry behavior is sufficient.

import anthropic

# Configure retry behavior at the client level
client = anthropic.Anthropic(
    max_retries=3,       # Number of retries (default: 2)
    timeout=60.0,        # Request timeout in seconds (default: 600)
)

# The SDK automatically retries on:
# - 429 Rate Limit (with Retry-After header)
# - 500 Internal Server Error
# - 529 Overloaded
# - Connection errors

try:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello, Claude!"}],
    )
    print(response.content[0].text)
except anthropic.RateLimitError:
    print("Rate limit exceeded even after retries")
except anthropic.InternalServerError:
    print("Server error persisted after retries")
except anthropic.APIConnectionError:
    print("Could not connect to the API")
except anthropic.BadRequestError as e:
    print(f"Bad request (not retried): {e.message}")

Context Length Error Recovery

One of the most common errors in production is exceeding the context window. This requires a specific recovery strategy: reduce the input and retry.

import anthropic

client = anthropic.Anthropic()

def call_with_context_recovery(
    messages,
    system=None,
    tools=None,
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
):
    """
    Attempt an API call and recover from context length errors
    by progressively reducing the conversation history.
    """
    params = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": messages,
    }
    if system:
        params["system"] = system
    if tools:
        params["tools"] = tools

    try:
        return client.messages.create(**params)
    except anthropic.BadRequestError as e:
        if "too long" not in str(e).lower() and "context" not in str(e).lower():
            raise  # Not a context length error

        print("Context too long. Attempting recovery...")

    # Strategy 1: Summarize older messages
    # (summarize_conversation is a helper that condenses older turns
    # while keeping the most recent exchanges verbatim)
    if len(messages) > 4:
        print("  Trying: Summarize older conversation history")
        condensed = summarize_conversation(messages, keep_recent=2)
        params["messages"] = condensed
        try:
            return client.messages.create(**params)
        except anthropic.BadRequestError:
            pass  # Still too long

    # Strategy 2: Drop older messages entirely
    if len(messages) > 2:
        print("  Trying: Keep only the last 2 exchanges")
        params["messages"] = messages[-4:]  # Last 2 user-assistant pairs
        try:
            return client.messages.create(**params)
        except anthropic.BadRequestError:
            pass  # Still too long

    # Strategy 3: Truncate the current message
    print("  Trying: Truncate current message")
    last_message = messages[-1].copy()
    if isinstance(last_message["content"], str):
        # Keep only the first ~50% of the message
        half_len = len(last_message["content"]) // 2
        last_message["content"] = (
            last_message["content"][:half_len]
            + "\n\n[Content truncated due to length limits]"
        )
    params["messages"] = [last_message]
    return client.messages.create(**params)
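Strategy 1 relies on a summarize_conversation helper that the snippet does not define. A minimal stdlib-only sketch — naive truncation of older turns rather than a real model-generated summary, which a production version would use — could look like this:

```python
def summarize_conversation(messages, keep_recent=2):
    """Collapse all but the last `keep_recent` exchanges into a single
    short user message, preserving the most recent turns verbatim.

    Naive sketch: truncates older messages to their first 200 characters.
    A production version would generate the summary with a cheap model call.
    """
    # keep_recent exchanges = keep_recent user/assistant pairs
    recent = messages[-(keep_recent * 2):]
    older = messages[:-(keep_recent * 2)]
    if not older:
        return messages  # Nothing to condense

    snippets = []
    for msg in older:
        content = msg["content"]
        if not isinstance(content, str):
            content = str(content)  # Tool-use blocks etc.: best-effort
        snippets.append(f"[{msg['role']}] {content[:200]}")

    summary = {
        "role": "user",
        "content": "Summary of earlier conversation:\n" + "\n".join(snippets),
    }
    return [summary] + recent
```

Because the summary is emitted as a user message at the head of the list, the condensed history still starts with a user turn, as the Messages API requires.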

Timeout Management

Long-running API calls can hang and block your application. Proper timeout management ensures your system remains responsive even when the API is slow.

import anthropic

client = anthropic.Anthropic(timeout=30.0)  # 30-second timeout

def call_with_timeout(messages, timeout_seconds=30):
    """
    Make an API call with a strict timeout.
    Falls back to a simpler model or cached response on timeout.
    """
    try:
        # Use a per-request timeout override
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages,
            timeout=timeout_seconds,
        )
        return {"status": "ok", "response": response.content[0].text}

    except anthropic.APITimeoutError:
        print(f"Request timed out after {timeout_seconds}s")

        # Fallback 1: Retry with a reduced output budget and tighter timeout
        # (a faster model could also be substituted here)
        try:
            print("Falling back to reduced output budget...")
            fallback_response = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=256,  # Shorter response for speed
                messages=messages,
                timeout=15,
            )
            return {
                "status": "fallback",
                "response": fallback_response.content[0].text,
                "note": "Response generated with a reduced output budget",
            }
        except (anthropic.APITimeoutError, anthropic.APIError):
            pass

        # Fallback 2: Return a cached or default response
        return {
            "status": "degraded",
            "response": (
                "I'm experiencing high latency right now. "
                "Please try again in a moment."
            ),
        }

Circuit Breaker Pattern

The circuit breaker pattern prevents a failing API from being hammered with requests. After a threshold of consecutive failures, the circuit "opens" and requests are immediately rejected without calling the API, giving the service time to recover.

import time
import anthropic

client = anthropic.Anthropic()

class CircuitBreaker:
    """
    Circuit breaker for API calls.

    States:
    - CLOSED: Normal operation, requests pass through
    - OPEN: Too many failures, requests are rejected immediately
    - HALF_OPEN: Testing if the service has recovered
    """
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

    def __init__(
        self,
        failure_threshold=5,
        recovery_timeout=60,
        half_open_max_calls=2,
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.half_open_max_calls = half_open_max_calls

        self.state = self.CLOSED
        self.failure_count = 0
        self.last_failure_time = 0
        self.half_open_calls = 0

    def can_execute(self):
        """Check if a request is allowed through."""
        if self.state == self.CLOSED:
            return True

        if self.state == self.OPEN:
            # Check if recovery timeout has elapsed
            if time.time() - self.last_failure_time >= self.recovery_timeout:
                self.state = self.HALF_OPEN
                self.half_open_calls = 0
                print("Circuit breaker: HALF_OPEN (testing recovery)")
                return True
            return False

        if self.state == self.HALF_OPEN:
            return self.half_open_calls < self.half_open_max_calls

        return False

    def record_success(self):
        """Record a successful API call."""
        if self.state == self.HALF_OPEN:
            self.half_open_calls += 1
            if self.half_open_calls >= self.half_open_max_calls:
                self.state = self.CLOSED
                self.failure_count = 0
                print("Circuit breaker: CLOSED (service recovered)")
        else:
            self.failure_count = 0

    def record_failure(self):
        """Record a failed API call."""
        self.failure_count += 1
        self.last_failure_time = time.time()

        if self.state == self.HALF_OPEN:
            self.state = self.OPEN
            print("Circuit breaker: OPEN (recovery failed)")
        elif self.failure_count >= self.failure_threshold:
            self.state = self.OPEN
            print(
                f"Circuit breaker: OPEN "
                f"(after {self.failure_count} failures)"
            )


# Usage with the circuit breaker
circuit = CircuitBreaker(failure_threshold=5, recovery_timeout=60)

def resilient_call(messages, system=None):
    """Make an API call protected by a circuit breaker."""
    if not circuit.can_execute():
        return {
            "status": "circuit_open",
            "response": (
                "The service is temporarily unavailable. "
                "Please try again later."
            ),
        }

    try:
        params = {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": messages,
        }
        if system:
            params["system"] = system

        response = client.messages.create(**params)
        circuit.record_success()
        return {"status": "ok", "response": response.content[0].text}

    except (anthropic.APIStatusError, anthropic.APIConnectionError) as e:
        circuit.record_failure()
        return {"status": "error", "response": str(e)}

Model Fallback Chain

A robust production system can fall back to alternative models when the primary model is unavailable or overloaded.

import anthropic

client = anthropic.Anthropic()

# Ordered fallback configurations. Here the same model is retried with
# progressively smaller token budgets and shorter timeouts; a chain could
# also step down to faster, cheaper models.
MODEL_FALLBACK_CHAIN = [
    {"model": "claude-sonnet-4-20250514", "max_tokens": 2048, "timeout": 30},
    {"model": "claude-sonnet-4-20250514", "max_tokens": 1024, "timeout": 20},
    {"model": "claude-sonnet-4-20250514", "max_tokens": 512, "timeout": 15},
]

def call_with_fallback(messages, system=None):
    """
    Try models in order until one succeeds.
    Each fallback may use a smaller max_tokens for speed.
    """
    errors = []

    for config in MODEL_FALLBACK_CHAIN:
        try:
            params = {
                "model": config["model"],
                "max_tokens": config["max_tokens"],
                "messages": messages,
                "timeout": config["timeout"],
            }
            if system:
                params["system"] = system

            response = client.messages.create(**params)
            return {
                "status": "ok",
                "model_used": config["model"],
                "response": response.content[0].text,
            }

        except anthropic.APIError as e:
            errors.append(f"{config['model']}: {e}")
            print(f"Model {config['model']} failed: {e}. Trying next...")
            continue

    # All models failed
    return {
        "status": "all_failed",
        "errors": errors,
        "response": "All models are currently unavailable. Please try again later.",
    }

Comprehensive Error Handler for Agent Loops

Agent loops need especially robust error handling because a single unhandled error can terminate a multi-step task. Here is a production-grade error handler that combines all the patterns above.

import anthropic
import time
import random

client = anthropic.Anthropic(max_retries=2)

class AgentErrorHandler:
    """
    Comprehensive error handler for agentic systems.
    Handles all API error types with appropriate strategies.
    """

    def __init__(self):
        self.circuit_breaker = CircuitBreaker(
            failure_threshold=5,
            recovery_timeout=60,
        )
        self.consecutive_errors = 0
        self.max_consecutive_errors = 10

    def execute_with_recovery(self, call_fn, fallback_fn=None):
        """
        Execute an API call with full error recovery.

        Args:
            call_fn: Callable that makes the API call
            fallback_fn: Optional callable for degraded response
        Returns:
            API response or fallback response
        """
        # Check circuit breaker
        if not self.circuit_breaker.can_execute():
            if fallback_fn:
                return fallback_fn()
            raise RuntimeError("Circuit breaker is open")

        # Check consecutive error limit (prevents infinite agent loops)
        if self.consecutive_errors >= self.max_consecutive_errors:
            raise RuntimeError(
                f"Agent halted: {self.consecutive_errors} consecutive errors"
            )

        try:
            result = call_fn()
            self.circuit_breaker.record_success()
            self.consecutive_errors = 0
            return result

        except anthropic.BadRequestError as e:
            # 400: Fix the request, don't retry
            self.consecutive_errors += 1
            if "too long" in str(e).lower():
                raise ContextTooLongError(str(e)) from e
            raise

        except anthropic.RateLimitError as e:
            # 429: Wait and retry
            self.consecutive_errors += 1
            retry_after = 60  # Default
            if hasattr(e, "response") and e.response is not None:
                header = e.response.headers.get("retry-after")
                if header:
                    retry_after = float(header)
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
            return call_fn()  # One more attempt

        except anthropic.InternalServerError:
            # 500: Record failure, use fallback
            self.consecutive_errors += 1
            self.circuit_breaker.record_failure()
            if fallback_fn:
                return fallback_fn()
            raise

        except anthropic.APIConnectionError:
            # Network error: Record failure, use fallback
            self.consecutive_errors += 1
            self.circuit_breaker.record_failure()
            if fallback_fn:
                return fallback_fn()
            raise


class ContextTooLongError(Exception):
    """Raised when the context exceeds the model's limit."""
    pass

Exam Tip: The exam distinguishes between retryable and non-retryable errors. Remember: 429 (rate limit), 500 (server error), and 529 (overloaded) are retryable. 400 (bad request), 401 (auth), and 403 (permission) are NOT retryable — the request itself is wrong and retrying will produce the same error. A common wrong answer retries on a 400 error.

Exam Tip: Exponential backoff with jitter is the standard retry pattern. "Jitter" means adding randomness to the delay so that multiple clients that fail at the same time do not all retry at the same instant (thundering herd problem). The formula is: delay = min(base_delay * 2^attempt * random(0.5, 1.0), max_delay).
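The formula can be checked in a few lines — an illustrative sketch using the same parameters as the retry example earlier in this lesson:

```python
import random

def backoff_delay(attempt, base_delay=1.0, backoff_factor=2.0, max_delay=60.0):
    """Exponential backoff with 50-100% jitter, capped at max_delay.

    attempt 0 -> ~0.5-1s, attempt 1 -> ~1-2s, attempt 2 -> ~2-4s, ...
    and the cap bounds every delay at max_delay seconds.
    """
    raw = min(base_delay * (backoff_factor ** attempt), max_delay)
    return raw * (0.5 + random.random() * 0.5)
```

Whatever the random draw, each sleep stays within half to the full value of the capped exponential term, so late attempts never exceed max_delay while early retries stay fast.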

Exam Tip: The circuit breaker pattern has three states: CLOSED (normal), OPEN (rejecting all calls), and HALF_OPEN (testing recovery with a few calls). The exam may ask you to identify which state a circuit breaker is in given a scenario, or to explain why a circuit breaker is preferable to simple retry logic for a service experiencing persistent failures.

Key Takeaways

Classify errors before handling them. Retryable errors (429, 500, 529) should be retried with exponential backoff and jitter. Non-retryable errors (400, 401, 403) should be surfaced immediately.

The Anthropic SDK has built-in retry logic that handles common cases. Configure max_retries and timeout at the client level for basic resilience without custom code.

Context length errors require progressive reduction — summarize history, drop old messages, or truncate the current input. This is a recovery strategy, not a retry strategy.

Circuit breakers prevent cascading failures by stopping requests to a failing service, giving it time to recover. They transition through CLOSED, OPEN, and HALF_OPEN states based on success/failure counts and timeouts.

Agent loops need a consecutive error limit to prevent infinite retry loops. If an agent encounters the same error repeatedly, it should halt and escalate rather than burn through API credits.