Prompt Engineering · Lesson 4.2

Structured Output Enforcement

JSON output, prefilling, schema validation, and retry loops.

25 min

Learning Objectives

Enforce JSON output with the prefilling technique
Validate responses against JSON schemas
Implement validation retry loops

Production systems rarely consume Claude's output as free-form text. Downstream code needs to parse, validate, and act on the output programmatically. This lesson covers the techniques and patterns for reliably extracting structured data, primarily JSON, from Claude's responses. These patterns matter because a system that returns valid JSON 95% of the time fails 5% of the time in production.

Requesting JSON Output

The most direct approach to getting structured output is to explicitly ask for it. Claude is highly capable of producing valid JSON when the request is clear and the expected schema is specified.

Basic JSON Request

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": (
        "Extract the following information from this customer email "
        "and return it as JSON.\n\n"
        "<email>\n"
        "Hi, my name is Sarah Chen and I purchased order #A1234 on "
        "March 5th. The shipping address should be updated to "
        "456 Oak Avenue, Portland, OR 97201. My phone number is "
        "503-555-0147. Thanks!\n"
        "</email>\n\n"
        "<output_schema>\n"
        "{\n"
        "  \"customer_name\": \"string\",\n"
        "  \"order_id\": \"string\",\n"
        "  \"request_type\": \"string\",\n"
        "  \"details\": {\n"
        "    \"new_address\": \"string\",\n"
        "    \"phone\": \"string\"\n"
        "  }\n"
        "}\n"
        "</output_schema>\n\n"
        "Return ONLY the JSON object. No additional text or explanation."
    )}]
)
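
Even with a "Return ONLY the JSON object" instruction, a reply can occasionally arrive wrapped in a Markdown code fence or with a sentence of preamble. The following is a minimal sketch of a tolerant parser, assuming the reply text has already been read from `response.content[0].text`. The helper name `parse_json_reply` is hypothetical and not part of the Anthropic SDK, and the sample reply is a hand-written stand-in rather than real model output.

```python
import json
import re

# Build the three-backtick Markdown fence indirectly so this listing stays readable.
TICKS = "`" * 3
_FENCED = re.compile(TICKS + r"(?:json)?\s*(.*?)\s*" + TICKS, re.DOTALL)


def parse_json_reply(text: str) -> dict:
    """Hypothetical helper: extract one JSON object from a model reply."""
    fenced = _FENCED.search(text)
    if fenced:
        candidate = fenced.group(1)
    else:
        # Fall back to the outermost {...} span, dropping surrounding prose.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end < start:
            raise ValueError("no JSON object found in reply")
        candidate = text[start:end + 1]
    data = json.loads(candidate)  # raises json.JSONDecodeError if malformed
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


# Stand-in for response.content[0].text; not real model output.
reply = (
    "Here is the data:\n" + TICKS + "json\n"
    '{"customer_name": "Sarah Chen", "order_id": "A1234"}\n' + TICKS
)
print(parse_json_reply(reply)["order_id"])
```

A parser like this is a safety net, not a substitute for the stricter techniques that follow: it recovers from cosmetic wrapping but cannot repair JSON that is malformed or missing fields.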

Prefilling: The Most Powerful Technique

Prefilling is a technique supported by the Anthropic Messages API that gives you precise control over the start of Claude's response. By including an assistant message at the end of the messages array, you force Claude to continue from that exact point. For text output it is one of the most reliable ways to lock in a format, although it only fixes how the response begins, so the rest of the output still needs validation.

How Prefilling Works

When you include a partial assistant message, Claude treats it as the beginning of its own response and continues from there. This means you can force Claude to start with an opening brace, an XML tag, or any other token that constrains the output format.

import anthropic
import json

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": (
            "Analyze the sentiment of this review and return a JSON "
            "object with \"sentiment\" (positive/negative/neutral), "
            "\"confidence\" (0-1), and \"key_phrases\" (list of strings).\n\n"
            "<review>\n"
            "The food was absolutely incredible but the service was "
            "painfully slow. Would come back for the food alone.\n"
            "</review>"
        )},
        {"role": "assistant", "content": "{"}
    ]
)

# The response continues from "{", so prepend it before parsing
json_str = "{" + response.content[0].text
result = json.loads(json_str)
print(result)
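
Because the prefill string is needed twice (once in the request, once when rebuilding the output), it helps to keep it in a single variable. The sketch below shows one way to do that without calling the API. The helper names `build_prefilled_messages` and `reconstruct` are hypothetical, and the continuation string is a hand-written stand-in for `response.content[0].text`. The trailing-whitespace check reflects Anthropic's documented rule that a prefill ending in whitespace is rejected.

```python
import json
from typing import Dict, List

Message = Dict[str, str]


def build_prefilled_messages(prompt: str, prefill: str) -> List[Message]:
    """Return a messages list whose final assistant turn is the prefill."""
    if not prefill:
        raise ValueError("prefill must be non-empty")
    if prefill != prefill.rstrip():
        # Catch this locally instead of waiting for the API to reject it.
        raise ValueError("prefill must not end with whitespace")
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": prefill},
    ]


def reconstruct(prefill: str, continuation: str) -> str:
    """The API returns only the continuation, so glue the prefill back on."""
    return prefill + continuation


PREFILL = "{"
messages = build_prefilled_messages(
    "Analyze the sentiment of this review and return JSON.", PREFILL
)

# Stand-in for response.content[0].text; not real model output.
continuation = (
    '"sentiment": "positive", "confidence": 0.8, '
    '"key_phrases": ["incredible food"]}'
)
result = json.loads(reconstruct(PREFILL, continuation))
print(messages[-1], result["sentiment"])
```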

Exam Tip: Prefilling is a frequently tested concept on the CCA-F exam. You must know that: (1) prefilling uses an assistant role message at the end of the messages array, (2) the model continues from the prefilled text, (3) you must prepend the prefilled text back to the response to reconstruct the full output, and (4) prefilling is not compatible with Extended Thinking mode.

Advanced Prefilling Patterns

Prefilling is not limited to a single character. You can prefill entire structures:

import anthropic

client = anthropic.Anthropic()

# Prefill to force a specific JSON structure
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": (
            "Extract all entities from the following text.\n\n"
            "<text>Apple CEO Tim Cook announced the new iPhone 16 "
            "at the Cupertino headquarters on September 9, 2024.</text>"
        )},
        {"role": "assistant", "content": (
            "{\n  \"entities\": ["
        )}
    ]
)

# Reconstruct the full JSON
full_json = "{\n  \"entities\": [" + response.content[0].text

Prefilling for Non-JSON Formats

Prefilling works for any output format, not just JSON:

import anthropic

client = anthropic.Anthropic()

# Force XML output
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "List the top 3 risks of this project plan."},
        {"role": "assistant", "content": "<risks>\n<risk>"}
    ]
)

# Force a specific classification label
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=10,
    messages=[
        {"role": "user", "content": (
            "Classify this support ticket as one of: BILLING, TECHNICAL, GENERAL.\n\n"
            "Ticket: I can't log into my account after the password reset."
        )},
        # No trailing space in the prefill: the API rejects a final assistant
        # turn that ends in whitespace.
        {"role": "assistant", "content": "Classification:"}
    ]
)
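
As with JSON, the continuation has to be joined to the prefill and then checked before use. The sketch below parses a reconstructed `<risks>` document with the standard library and normalizes a classification label against the allowed set. The continuation strings are hand-written stand-ins for `response.content[0].text`, and `parse_label` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XML_PREFILL = "<risks>\n<risk>"

# Stand-in for response.content[0].text; not real model output.
xml_continuation = (
    "Scope creep</risk>\n"
    "<risk>Vendor delays</risk>\n"
    "<risk>Budget overrun</risk>\n"
    "</risks>"
)

# Rebuild the full document, then let the XML parser confirm it is well-formed.
root = ET.fromstring(XML_PREFILL + xml_continuation)
risks = [(node.text or "").strip() for node in root.findall("risk")]

ALLOWED_LABELS = {"BILLING", "TECHNICAL", "GENERAL"}


def parse_label(continuation: str) -> str:
    """Take the first word of the continuation and check it is a known label."""
    words = continuation.split()
    label = words[0].strip(".,:;").upper() if words else ""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {continuation!r}")
    return label


print(risks)
print(parse_label(" TECHNICAL\n"))
```

`ET.fromstring` raises `xml.etree.ElementTree.ParseError` on malformed input, for example when the model emits an unescaped `&`, so in practice this call belongs inside the same kind of error handling used for JSON.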

Schema Validation

Even with prefilling, you should never trust that Claude's output perfectly matches your expected schema. Production systems must validate. The most robust approach uses a validation library like Pydantic.

import anthropic
import json
from pydantic import BaseModel, Field
from typing import List, Optional


class ExtractedEntity(BaseModel):
    name: str
    entity_type: str = Field(description="PERSON, ORG, LOCATION, DATE, PRODUCT")
    confidence: float = Field(ge=0.0, le=1.0)


class ExtractionResult(BaseModel):
    entities: List[ExtractedEntity]
    summary: str
    language: Optional[str] = None


def extract_entities(text: str) -> ExtractionResult:
    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
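        messages=[
            {"role": "user", "content": (
                "Extract all named entities from the following text. "
                "Return a JSON object matching this schema:\n\n"
                # json.dumps emits real JSON; formatting the dict directly
                # would produce Python-style single quotes.
                f"{json.dumps(ExtractionResult.model_json_schema(), indent=2)}\n\n"
                f"<text>\n{text}\n</text>"
            )},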
            {"role": "assistant", "content": "{"}
        ]
    )

    json_str = "{" + response.content[0].text
    data = json.loads(json_str)
    return ExtractionResult.model_validate(data)
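
To make concrete what kinds of problems validation catches, here is a dependency-free sketch of comparable checks for the `ExtractionResult` shape. It is a hand-written stand-in for illustration, not how Pydantic works internally, and `check_extraction` is a hypothetical name. Note one deliberate difference: the sketch rejects unknown `entity_type` values, whereas the Pydantic model above only lists the allowed values in a field description and would accept any string.

```python
import json
from typing import Any, List

ENTITY_TYPES = {"PERSON", "ORG", "LOCATION", "DATE", "PRODUCT"}


def check_extraction(data: Any) -> List[str]:
    """Return human-readable problems; an empty list means the data passed."""
    if not isinstance(data, dict):
        return ["top level: expected an object"]
    errors: List[str] = []
    if not isinstance(data.get("summary"), str):
        errors.append("summary: expected a string")
    entities = data.get("entities")
    if not isinstance(entities, list):
        errors.append("entities: expected a list")
        return errors
    for i, ent in enumerate(entities):
        if not isinstance(ent, dict):
            errors.append(f"entities[{i}]: expected an object")
            continue
        if not isinstance(ent.get("name"), str):
            errors.append(f"entities[{i}].name: expected a string")
        etype = ent.get("entity_type")
        if not isinstance(etype, str) or etype not in ENTITY_TYPES:
            errors.append(f"entities[{i}].entity_type: not one of {sorted(ENTITY_TYPES)}")
        conf = ent.get("confidence")
        is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
        if not is_number or not 0.0 <= conf <= 1.0:
            errors.append(f"entities[{i}].confidence: expected a number in [0, 1]")
    return errors


# Syntactically valid JSON that still violates the schema in two places.
bad = json.loads(
    '{"entities": [{"name": "Tim Cook", "entity_type": "CEO", "confidence": 1.4}],'
    ' "summary": "Apple event"}'
)
for problem in check_extraction(bad):
    print(problem)
```

The example input parses cleanly with `json.loads` yet fails two field-level checks, which is exactly the gap between "valid JSON" and "valid for this schema" that schema validation closes.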

Retry Loops for Robustness

When JSON parsing or schema validation fails, a well-designed system retries with error feedback rather than crashing. The retry loop sends the validation error back to Claude so it can correct its output.

import anthropic
import json
from pydantic import BaseModel, ValidationError
from typing import List


class AnalysisResult(BaseModel):
    category: str
    severity: str
    findings: List[str]
    recommendation: str


def extract_with_retry(
    prompt: str,
    max_retries: int = 3,
    model: str = "claude-sonnet-4-20250514"
) -> AnalysisResult:
    client = anthropic.Anthropic()
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": "{"}
    ]

    for attempt in range(max_retries):
        response = client.messages.create(
            model=model,
            max_tokens=2048,
            messages=messages
        )

        json_str = "{" + response.content[0].text

        try:
            data = json.loads(json_str)
            return AnalysisResult.model_validate(data)
        except (json.JSONDecodeError, ValidationError) as e:
            if attempt == max_retries - 1:
                raise
            # Feed the error back to Claude for correction
            messages = [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": json_str},
                {"role": "user", "content": (
                    f"The JSON you returned failed validation:\n"
                    f"{str(e)}\n\n"
                    f"Please return corrected JSON that matches the "
                    f"required schema. Return ONLY the JSON object."
                )},
                {"role": "assistant", "content": "{"}
            ]

    # Only reachable when max_retries < 1, because the loop body never ran
    raise ValueError("max_retries must be at least 1")

Exam Tip: The exam expects you to know the retry pattern. Key points: (1) always set a maximum retry count to avoid infinite loops and runaway costs, (2) include the validation error in the retry prompt so Claude knows what to fix, (3) use prefilling in the retry as well to maintain format constraints, and (4) consider using exponential backoff for rate-limited scenarios.
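
Point (4) can be sketched separately from the validation loop. The example below computes capped exponential delays and applies random jitter between attempts. The names `backoff_delay` and `call_with_backoff` are hypothetical, the base and cap values are arbitrary assumptions, and the failing function is a simulation so the sketch runs without network access.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Upper bound on the wait after failed attempt number `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying cannot fix.
            if attempt == max_retries - 1 or not is_retryable(exc):
                raise
            # Jitter: wait a random time up to the exponential bound so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, backoff_delay(attempt)))
    raise ValueError("max_retries must be at least 1")


# Simulated call that fails twice before succeeding.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated rate limit")
    return "ok"


waits: List[float] = []  # record delays instead of actually sleeping
result = call_with_backoff(
    flaky, lambda e: isinstance(e, TimeoutError), sleep=waits.append
)
print(result, len(waits))
```

With the Anthropic SDK, `is_retryable` could test for `anthropic.RateLimitError`. Backoff addresses transient API failures, whereas the validation retry above addresses bad output, so the two loops are usually kept separate.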

Tool Use for Guaranteed Structure

Claude's tool use feature provides another path to structured output. When you define a tool with a JSON schema, Claude generates the tool's input as a JSON object shaped by that schema. Setting tool_choice to that tool forces Claude to call it, a pattern sometimes called “forced function calling”. The schema strongly steers the output, but it is still worth validating the tool input before acting on it.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "record_analysis",
        "description": "Record the results of the text analysis",
        "input_schema": {
            "type": "object",
            "properties": {
                "sentiment": {
                    "type": "string",
                    "enum": ["positive", "negative", "neutral", "mixed"]
                },
                "confidence": {
                    "type": "number",
                    "minimum": 0,
                    "maximum": 1
                },
                "topics": {
                    "type": "array",
                    "items": {"type": "string"}
                },
                "summary": {"type": "string"}
            },
            "required": ["sentiment", "confidence", "topics", "summary"]
        }
    }],
    tool_choice={"type": "tool", "name": "record_analysis"},
    messages=[{"role": "user", "content": (
        "Analyze the following customer review:\n\n"
        "<review>Great product, fast shipping, but packaging was damaged.</review>"
    )}]
)

# Extract the structured tool input
tool_use = next(b for b in response.content if b.type == "tool_use")
result = tool_use.input
print(result)  # A dict shaped by input_schema; validate before relying on it
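
The bare `next(...)` call raises `StopIteration` if no `tool_use` block is present. A slightly more defensive extraction is sketched below, assuming content blocks expose `type`, `name`, and `input` attributes as the SDK's tool-use blocks do. The helper `get_tool_input` is hypothetical, and `SimpleNamespace` objects stand in for a real response so the sketch runs offline.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable

REQUIRED_KEYS = {"sentiment", "confidence", "topics", "summary"}


def get_tool_input(content: Iterable[Any], tool_name: str) -> Dict[str, Any]:
    """Return the input dict of the first matching tool_use block."""
    for block in content:
        is_tool_use = getattr(block, "type", None) == "tool_use"
        if is_tool_use and getattr(block, "name", None) == tool_name:
            missing = REQUIRED_KEYS - set(block.input)
            if missing:
                raise ValueError(f"tool input missing keys: {sorted(missing)}")
            return dict(block.input)
    raise LookupError(f"no tool_use block for {tool_name!r}")


# Stand-in for response.content; not real model output.
fake_content = [
    SimpleNamespace(type="text", text="Recording the analysis."),
    SimpleNamespace(
        type="tool_use",
        name="record_analysis",
        input={
            "sentiment": "mixed",
            "confidence": 0.7,
            "topics": ["shipping", "packaging"],
            "summary": "Good product, damaged packaging.",
        },
    ),
]

analysis = get_tool_input(fake_content, "record_analysis")
print(analysis["sentiment"], analysis["topics"])
```

The required-key check here is only a spot check. The same Pydantic validation used earlier in the lesson can be applied to the returned dict for full coverage.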

Key Takeaway: There are three main approaches to structured output: (1) explicit instructions with prefilling for simplicity and speed, (2) schema validation with retry loops for robustness, and (3) tool use with forced function calling for the strongest schema adherence. Choose based on your reliability requirements and latency budget, and validate the result whichever approach you use. For most production systems, combining prefilling with validation and a retry loop provides a good balance of reliability and performance.