🔧 Tool Design & MCP · Lesson 2.2

Tool Design Best Practices

Writing effective tool descriptions, scoping, and error handling.

20 min

Learning Objectives

  • Write clear, unambiguous tool descriptions
  • Scope tools to prevent cognitive overload
  • Implement graceful error handling for tools

The quality of your tool definitions directly determines how effectively Claude can use them. Poorly designed tools lead to incorrect invocations, wasted tokens, and frustrated users. Well-designed tools feel invisible -- Claude uses them correctly without confusion or unnecessary retries. This lesson covers the principles that separate production-grade tool design from naive implementations.

Writing Effective Tool Descriptions

The description field is the single most important part of a tool definition. Claude relies heavily on descriptions to determine when and how to use each tool. A good description answers three questions: what does this tool do, when should it be used, and what does it return?

Description Anatomy

# BAD: Vague, unhelpful description
{
    "name": "search",
    "description": "Searches for stuff",
    "input_schema": {
        "type": "object",
        "properties": {
            "q": {"type": "string"}
        },
        "required": ["q"]
    }
}

# GOOD: Clear, specific, actionable description
{
    "name": "search_knowledge_base",
    "description": "Search the internal knowledge base for company documentation, policies, and procedures. Returns up to 10 relevant document snippets ranked by relevance. Use this tool when the user asks questions about company processes, HR policies, technical documentation, or internal procedures. Do NOT use this for general knowledge questions that don't relate to company-specific information.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural language search query. Be specific and include key terms related to the topic."
            },
            "department": {
                "type": "string",
                "enum": ["engineering", "hr", "finance", "legal", "all"],
                "description": "Filter results by department. Use 'all' to search across all departments."
            },
            "max_results": {
                "type": "integer",
                "description": "Maximum number of results to return, between 1 and 10. Defaults to 5.",
                "default": 5,
                "minimum": 1,
                "maximum": 10
            }
        },
        "required": ["query"]
    }
}

Exam Tip: Anthropic's documentation recommends detailed tool descriptions that explain what the tool does, when it should and should NOT be used, what each parameter means, and any important caveats or limitations. This lesson's three-part formula adds an explicit statement of what the tool returns. The exam tests whether you can identify well-formed vs. poorly-formed tool descriptions.

The Three-Part Description Formula

Follow this structure for every tool description:

  • What it does: A clear statement of the tool's functionality. Example: “Search the internal knowledge base for company documentation.”
  • When to use it: Specific scenarios where the tool should be invoked, plus negative guidance for when NOT to use it. Example: “Use when the user asks about company policies. Do NOT use for general knowledge.”
  • What it returns: A description of the output format and content. Example: “Returns up to 10 document snippets as JSON with title, content, and relevance score.”
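
The three parts can be assembled mechanically, which makes it harder to forget one of them. The following is a minimal sketch; `build_description` is a hypothetical helper written for this lesson, not part of any SDK.

```python
def build_description(what: str, when: str, returns: str) -> str:
    """Join the three parts of a tool description into one string.

    Hypothetical helper for illustration; raises if any part is empty.
    """
    parts = {"what": what, "when": when, "returns": returns}
    missing = [name for name, text in parts.items() if not text.strip()]
    if missing:
        raise ValueError(f"Missing description part(s): {', '.join(missing)}")
    return " ".join(text.strip() for text in parts.values())


description = build_description(
    what="Search the internal knowledge base for company documentation.",
    when="Use when the user asks about company policies. Do NOT use for general knowledge.",
    returns="Returns up to 10 document snippets as JSON with title, content, and relevance score.",
)
```

The resulting string can be used directly as the "description" value of a tool definition.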

Parameter Description Guidelines

Each parameter should have its own clear description. Include format expectations, valid ranges, default values, and examples where helpful.

# Detailed parameter descriptions
{
    "name": "create_calendar_event",
    "description": "Create a new calendar event. Returns the event ID and confirmation details.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {
                "type": "string",
                "description": "Event title. Keep it concise (under 100 characters)."
            },
            "start_time": {
                "type": "string",
                "description": "Event start time in ISO 8601 format, e.g. '2024-03-15T14:00:00Z'. Must be in the future."
            },
            "duration_minutes": {
                "type": "integer",
                "description": "Event duration in minutes. Must be between 15 and 480 (8 hours).",
                "minimum": 15,
                "maximum": 480
            },
            "attendees": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of attendee email addresses. Each must be a valid email."
            },
            "priority": {
                "type": "string",
                "enum": ["low", "normal", "high"],
                "description": "Event priority level. Defaults to 'normal' if not specified."
            }
        },
        "required": ["title", "start_time", "duration_minutes"]
    }
}
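
A quick way to enforce this guideline is to lint tool definitions before they ship. The sketch below assumes tool definitions are plain dictionaries in the shape shown above; `find_undocumented_params` is a hypothetical helper that only inspects top-level properties.

```python
def find_undocumented_params(tool: dict) -> list[str]:
    """Return the names of top-level parameters that have no description."""
    properties = tool.get("input_schema", {}).get("properties", {})
    return [
        name
        for name, schema in properties.items()
        if not schema.get("description", "").strip()
    ]


# The vague "search" tool from earlier in this lesson
vague_tool = {
    "name": "search",
    "description": "Searches for stuff",
    "input_schema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["q"],
    },
}

undocumented = find_undocumented_params(vague_tool)
print(undocumented)  # ['q']
```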

Tool Naming Conventions

Tool names should be descriptive, use consistent casing (snake_case is standard), and follow a verb_noun pattern that makes the tool's purpose immediately clear.

  • Good names: search_knowledge_base, create_ticket, get_user_profile, send_email, list_recent_orders
  • Bad names: search, do_thing, helper, process, tool1

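When you have multiple related tools, a consistent domain prefix groups them at a glance. Note that this puts the noun first (noun_verb), so pick one pattern per tool set rather than mixing both: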

  • db_query, db_insert, db_update, db_delete
  • file_read, file_write, file_list, file_delete
  • user_get, user_create, user_update
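
Naming conventions are easy to check automatically. The sketch below is an assumption-laden illustration: `is_well_formed_name` and `group_by_prefix` are hypothetical helpers, and the regular expression only verifies the shape of a name (lowercase snake_case with at least two words). It cannot tell whether a name such as do_thing is actually meaningful.

```python
import re
from collections import defaultdict

# Lowercase snake_case with at least two words, e.g. "create_ticket"
SNAKE_CASE_TWO_WORDS = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def is_well_formed_name(name: str) -> bool:
    """Structural check only: does not judge whether the words are descriptive."""
    return bool(SNAKE_CASE_TWO_WORDS.match(name))


def group_by_prefix(names: list[str]) -> dict[str, list[str]]:
    """Group tool names by the text before the first underscore."""
    groups = defaultdict(list)
    for name in names:
        groups[name.split("_", 1)[0]].append(name)
    return dict(groups)


names = ["db_query", "db_insert", "file_read", "search", "tool1"]
malformed = [n for n in names if not is_well_formed_name(n)]
groups = group_by_prefix([n for n in names if is_well_formed_name(n)])
```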

Scoping Tools Appropriately

A well-scoped tool does one thing and does it well. Avoid creating “god tools” that try to handle too many different operations, but also avoid splitting functionality so finely that Claude needs excessive tool calls to accomplish basic tasks.

Over-Scoped Tool (Bad)

# BAD: One tool tries to do everything
{
    "name": "database",
    "description": "Perform any database operation",
    "input_schema": {
        "type": "object",
        "properties": {
            "operation": {"type": "string", "enum": ["select", "insert", "update", "delete"]},
            "table": {"type": "string"},
            "data": {"type": "object"},
            "where": {"type": "object"},
            "columns": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["operation", "table"]
    }
}

Well-Scoped Tools (Good)

# GOOD: Separate tools for distinct operations
tools = [
    {
        "name": "lookup_customer",
        "description": "Look up a customer by ID or email. Provide at least one of customer_id or email. Returns the customer profile including name, email, account status, and subscription tier.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string", "description": "Customer ID (e.g., CUST-12345)"},
                "email": {"type": "string", "description": "Customer email address"}
            }
        }
    },
    {
        "name": "list_customer_orders",
        "description": "List recent orders for a customer. Returns order ID, date, total, and status for each order.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string", "description": "Customer ID"},
                "limit": {"type": "integer", "description": "Max orders to return (default 10)", "default": 10}
            },
            "required": ["customer_id"]
        }
    },
    {
        "name": "update_order_status",
        "description": "Update the status of an existing order. Only permitted transitions: pending->processing, processing->shipped, shipped->delivered.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Order ID (e.g., ORD-98765)"},
                "new_status": {"type": "string", "enum": ["processing", "shipped", "delivered"]}
            },
            "required": ["order_id", "new_status"]
        }
    }
]
Exam Tip: The exam tests your ability to evaluate tool scoping. A common anti-pattern is creating a single “do everything” tool with an operation parameter. The correct approach is separate, focused tools with clear boundaries. However, avoid the opposite extreme of splitting tools so finely that simple tasks require many calls.
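
On the application side, focused tools map naturally onto one handler function per tool name. The following is a minimal sketch under stated assumptions: `ORDERS`, `HANDLERS`, and `dispatch` are hypothetical names, and the in-memory dictionary stands in for a real database.

```python
from typing import Callable

# Hypothetical in-memory data standing in for a real database
ORDERS = {"ORD-98765": {"status": "pending"}}

# Permitted transitions from the update_order_status description
ALLOWED_TRANSITIONS = {
    "pending": "processing",
    "processing": "shipped",
    "shipped": "delivered",
}


def update_order_status(order_id: str, new_status: str) -> dict:
    """Apply a status change only if it is a permitted transition."""
    order = ORDERS[order_id]
    if ALLOWED_TRANSITIONS.get(order["status"]) != new_status:
        raise ValueError(f"Cannot move order from {order['status']} to {new_status}")
    order["status"] = new_status
    return {"order_id": order_id, "status": new_status}


# One handler per tool name keeps each tool small and independently testable
HANDLERS: dict[str, Callable[..., dict]] = {
    "update_order_status": update_order_status,
}


def dispatch(tool_name: str, tool_input: dict) -> dict:
    """Route a tool call to its handler, rejecting unknown tool names."""
    handler = HANDLERS.get(tool_name)
    if handler is None:
        raise KeyError(f"Unknown tool: {tool_name}")
    return handler(**tool_input)
```

Compared with a single "database" tool, each handler here can enforce its own rules (such as the permitted transitions) without a branch on an operation parameter.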

Avoiding Cognitive Overload

Cognitive overload occurs when Claude has too many tools to choose from or when tools are too similar to distinguish. This leads to incorrect tool selection, hallucinated parameters, and general degradation in response quality.

  • Limit active tools to under 20 per API call: As a rule of thumb, performance is best with 5-10 focused tools, and selection accuracy tends to drop as the list grows beyond 20.
  • Ensure distinct tool purposes: If two tools could be confused by their names and descriptions alone, consolidate them or add disambiguation guidance.
  • Use negative guidance: Tell Claude when NOT to use a tool. This is just as important as telling it when to use it.
  • Group related tools logically: If tools share a domain, use consistent naming prefixes so Claude can reason about them as a group.
# Example of negative guidance in descriptions
{
    "name": "search_internal_docs",
    "description": "Search internal company documentation for policies, procedures, and technical specs. Returns relevant document excerpts. Use this ONLY for company-specific questions. Do NOT use this for general programming questions, public API documentation, or information that would be in Claude's training data.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"}
        },
        "required": ["query"]
    }
}
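
If your full catalog is larger than the suggested ceiling, one option is to pass only the tools relevant to the current task. The sketch below assumes tools follow the prefix naming convention described earlier; `select_tools` and `MAX_ACTIVE_TOOLS` are hypothetical names, and the limit of 20 is this lesson's guideline rather than an API restriction.

```python
MAX_ACTIVE_TOOLS = 20  # this lesson's suggested ceiling, not an API limit


def select_tools(all_tools: list[dict], prefixes: set[str]) -> list[dict]:
    """Return only the tools whose name starts with one of the given prefixes."""
    selected = [
        tool for tool in all_tools
        if tool["name"].split("_", 1)[0] in prefixes
    ]
    if len(selected) > MAX_ACTIVE_TOOLS:
        raise ValueError(f"{len(selected)} tools selected; narrow the prefixes")
    return selected


catalog = [
    {"name": "db_query"},
    {"name": "db_insert"},
    {"name": "file_read"},
    {"name": "user_get"},
]
active = select_tools(catalog, {"db", "user"})
active_names = [tool["name"] for tool in active]
```

The selected list is what you would pass as the tools argument for that request.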

Error Handling in Tool Results

How you report errors back to Claude significantly affects the agent's ability to recover gracefully. Always provide structured, actionable error messages.

Error Handling Patterns

import json

# `db` and `RateLimitError` are assumed to be provided by your application code.

def execute_tool(name, input_data, tool_id):
    """Execute a tool and return a properly formatted tool_result block.

    tool_id is the id of the tool_use block that requested this call.
    """
    try:
        if name == "lookup_customer":
            customer = db.find_customer(
                customer_id=input_data.get("customer_id"),
                email=input_data.get("email")
            )
            if customer is None:
                return {
                    "type": "tool_result",
                    "tool_use_id": tool_id,
                    "is_error": True,
                    "content": "No customer found matching the provided criteria. "
                               "Please verify the customer ID or email address."
                }
            return {
                "type": "tool_result",
                "tool_use_id": tool_id,
                "content": json.dumps(customer.to_dict())
            }

        # Unknown tool name: report it instead of silently returning None
        return {
            "type": "tool_result",
            "tool_use_id": tool_id,
            "is_error": True,
            "content": f"Unknown tool '{name}'. Use one of the tools provided."
        }

    except PermissionError:
        return {
            "type": "tool_result",
            "tool_use_id": tool_id,
            "is_error": True,
            "content": "Permission denied: You do not have access to this resource. "
                       "The user may need elevated permissions."
        }
    except RateLimitError:
        return {
            "type": "tool_result",
            "tool_use_id": tool_id,
            "is_error": True,
            "content": "Rate limit exceeded. Please wait a moment before retrying."
        }
    except Exception as e:
        return {
            "type": "tool_result",
            "tool_use_id": tool_id,
            "is_error": True,
            "content": f"Unexpected error: {str(e)}. Please try again or use an alternative approach."
        }
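
The repeated dictionaries in this pattern can be factored into two small builders. This is a sketch, not a prescribed API: `success_result`, `error_result`, and `as_user_message` are hypothetical helpers, and the tool_use_id value is a made-up placeholder. The only fixed part is the shape of the tool_result block and the fact that it is sent back in a user message.

```python
import json


def success_result(tool_use_id: str, payload: dict) -> dict:
    """Build a tool_result block carrying a JSON-encoded payload."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(payload),
    }


def error_result(tool_use_id: str, problem: str, suggestion: str) -> dict:
    """Build an error tool_result: what went wrong plus what to try next."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "is_error": True,
        "content": f"{problem} {suggestion}",
    }


def as_user_message(results: list[dict]) -> dict:
    """Tool results are returned to the model inside a user message."""
    return {"role": "user", "content": results}


message = as_user_message([
    error_result(
        "toolu_example_id",  # placeholder id for illustration
        "Rate limit exceeded.",
        "Please wait a moment before retrying.",
    )
])
```

Pairing every problem with a suggestion keeps error messages actionable, which is what lets the model decide between retrying and switching approach.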

Using Enums and Constraints

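JSON Schema constraints in your input schema help Claude generate valid inputs and reduce errors. Use enum for fixed value sets, minimum/maximum for numeric ranges, and pattern for string formats. Treat these constraints as guidance for the model rather than a guarantee, and still validate inputs inside the tool implementation.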

{
    "name": "query_logs",
    "description": "Query application logs by severity level and time range.",
    "input_schema": {
        "type": "object",
        "properties": {
            "severity": {
                "type": "string",
                "enum": ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
                "description": "Minimum severity level to include in results"
            },
            "hours_back": {
                "type": "integer",
                "minimum": 1,
                "maximum": 168,
                "description": "How many hours back to search (1 to 168, i.e., up to 7 days)"
            },
            "service_name": {
                "type": "string",
                "pattern": "^[a-z][a-z0-9-]{2,30}$",
                "description": "Service name in kebab-case, e.g. 'user-auth', 'payment-processor'"
            }
        },
        "required": ["severity", "hours_back"]
    }
}
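
Because the model can still produce out-of-range values, it is worth checking inputs against the same schema before executing a tool. The sketch below is a deliberately small hand-rolled check, assuming only the keywords used in this lesson (required, enum, minimum, maximum, pattern); `validate_input` is a hypothetical helper, and a full JSON Schema validator library would be the more complete choice in production.

```python
import re


def validate_input(schema: dict, data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid.

    Covers only required, enum, minimum, maximum, and pattern.
    """
    problems = []
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"Missing required parameter '{name}'.")
    for name, value in data.items():
        rules = schema.get("properties", {}).get(name)
        if rules is None:
            problems.append(f"Unknown parameter '{name}'.")
            continue
        if "enum" in rules and value not in rules["enum"]:
            problems.append(f"'{name}' must be one of {rules['enum']}.")
        # minimum/maximum apply to numbers only, pattern to strings only
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if "minimum" in rules and value < rules["minimum"]:
                problems.append(f"'{name}' must be at least {rules['minimum']}.")
            if "maximum" in rules and value > rules["maximum"]:
                problems.append(f"'{name}' must be at most {rules['maximum']}.")
        if isinstance(value, str) and "pattern" in rules:
            if not re.search(rules["pattern"], value):
                problems.append(f"'{name}' does not match the expected format.")
    return problems


query_logs_schema = {
    "type": "object",
    "properties": {
        "severity": {"type": "string", "enum": ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]},
        "hours_back": {"type": "integer", "minimum": 1, "maximum": 168},
        "service_name": {"type": "string", "pattern": "^[a-z][a-z0-9-]{2,30}$"},
    },
    "required": ["severity", "hours_back"],
}

problems = validate_input(
    query_logs_schema,
    {"severity": "FATAL", "hours_back": 500, "service_name": "User_Auth"},
)
```

Returning the problems as text makes them easy to pass back to the model in an error tool result so it can correct the call.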

Structuring Tool Results for Clarity

The format of your tool results affects how well Claude can interpret and use the data. Return structured, consistent results that are easy for the model to parse.

# BAD: Unstructured, ambiguous result
"Found 3 customers: John Smith john@example.com active, Jane Doe jane@example.com inactive, Bob Jones bob@test.com active"

# GOOD: Structured, parseable result
json.dumps({
    "total_results": 3,
    "customers": [
        {"name": "John Smith", "email": "john@example.com", "status": "active"},
        {"name": "Jane Doe", "email": "jane@example.com", "status": "inactive"},
        {"name": "Bob Jones", "email": "bob@test.com", "status": "active"}
    ]
})
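
A small formatter keeps every result from a tool in the same shape. The sketch below assumes rows arrive as (name, email, status) tuples; `format_customer_results` is a hypothetical helper, and the "returned" field is an addition for illustration so the model can tell when a result list was truncated.

```python
import json


def format_customer_results(rows: list[tuple[str, str, str]], limit: int = 10) -> str:
    """Serialize customer rows into a consistent JSON structure."""
    shown = rows[:limit]
    return json.dumps({
        "total_results": len(rows),
        "returned": len(shown),  # illustrative field: differs from total when truncated
        "customers": [
            {"name": name, "email": email, "status": status}
            for name, email, status in shown
        ],
    })


rows = [
    ("John Smith", "john@example.com", "active"),
    ("Jane Doe", "jane@example.com", "inactive"),
    ("Bob Jones", "bob@test.com", "active"),
]
result = format_customer_results(rows, limit=2)
```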

Idempotency and Side Effects

When designing tools for agentic systems, consider whether the tool has side effects (writes data, sends emails, charges credit cards) vs. being read-only. This distinction matters for safety and retry logic.

  • Read-only tools (search, lookup, list) are safe to retry and call multiple times. Claude can freely use these without concern about unintended consequences.
  • Write tools (create, update, delete, send) should be clearly marked in their descriptions. Consider requiring confirmation for destructive operations.
  • Idempotent tools produce the same result regardless of how many times they are called with the same input. Design write tools to be idempotent where possible (e.g., “set status to X” rather than “increment counter”).
Exam Tip: The exam emphasizes the importance of distinguishing read vs. write tools in agentic systems. Anthropic recommends that agents prefer reversible actions over irreversible ones, and that destructive operations should include confirmation mechanisms. Idempotency is a key concept for safe tool design.
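
The difference between idempotent and non-idempotent writes shows up when a call is retried. The sketch below is illustrative only: `TicketStore` is a hypothetical in-memory class, and the idempotency key is one common technique for making a "create" operation safe to retry, not something this lesson's tools require.

```python
class TicketStore:
    """Hypothetical in-memory store used only to illustrate idempotent writes."""

    def __init__(self):
        self.status = {}   # ticket_id -> status
        self.created = {}  # idempotency_key -> ticket_id

    def set_status(self, ticket_id: str, status: str) -> dict:
        # Idempotent: repeating the call leaves the same final state
        self.status[ticket_id] = status
        return {"ticket_id": ticket_id, "status": status}

    def create_ticket(self, idempotency_key: str, title: str) -> dict:
        # A retried call with the same key returns the original ticket
        # instead of creating a duplicate
        if idempotency_key in self.created:
            return {"ticket_id": self.created[idempotency_key], "created": False}
        ticket_id = f"TICKET-{len(self.created) + 1}"
        self.created[idempotency_key] = ticket_id
        self.status[ticket_id] = "open"
        return {"ticket_id": ticket_id, "created": True}


store = TicketStore()
first = store.create_ticket("req-001", "Printer is offline")
retry = store.create_ticket("req-001", "Printer is offline")
```

Here the retry returns the same ticket ID with "created" set to False, so a repeated tool call after a timeout does not produce a second ticket.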

Testing Tool Definitions

Before deploying tools to production, validate that Claude uses them correctly across a range of inputs. Key testing strategies include:

  • Happy path testing: Verify Claude correctly invokes the tool with valid inputs for typical use cases.
  • Edge case testing: Test with ambiguous prompts, missing information, and boundary values.
  • Negative testing: Verify Claude does NOT use the tool when it should not (e.g., general knowledge questions should not trigger a database lookup).
  • Multi-tool disambiguation: When multiple tools could apply, verify Claude selects the correct one.
  • Error path testing: Verify Claude handles error responses gracefully by retrying or taking alternative actions.
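
# Example test framework for tool definitions
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
# `tools` is assumed to be your list of tool definitions, including the
# get_order_details and search_customers tools referenced below.

def assert_tool_selection(response, expected_tool, expected_params=None):
    """Assert that Claude called the expected tool (or none) with the expected params."""
    tool_uses = [block for block in response.content if block.type == "tool_use"]
    if expected_tool is None:
        assert not tool_uses, "Expected no tool call"
        return
    assert tool_uses, f"Expected a call to {expected_tool}, but no tool was used"
    assert tool_uses[0].name == expected_tool
    for key, value in (expected_params or {}).items():
        assert tool_uses[0].input.get(key) == value

def test_tool_selection():
    """Test that Claude selects the correct tool for various inputs."""
    test_cases = [
        {
            "input": "What is the status of order ORD-12345?",
            "expected_tool": "get_order_details",
            "expected_params": {"order_id": "ORD-12345"}
        },
        {
            "input": "Find all customers in California",
            "expected_tool": "search_customers",
            "expected_params": {"state": "CA"}
        },
        {
            "input": "What is the capital of France?",
            "expected_tool": None,  # Should not use any tool
        }
    ]

    for case in test_cases:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,
            messages=[{"role": "user", "content": case["input"]}]
        )
        # Validate tool selection and, where given, the generated parameters
        assert_tool_selection(response, case["expected_tool"], case.get("expected_params"))
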
Key Takeaway: Tool design is a form of API design -- your tools are the API that Claude programs against. Invest in clear descriptions (what, when, returns), appropriate scoping (one tool per operation), structured error reporting with the is_error flag, and consistent parameter documentation. Use JSON Schema constraints (enums, ranges, patterns) to guide the model toward valid inputs. Always consider idempotency and side effects when designing write tools. Avoid cognitive overload by keeping the active tool count under 20 and ensuring tool purposes are distinct.