Tool Boundary Management
Managing tool complexity and preventing model overload.
Learning Objectives
- Identify when too many tools degrade performance
- Apply strategies for organizing and scoping tool sets
- Implement dynamic tool selection patterns
Tool Boundary Management
As agentic systems grow in complexity, tool management becomes a critical architectural challenge. When an agent has access to too many tools, performance degrades -- the model spends more tokens reasoning about which tool to use, makes more selection errors, and response latency increases. This lesson covers strategies for organizing, selecting, and managing tools at scale.
The “Too Many Tools” Problem
Research and practical experience show that Claude's tool selection accuracy degrades as the number of available tools increases. This is not merely a matter of context window size -- it is a fundamental challenge of decision-making complexity.
- Token overhead: Each tool definition consumes tokens in the context window. With 50 tools, tool definitions alone can consume 10,000+ tokens.
- Selection confusion: With many similar tools, the model may select the wrong one, especially when tool descriptions overlap or are ambiguous.
- Increased latency: A larger tool set means more reasoning about which tool to use, which increases time-to-first-token.
- Error amplification: In multi-step agentic loops, tool selection errors compound across iterations. A wrong tool choice early in the loop can send the agent down a completely wrong path.
- Decreased output quality: The model's attention is split across many tool definitions, which can reduce the quality of its reasoning about the actual task.
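The token-overhead point is easy to check empirically. The sketch below estimates the overhead using a rough ~4-characters-per-token heuristic; the function name and heuristic are illustrative, and a provider's token-counting API should be used for precise numbers.

```python
import json

def estimate_tool_token_overhead(tools: list) -> int:
    """Rough token estimate for a set of tool definitions.

    Uses a ~4 characters-per-token heuristic; this is a sketch,
    not an exact tokenizer.
    """
    serialized = json.dumps(tools)
    return len(serialized) // 4

# With dozens of verbose definitions, the overhead adds up quickly
tools = [{"name": f"tool_{i}", "description": "x" * 800} for i in range(50)]
print(estimate_tool_token_overhead(tools))  # roughly 10,000 tokens
```

Even this crude estimate makes the problem concrete: fifty verbose tool definitions can consume as many tokens as several pages of conversation.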
Strategy 1: Tool Categorization and Namespacing
Organize tools into logical categories and use consistent naming conventions. This helps both the model and human developers understand the tool landscape.
```python
# Organize tools by domain using prefixes
tool_categories = {
    "user_management": [
        {"name": "user_lookup", "description": "Look up a user by ID or email..."},
        {"name": "user_create", "description": "Create a new user account..."},
        {"name": "user_update", "description": "Update user profile fields..."},
        {"name": "user_deactivate", "description": "Deactivate a user account..."},
    ],
    "order_management": [
        {"name": "order_search", "description": "Search orders by criteria..."},
        {"name": "order_details", "description": "Get full details for an order..."},
        {"name": "order_update_status", "description": "Update order status..."},
        {"name": "order_refund", "description": "Process a refund for an order..."},
    ],
    "analytics": [
        {"name": "analytics_revenue", "description": "Get revenue metrics..."},
        {"name": "analytics_usage", "description": "Get usage statistics..."},
        {"name": "analytics_funnel", "description": "Get conversion funnel data..."},
    ],
}
```

Benefits of categorization:
- Consistent naming prefixes help the model distinguish between domains
- Categories map naturally to MCP server boundaries
- Easier to select relevant subsets based on context
- Simplifies monitoring and metrics per domain
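One practical payoff of this organization is that selecting a subset becomes a simple lookup. The helper below is a minimal sketch (the `registry` dict is a hypothetical, trimmed-down version of the category registry above) that flattens chosen categories into the flat tool list an API call expects:

```python
def tools_for_categories(tool_categories: dict, categories: list) -> list:
    """Flatten the selected categories into a single flat tool list."""
    selected = []
    for name in categories:
        selected.extend(tool_categories.get(name, []))
    return selected

# Hypothetical minimal registry for illustration
registry = {
    "user_management": [{"name": "user_lookup", "description": "Look up a user..."}],
    "order_management": [{"name": "order_search", "description": "Search orders..."}],
    "analytics": [{"name": "analytics_revenue", "description": "Get revenue metrics..."}],
}

# A support session only needs user and order tools
support_tools = tools_for_categories(registry, ["user_management", "order_management"])
```

Unknown category names are silently skipped here; in production you may prefer to log them so routing mistakes are visible.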
Strategy 2: Dynamic Tool Selection
Instead of providing all tools in every request, dynamically select which tools to include based on the conversation context. This is one of the most effective strategies for managing tool boundaries.
Keyword-Based Tool Filtering
```python
def select_tools_by_context(user_message: str, all_tools: dict) -> list:
    """Select relevant tools based on keywords in the user message."""
    keyword_to_category = {
        "user": "user_management",
        "account": "user_management",
        "profile": "user_management",
        "order": "order_management",
        "purchase": "order_management",
        "refund": "order_management",
        "revenue": "analytics",
        "metrics": "analytics",
        "analytics": "analytics",
        "report": "analytics",
    }

    selected_categories = set()
    message_lower = user_message.lower()
    for keyword, category in keyword_to_category.items():
        if keyword in message_lower:
            selected_categories.add(category)

    # If no specific category matched, include a general-purpose subset
    if not selected_categories:
        selected_categories = {"user_management"}  # Default category

    selected_tools = []
    for category in selected_categories:
        selected_tools.extend(all_tools.get(category, []))
    return selected_tools

# Usage in the agentic loop
user_message = "I need to check on the refund status for order #12345"
tools = select_tools_by_context(user_message, tool_categories)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=tools,  # Only order-related tools included
    messages=[{"role": "user", "content": user_message}],
)
```

Embedding-Based Tool Selection
For more sophisticated selection, use embeddings to match the user's query against tool descriptions and select the most semantically relevant tools. This handles cases where keywords alone are insufficient.
```python
import numpy as np

class ToolSelector:
    """Select relevant tools using semantic similarity."""

    def __init__(self, all_tools: list, embedding_model):
        self.all_tools = all_tools
        self.model = embedding_model
        # Pre-compute embeddings for all tool descriptions
        self.tool_embeddings = []
        for tool in all_tools:
            text = f"{tool['name']}: {tool['description']}"
            embedding = self.model.embed(text)
            self.tool_embeddings.append(embedding)

    def select(self, query: str, top_k: int = 5) -> list:
        """Select the top-k most relevant tools for a query."""
        query_embedding = self.model.embed(query)

        # Compute cosine similarity
        similarities = []
        for i, tool_emb in enumerate(self.tool_embeddings):
            sim = np.dot(query_embedding, tool_emb) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(tool_emb)
            )
            similarities.append((sim, i))

        # Return top-k tools
        similarities.sort(reverse=True)
        selected = [self.all_tools[idx] for _, idx in similarities[:top_k]]
        return selected

# Usage
selector = ToolSelector(all_tools, embedding_model)
relevant_tools = selector.select("How much revenue did we make last quarter?", top_k=5)
```

Strategy 3: Two-Stage Tool Selection (Tool Routing)
Use a lightweight first pass to select tool categories, then provide only the tools from the selected categories to the main model. This is sometimes called “tool routing” and is one of the most powerful patterns for large tool sets.
```python
import json

def two_stage_tool_selection(user_message: str, tool_categories: dict) -> list:
    """Use Claude to select relevant tool categories, then provide those tools."""
    # Stage 1: Use a fast model to classify the request
    category_descriptions = {
        name: f"{name}: {tools[0]['description'][:100]}..."
        for name, tools in tool_categories.items()
    }
    classification = client.messages.create(
        model="claude-haiku-4-20250514",  # Fast, cheap model for classification
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": (
                "Classify this request into one or more categories.\n"
                f"Categories: {json.dumps(category_descriptions)}\n"
                f"Request: {user_message}\n"
                "Return only the category names as a JSON array."
            ),
        }],
    )
    try:
        categories = json.loads(classification.content[0].text)
    except json.JSONDecodeError:
        # Fall back to all categories if the classifier output is malformed
        categories = list(tool_categories)

    # Stage 2: Collect tools from selected categories
    selected_tools = []
    for cat in categories:
        if cat in tool_categories:
            selected_tools.extend(tool_categories[cat])
    return selected_tools
```

Two-stage selection is effective because:
- The classification model (e.g., Haiku) is fast and cheap
- It can handle ambiguous queries better than keyword matching
- The main model only sees a focused, relevant tool set
- It scales well as the total number of tools grows
Strategy 4: Tool Consolidation
Sometimes the best approach is to reduce the total number of tools by consolidating related operations into fewer, more flexible tools. This must be balanced against the scoping principles from Lesson 2.2 -- consolidate where it reduces confusion, not where it creates “god tools.”
```python
# Before: 4 separate tools that could confuse the model
# - search_orders_by_date
# - search_orders_by_customer
# - search_orders_by_status
# - search_orders_by_amount

# After: 1 consolidated tool with optional filters
{
    "name": "search_orders",
    "description": "Search orders with optional filters. Provide one or more filter criteria to narrow results. Returns matching orders with ID, date, customer, status, and total amount.",
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": "Filter by customer ID"
            },
            "status": {
                "type": "string",
                "enum": ["pending", "processing", "shipped", "delivered", "cancelled"],
                "description": "Filter by order status"
            },
            "date_from": {
                "type": "string",
                "description": "Filter orders after this date (ISO 8601)"
            },
            "date_to": {
                "type": "string",
                "description": "Filter orders before this date (ISO 8601)"
            },
            "min_amount": {
                "type": "number",
                "description": "Minimum order amount"
            },
            "max_amount": {
                "type": "number",
                "description": "Maximum order amount"
            },
            "limit": {
                "type": "integer",
                "description": "Maximum results to return (default 20)",
                "default": 20
            }
        }
    }
}
```

When to Consolidate vs. Keep Separate
- Consolidate when: Multiple tools differ only in their filter parameters (like the search example above), or when the tools share the same underlying operation.
- Keep separate when: Tools have different side effects (read vs. write), different security requirements, or fundamentally different purposes.
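To make the second point concrete, here is a sketch of a read/write pair kept deliberately separate (the tool names and schemas are illustrative). Keeping the write operation on its own lets its description carry stronger guardrails, and lets the system gate it behind different permissions:

```python
# Hypothetical pair: same resource, different side effects, kept separate
get_order_tool = {
    "name": "order_details",
    "description": "Read-only: get full details for an order. Never modifies data.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

refund_order_tool = {
    "name": "order_refund",
    "description": (
        "Write operation: process a refund for an order. Irreversible. "
        "Do not use this for status checks; use order_details instead."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "reason": {"type": "string", "description": "Reason for the refund"},
        },
        "required": ["order_id", "reason"],
    },
}
```

Note the negative guidance ("Do not use this for status checks") in the write tool's description, which anticipates the practical guidelines later in this lesson.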
Strategy 5: MCP Server Composition
When using MCP, you can distribute tools across multiple servers and only connect to the servers relevant to the current session. This is a natural way to manage tool boundaries at the architectural level.
Claude Desktop configuration with role-based server access. First, a configuration for a customer support agent:

```json
{
  "mcpServers": {
    "customer-db": {
      "command": "python",
      "args": ["servers/customer_server.py"],
      "env": {"DB_URL": "postgresql://localhost/customers"}
    },
    "order-system": {
      "command": "python",
      "args": ["servers/order_server.py"],
      "env": {"DB_URL": "postgresql://localhost/orders"}
    },
    "knowledge-base": {
      "command": "python",
      "args": ["servers/kb_server.py"],
      "env": {"KB_PATH": "/data/support-kb"}
    }
  }
}
```

And a configuration for a data analyst agent, with a different tool set:

```json
{
  "mcpServers": {
    "analytics-db": {
      "command": "python",
      "args": ["servers/analytics_server.py"],
      "env": {"DB_URL": "postgresql://localhost/analytics"}
    },
    "visualization": {
      "command": "python",
      "args": ["servers/viz_server.py"]
    },
    "data-warehouse": {
      "command": "python",
      "args": ["servers/warehouse_server.py"],
      "env": {"WAREHOUSE_URL": "bigquery://project/dataset"}
    }
  }
}
```

Measuring Tool Selection Quality
To evaluate whether your tool management strategy is working, track these metrics:
- Tool selection accuracy: What percentage of tool calls select the correct tool on the first attempt?
- Tool call failure rate: What percentage of tool calls result in errors (wrong parameters, wrong tool, etc.)?
- Average tools per request: How many tools are included in the average API call? Lower is generally better.
- Unnecessary tool calls: How often does the model call a tool when it did not need to?
- Token overhead: What percentage of context window tokens are consumed by tool definitions?
```python
# Tracking tool selection metrics
class ToolMetrics:
    def __init__(self):
        self.total_calls = 0
        self.correct_selections = 0
        self.error_calls = 0
        self.tool_tokens_used = 0

    def record_call(self, tool_name: str, was_correct: bool, had_error: bool):
        self.total_calls += 1
        if was_correct:
            self.correct_selections += 1
        if had_error:
            self.error_calls += 1

    @property
    def accuracy(self) -> float:
        if self.total_calls == 0:
            return 0.0
        return self.correct_selections / self.total_calls

    @property
    def error_rate(self) -> float:
        if self.total_calls == 0:
            return 0.0
        return self.error_calls / self.total_calls

    def report(self) -> str:
        return (
            f"Tool Metrics Report:\n"
            f"  Total calls: {self.total_calls}\n"
            f"  Selection accuracy: {self.accuracy:.1%}\n"
            f"  Error rate: {self.error_rate:.1%}"
        )
```

Practical Guidelines
Based on Anthropic's recommendations and production experience, follow these guidelines for tool boundary management:
- Keep active tool count under 20: For any single API call, aim for fewer than 20 tools. Performance is best with 5-10 focused tools.
- Use descriptive, distinct tool names: If two tools could be confused by their names alone, reconsider the naming or consolidate them.
- Include negative guidance in descriptions: Tell Claude when NOT to use a tool (e.g., “Do not use this for general knowledge questions”).
- Test tool disambiguation: Create test cases where multiple tools could plausibly apply and verify correct selection.
- Monitor and iterate: Track tool selection metrics in production and refine tool definitions based on real error patterns.
- Use the system prompt for tool guidance: The system prompt can include instructions about when to prefer certain tools over others.
```python
# Using the system prompt to guide tool selection
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    system="""You are a customer support agent with access to the following tool categories:
- Customer tools (user_lookup, user_update): Use for questions about customer accounts
- Order tools (order_search, order_details, order_refund): Use for order-related inquiries
- Knowledge base (kb_search): Use for policy questions and general support procedures

Always try the knowledge base first for policy questions before looking up specific records.
For refund requests, always look up the order details first before processing a refund.""",
    tools=selected_tools,
    messages=[{"role": "user", "content": user_message}],
)
```

Summary of Tool Management Strategies
Here is a comparison of when to use each strategy:
- Categorization and namespacing: Always use this as a foundation. Organize tools logically regardless of other strategies.
- Keyword-based dynamic selection: Use when tool categories map cleanly to specific keywords in user messages. Fast and simple.
- Embedding-based selection: Use when user queries are diverse and don't map neatly to keywords. More robust but adds latency.
- Two-stage routing: Use when you have many categories and need AI-level reasoning to select the right ones. Best for 50+ tools.
- Consolidation: Use when you have many tools that differ only in filter parameters. Reduces count without losing capability.
- MCP server composition: Use when different roles or use cases need different tool sets. Natural architectural boundary.
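These strategies compose rather than compete. As a closing sketch, a selection pipeline might pick a strategy based on the size of the tool catalog; the thresholds below are illustrative assumptions, not fixed rules, and should be tuned against your own selection metrics:

```python
def choose_selection_strategy(total_tools: int) -> str:
    """Illustrative heuristic for picking a selection strategy by scale.

    The thresholds are assumptions for demonstration only.
    """
    if total_tools <= 10:
        return "static"           # small enough to pass every tool directly
    if total_tools <= 30:
        return "keyword"          # keyword filtering keeps the set focused
    if total_tools <= 50:
        return "embedding"        # semantic matching for diverse queries
    return "two_stage_routing"    # AI-level routing for very large catalogs

print(choose_selection_strategy(8))   # "static"
print(choose_selection_strategy(75))  # "two_stage_routing"
```

Whichever combination you choose, categorization remains the foundation: every other strategy assumes tools are already organized into coherent, well-named groups.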