🔧Tool Design & MCPLesson 2.5

MCP Security & Production Deployment

Authentication, rate limiting, and production strategies for MCP.

20 min

Learning Objectives

Implement authentication for MCP servers
Design rate limiting and access control
Deploy MCP servers in production environments

MCP Security and Production Deployment

Deploying MCP servers in production introduces security, reliability, and operational concerns that go far beyond basic functionality. This lesson covers the security model of MCP, authentication patterns, rate limiting, and deployment best practices that are essential for building enterprise-grade agentic systems.

The MCP Security Model

MCP's security model is built on several core principles that you must understand for both the exam and real-world deployments:

Principle of least privilege: Servers should only request the minimum permissions needed. Clients should only grant the minimum capabilities required.
User consent and control: Users should be aware of and consent to what data is being shared and what actions tools can perform. Hosts are responsible for surfacing this information.
Defense in depth: Multiple layers of security (transport security, authentication, authorization, input validation) should be applied rather than relying on any single mechanism.
Data minimization: Only expose the data necessary for the tool to function. Avoid passing entire database records when only specific fields are needed.

Exam Tip: The security principles (least privilege, user consent, defense in depth, data minimization) appear frequently on the exam. You may be given a scenario and asked which security principle is being violated. For example, a server that returns all user fields when only name and email are needed violates data minimization.

Authentication Patterns

OAuth 2.1 for Remote Servers

For remote MCP servers accessed over HTTP, OAuth 2.1 is the standard authentication mechanism defined in the MCP specification. OAuth 2.1 improves upon OAuth 2.0 by requiring PKCE (Proof Key for Code Exchange) and disallowing implicit grant flows. The flow works as follows:

Step 1: Client discovers the server's authorization endpoint
Step 2: Client redirects user to authorize with PKCE challenge
Step 3: User grants permissions
Step 4: Client receives an authorization code
Step 5: Client exchanges code for access token (with PKCE verifier)
Step 6: Client includes token in subsequent requests

# Server-side: Validating OAuth tokens in FastMCP
from fastmcp import FastMCP
import json

mcp = FastMCP("SecureServer")


async def validate_token(token: str) -> dict:
    """Validate an OAuth access token and return user info."""
    # Verify token with your OAuth provider
    response = await oauth_client.introspect(token)
    if not response.get("active"):
        raise PermissionError("Invalid or expired token")
    return {
        "user_id": response["sub"],
        "scopes": response["scope"].split(),
    }


@mcp.tool()
async def get_sensitive_data(ctx, record_id: str) -> str:
    """Retrieve sensitive data. Requires 'read:sensitive' scope.

    Args:
        record_id: The record identifier to look up
    """
    # Access the authenticated user context
    user = ctx.request_context.user
    if "read:sensitive" not in user["scopes"]:
        raise PermissionError("Insufficient permissions: requires read:sensitive scope")

    data = db.get_record(record_id, user_id=user["user_id"])
    return json.dumps(data)

API Key Authentication

For simpler deployments or internal services, API key authentication can be used with environment variables. This is common for stdio-based local servers.

import os
from fastmcp import FastMCP

mcp = FastMCP("InternalServer")

# API keys passed via environment variables in server configuration
# {
#   "mcpServers": {
#     "internal": {
#       "command": "python",
#       "args": ["server.py"],
#       "env": {
#         "API_KEY": "sk-internal-...",
#         "DATABASE_URL": "postgresql://..."
#       }
#     }
#   }
# }

# Server reads credentials from environment
API_KEY = os.environ.get("API_KEY")
DATABASE_URL = os.environ.get("DATABASE_URL")

if not API_KEY:
    raise ValueError("API_KEY environment variable is required")

Exam Tip: The exam tests knowledge of MCP authentication patterns. Remember that OAuth 2.1 (with PKCE) is the standard for remote HTTP servers. For local stdio servers, authentication is typically handled via environment variables since the server runs as a trusted subprocess. Never hardcode credentials in server code. The exam may also test knowledge of token scopes for access control.

Input Validation and Sanitization

Every tool input must be validated before processing. Never trust that the model (or a malicious client) will send valid inputs. This is critical for preventing injection attacks, especially when tool inputs are used in database queries or shell commands.

Common Injection Attack Vectors

SQL injection: Tool input used directly in SQL queries without parameterization. Always use parameterized queries.
Command injection: Tool input passed to shell commands. Never useos.system() or subprocess.run(shell=True) with user input.
Path traversal: Tool input used in file paths without validation. Always resolve paths and verify they stay within allowed directories.
Prompt injection via tool results: Malicious data returned from external systems that could influence the model's behavior.

import re
import os
from fastmcp import FastMCP

mcp = FastMCP("SafeServer")


def sanitize_identifier(value: str) -> str:
    """Sanitize a string to be safe for use as an identifier."""
    if not re.match(r"^[a-zA-Z][a-zA-Z0-9_]{0,63}$", value):
        raise ValueError(f"Invalid identifier: {value}")
    return value


@mcp.tool()
def query_table(table_name: str, limit: int = 10) -> str:
    """Query records from a database table.

    Args:
        table_name: Name of the table to query (alphanumeric and underscores only)
        limit: Maximum number of records to return (1-100)
    """
    # Validate table name to prevent SQL injection
    safe_table = sanitize_identifier(table_name)

    # Validate limit range
    if not 1 <= limit <= 100:
        raise ValueError("Limit must be between 1 and 100")

    # Use parameterized queries
    results = db.execute(
        f"SELECT * FROM {safe_table} LIMIT %s",  # table name validated above
        (limit,)  # limit as parameter
    )
    return json.dumps(results)


@mcp.tool()
def search_files(pattern: str, directory: str = "/data") -> str:
    """Search for files matching a pattern.

    Args:
        pattern: Glob pattern to match files (e.g., *.txt, report_*.csv)
        directory: Base directory to search in
    """
    # Prevent path traversal attacks
    import glob
    base_dir = os.path.realpath("/data")
    search_dir = os.path.realpath(directory)

    if not search_dir.startswith(base_dir):
        raise ValueError("Directory must be within /data")

    # Prevent command injection via glob pattern
    safe_pattern = re.sub(r"[^a-zA-Z0-9_.*?\[\]-]", "", pattern)

    results = glob.glob(os.path.join(search_dir, safe_pattern))
    return json.dumps(results)

Rate Limiting and Resource Protection

In production, MCP servers must protect themselves from excessive use -- whether from a runaway agent loop, a misconfigured client, or a malicious actor.

import time
from collections import defaultdict
from fastmcp import FastMCP

# Simple in-memory rate limiter
class RateLimiter:
    def __init__(self, max_calls: int, window_seconds: int):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = defaultdict(list)

    def check(self, key: str) -> bool:
        """Return True if the call is allowed, False if rate-limited."""
        now = time.time()
        # Remove expired entries
        self.calls[key] = [t for t in self.calls[key] if now - t < self.window]
        if len(self.calls[key]) >= self.max_calls:
            return False
        self.calls[key].append(now)
        return True


mcp = FastMCP("RateLimitedServer")
rate_limiter = RateLimiter(max_calls=60, window_seconds=60)  # 60 calls/minute


@mcp.tool()
def search_database(query: str) -> str:
    """Search the database with rate limiting.

    Args:
        query: Search query string
    """
    if not rate_limiter.check("search_database"):
        raise RuntimeError(
            "Rate limit exceeded: maximum 60 searches per minute. "
            "Please wait before retrying."
        )

    results = db.search(query)
    return json.dumps(results)

Logging and Audit Trails

Production MCP servers should log all tool invocations for debugging, monitoring, and compliance purposes. Every tool call should be traceable.

import logging
import json
import time
import uuid
from fastmcp import FastMCP, Context

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("mcp_audit")

mcp = FastMCP("AuditedServer")


def audit_log(tool_name: str, inputs: dict, result: str, duration_ms: float, error: str = None):
    """Write an audit log entry for a tool invocation."""
    entry = {
        "timestamp": time.time(),
        "request_id": str(uuid.uuid4()),
        "tool": tool_name,
        "inputs": inputs,
        "result_length": len(result) if result else 0,
        "duration_ms": duration_ms,
        "error": error,
    }
    logger.info(json.dumps(entry))


@mcp.tool()
def modify_record(record_id: str, field: str, value: str) -> str:
    """Update a field on a database record.

    Args:
        record_id: The record to modify
        field: The field name to update
        value: The new value for the field
    """
    start = time.time()
    try:
        result = db.update(record_id, {field: value})
        response = json.dumps({"status": "updated", "record_id": record_id})
        audit_log("modify_record", {"record_id": record_id, "field": field}, response, (time.time() - start) * 1000)
        return response
    except Exception as e:
        audit_log("modify_record", {"record_id": record_id, "field": field}, None, (time.time() - start) * 1000, str(e))
        raise

Exam Tip: The exam frequently tests security principles for MCP deployments. Key points to remember: (1) always validate and sanitize tool inputs, (2) use parameterized queries to prevent injection, (3) implement rate limiting to prevent abuse, (4) log all tool invocations for audit trails, and (5) follow the principle of least privilege when granting server access to resources. Be especially aware of path traversal and SQL injection as common attack vectors.

Production Deployment Patterns

Docker Deployment

# Dockerfile for an MCP server
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY server.py .

# Do not run as root
RUN useradd -m mcpuser
USER mcpuser

# Expose SSE port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=5s \
    CMD curl -f http://localhost:8080/health || exit 1

CMD ["python", "server.py"]

Environment-Based Configuration

import os
import json
from fastmcp import FastMCP

# All configuration via environment variables
config = {
    "db_url": os.environ["DATABASE_URL"],
    "api_key": os.environ["API_KEY"],
    "max_results": int(os.environ.get("MAX_RESULTS", "100")),
    "rate_limit": int(os.environ.get("RATE_LIMIT", "60")),
    "log_level": os.environ.get("LOG_LEVEL", "INFO"),
    "allowed_tables": os.environ.get("ALLOWED_TABLES", "users,orders,products").split(","),
}

mcp = FastMCP("ProductionServer")


@mcp.tool()
def query(table: str, limit: int = 10) -> str:
    """Query a database table.

    Args:
        table: Table name to query
        limit: Maximum results to return
    """
    if table not in config["allowed_tables"]:
        raise ValueError(f"Table '{table}' is not in the allowed list")

    limit = min(limit, config["max_results"])
    results = db.query(table, limit=limit)
    return json.dumps(results)


if __name__ == "__main__":
    transport = os.environ.get("MCP_TRANSPORT", "stdio")
    if transport == "sse":
        mcp.run(transport="sse", host="0.0.0.0", port=8080)
    else:
        mcp.run(transport="stdio")

Error Recovery and Resilience

Production MCP servers must handle failures gracefully. Implement timeouts, circuit breakers, and fallback mechanisms for external dependencies.

import asyncio
from fastmcp import FastMCP

mcp = FastMCP("ResilientServer")


async def call_with_timeout(coro, timeout_seconds=30):
    """Execute an async operation with a timeout."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        raise RuntimeError(f"Operation timed out after {timeout_seconds} seconds")


async def call_with_retry(func, max_retries=3, backoff_base=1.0):
    """Retry a function with exponential backoff."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return await func()
        except Exception as e:
            last_error = e
            if attempt < max_retries - 1:
                wait_time = backoff_base * (2 ** attempt)
                await asyncio.sleep(wait_time)
    raise RuntimeError(f"Failed after {max_retries} retries: {last_error}")


@mcp.tool()
async def fetch_external_data(query: str) -> str:
    """Fetch data from an external API with timeout and retry.

    Args:
        query: The search query
    """
    async def _fetch():
        return await external_api.search(query)

    result = await call_with_timeout(
        call_with_retry(_fetch, max_retries=3),
        timeout_seconds=30
    )
    return json.dumps(result)

Monitoring and Observability

In production, you need visibility into your MCP server's health and performance. Key metrics to track include:

Request latency: How long each tool call takes to execute
Error rate: Percentage of tool calls that fail
Request volume: Number of tool calls per minute/hour
Resource utilization: CPU, memory, and connection pool usage
Active connections: Number of connected MCP clients

import time
from dataclasses import dataclass, field
from collections import defaultdict


@dataclass
class ServerMetrics:
    """Track MCP server metrics for monitoring."""
    tool_call_count: dict = field(default_factory=lambda: defaultdict(int))
    tool_error_count: dict = field(default_factory=lambda: defaultdict(int))
    tool_latency_sum: dict = field(default_factory=lambda: defaultdict(float))

    def record_call(self, tool_name: str, duration_ms: float, error: bool = False):
        self.tool_call_count[tool_name] += 1
        self.tool_latency_sum[tool_name] += duration_ms
        if error:
            self.tool_error_count[tool_name] += 1

    def get_avg_latency(self, tool_name: str) -> float:
        count = self.tool_call_count[tool_name]
        if count == 0:
            return 0.0
        return self.tool_latency_sum[tool_name] / count

    def get_error_rate(self, tool_name: str) -> float:
        count = self.tool_call_count[tool_name]
        if count == 0:
            return 0.0
        return self.tool_error_count[tool_name] / count


metrics = ServerMetrics()

Key Takeaway: Production MCP deployment requires multiple layers of security and reliability. Always validate inputs, implement rate limiting, log all tool invocations, and handle errors gracefully with timeouts and retries. Use OAuth 2.1 for remote servers and environment variables for credentials. Follow the principle of least privilege throughout -- from server permissions to data exposure. Container deployments should run as non-root users with health checks. Monitor latency, error rates, and request volume for operational visibility.