Tool Use & The Model Context Protocol

8 min · Beginner · 1,708 words

From JSON Schema definitions to MCP servers — master the tool-use APIs from Anthropic and OpenAI, design tools that models actually use correctly, and learn the protocol turning every agent into an interoperable system.

What you will learn

01 The Tool-Use Mental Model
02 Anthropic Tool Use API
03 OpenAI Function Calling
04 Model Context Protocol (MCP) — The Tool-Use Standard
05 Designing Good Tools — Heuristics That Work
06 Error Handling — Where Most Agents Break

The moment an LLM can call a function, it stops being a chatbot and starts being a system. Tool use is the foundation of every agent in this course — without it, the model can only talk; with it, the model can act. This chapter covers the production-grade tool-use APIs from Anthropic and OpenAI, the Model Context Protocol that's standardizing the integration layer, and the practical patterns (parallel tools, strict schemas, error handling) that separate a demo from a deployment.

🔑

What you will internalize

1) Tools are JSON Schema definitions the model can choose to invoke. 2) The model emits tool_use blocks; your code executes them and returns tool_result. 3) Parallel tool calls are now default — design for concurrency, not sequence. 4) MCP is the universal tool-server protocol — write tools once, run them in any agent stack.

The Tool-Use Mental Model

A tool is a function the model can request but never invoke directly. The model doesn't run code; it produces structured JSON that says, in effect, "please run get_weather with location='Tokyo'." Your code runs the function, gets the answer, and feeds the result back into the next model turn. That round-trip is the entire mechanism.

A single tool-use turn. The model never executes code itself — it always returns a request that your runtime fulfils. This separation is what makes tools sandboxable, auditable, and replaceable.
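The round-trip can be sketched without any vendor SDK. This is a minimal illustration, not a production runtime: the TOOLS registry, the handle_tool_request helper, and the hard-coded weather values are all hypothetical stand-ins.

```python
import json

# Hypothetical tool implementation; a real one would call a weather service.
def get_weather(location: str, unit: str = "c") -> dict:
    return {"location": location, "temp": 22, "unit": unit}

# Registry of the functions this runtime is willing to run on the model's behalf.
TOOLS = {"get_weather": get_weather}

def handle_tool_request(request_json: str) -> str:
    """Run the function the model asked for and return its result as JSON."""
    request = json.loads(request_json)
    func = TOOLS.get(request["name"])
    if func is None:
        # Unknown tools become structured errors rather than crashes.
        return json.dumps({"error": "unknown_tool", "name": request["name"]})
    return json.dumps(func(**request["input"]))

# What the model emits is a request, not an execution.
model_request = '{"name": "get_weather", "input": {"location": "Tokyo"}}'
tool_result = handle_tool_request(model_request)
print(tool_result)
```

Because the registry is the only path from a model request to running code, it is also the natural place to add sandboxing, logging, or permission checks.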

Anthropic Tool Use API

Anthropic's tool use API takes a list of tool definitions, each with a name, a description, and an input_schema written as JSON Schema. The model returns content blocks of type text, tool_use, or both.

Python · Anthropic tool use

import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location.",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City, country"},
            "unit": {"type": "string", "enum": ["c", "f"]},
        },
        "required": ["location"],
    },
}]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)

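def execute_tool(name, tool_input):
    # Placeholder dispatcher (not part of the SDK): swap in real implementations
    if name == "get_weather":
        return {"location": tool_input["location"], "temp": 22,
                "unit": tool_input.get("unit", "c")}
    raise ValueError(f"unknown tool: {name}")

if response.stop_reason == "tool_use":
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,   # must match the tool_use block's id
                "content": str(result),
            })
    # Feed the results back as the next user turn
    followup = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the weather in Tokyo?"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": tool_results},
        ],
    )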

Tool choice modes

The tool_choice parameter controls how aggressively the model uses tools:

Mode	Behavior	System-prompt overhead	Use when
`auto` (default)	Model decides whether to call a tool	~346 tokens	Most cases — let the model judge
`any`	Model must call some tool	~313 tokens	You know a tool is needed but not which one
`tool`	Model must call a specific named tool	~313 tokens	Forced extraction; structured output via tool
`none`	No tools available this turn	~346 tokens	Final summarization step after tool results
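As a rough sketch, each mode in the table corresponds to a small dictionary passed as tool_choice. The helper name build_tool_choice is hypothetical; only the returned shapes are what the API receives.

```python
from typing import Optional

def build_tool_choice(mode: str, name: Optional[str] = None) -> dict:
    """Build the tool_choice value for one of the four modes in the table."""
    if mode not in {"auto", "any", "tool", "none"}:
        raise ValueError(f"unknown tool_choice mode: {mode}")
    if mode == "tool":
        if not name:
            raise ValueError("mode 'tool' needs the name of the tool to force")
        return {"type": "tool", "name": name}
    return {"type": mode}

# Example: force the model to call get_weather on this turn.
forced = build_tool_choice("tool", "get_weather")
print(forced)
```

The result would be passed alongside the tool list, for example client.messages.create(..., tools=tools, tool_choice=forced).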

💰

Token economics of tools

The tool-use system prompt overhead is fixed at 346 tokens on Claude 4.x with auto/none, and 313 with any/tool (per Anthropic's tool-use docs). On top of that, every tool definition adds its own JSON schema cost. Cache the system prompt and tool definitions with cache_control: they don't change between turns, so once the first request has written the cache, later requests pay the cache-read price (about 10% of the base input price) instead of full price.
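One way to apply this, sketched under the assumption that your tools are plain dictionaries like the get_weather definition earlier: put a cache_control marker on the last tool definition, which marks everything up to and including the tool list as a cacheable prefix. The helper name with_cached_tools is hypothetical.

```python
import copy

def with_cached_tools(tools: list) -> list:
    """Return a copy of the tool list with a cache breakpoint on the last tool."""
    cached = copy.deepcopy(tools)   # leave the caller's definitions untouched
    if cached:
        cached[-1]["cache_control"] = {"type": "ephemeral"}
    return cached

example_tools = [
    {"name": "get_weather", "description": "Get current weather for a location.",
     "input_schema": {"type": "object", "properties": {}, "required": []}},
]
cached_tools = with_cached_tools(example_tools)
print(cached_tools[-1]["cache_control"])
```

Pass cached_tools as the tools argument on every turn. Very short prefixes may fall below the model's minimum cacheable length, in which case the marker has no effect.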

Parallel tool calls

Claude 4.x can emit multiple tool_use blocks in a single turn when the model judges the calls independent. Your runtime should execute them concurrently where that is safe, and it must return one tool_result for every tool_use, all in the next user message, before the model continues:

Python · executing parallel tool calls

import asyncio

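async def run_tools(tool_uses):
    """Run every requested tool concurrently; execute_tool must be a coroutine."""
    # Fire all tool calls concurrently; exceptions come back as values, not raises
    coros = [execute_tool(t.name, t.input) for t in tool_uses]
    results = await asyncio.gather(*coros, return_exceptions=True)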
    return [
        {"type": "tool_result",
         "tool_use_id": t.id,
         "content": str(r),
         "is_error": isinstance(r, Exception)}
        for t, r in zip(tool_uses, results)
    ]

OpenAI Function Calling

OpenAI's function calling API mirrors Anthropic's shape. Three key features to know:

strict: true guarantees the model's arguments conform to your schema, which removes the "LLM returned invalid JSON" failure mode for tool calls. The schema must meet strict-mode requirements: every property listed in required, and additionalProperties set to false. Introduced August 2024.
parallel_tool_calls: bool gives the same parallel-tool behavior as Claude. Default true; set false if your tools have side effects that must serialize.
Responses API (2025) replaces Chat Completions for new agent code. Built-in tools (web_search, file_search, computer_use) run on OpenAI's side, so you don't implement them yourself.

Python · OpenAI Responses API with strict tool

from openai import OpenAI

client = OpenAI()

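response = client.responses.create(
    model="gpt-5",
    input="Find recent news about MCP adoption.",
    tools=[
        {"type": "web_search"},   # built-in, executed on OpenAI's side
        {   # Responses API function tools are flat (no nested "function" key)
            "type": "function",
            "name": "save_summary",
            "description": "Save a titled summary of the findings.",
            "strict": True,        # arguments must conform to the schema
            "parameters": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "summary": {"type": "string"},
                },
                "required": ["title", "summary"],
                "additionalProperties": False,  # required by strict mode
            },
        },
    ],
)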
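To close the loop on the OpenAI side, each function_call item in the response output is answered with a function_call_output item carrying the same call_id. The sketch below works on plain dictionaries so it can run offline; with the SDK you would read the same fields from the objects in response.output. The save_summary implementation and the HANDLERS registry are hypothetical.

```python
import json

def save_summary(title: str, summary: str) -> dict:
    # Hypothetical implementation; a real one would persist the summary.
    return {"saved": True, "title": title}

HANDLERS = {"save_summary": save_summary}

def answer_function_calls(output_items: list) -> list:
    """Turn function_call items into matching function_call_output items."""
    replies = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # skip messages, built-in tool calls, and other item types
        args = json.loads(item["arguments"])   # arguments arrive as a JSON string
        result = HANDLERS[item["name"]](**args)
        replies.append({
            "type": "function_call_output",
            "call_id": item["call_id"],        # ties the output to the request
            "output": json.dumps(result),
        })
    return replies

# Simulated response output with one message and one function call.
sample_output = [
    {"type": "message", "content": "Saving a summary."},
    {"type": "function_call", "call_id": "call_1", "name": "save_summary",
     "arguments": '{"title": "MCP adoption", "summary": "Growing quickly."}'},
]
replies = answer_function_calls(sample_output)
print(replies)
```

The replies list would then go back as the input of a follow-up responses.create call, typically together with previous_response_id so the model sees the earlier turn.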

Model Context Protocol (MCP) — The Tool-Use Standard

The biggest 2024–2026 development in tool use isn't a new model; it's MCP, an open protocol Anthropic released in November 2024 that has become the de facto standard for exposing tools to LLMs. By mid-2025, major IDEs and editors (VS Code Copilot, Cursor, Zed, JetBrains) and most major agent frameworks supported MCP servers, and ChatGPT supports them as well. The practical effect: you write a tool once and any MCP-compatible agent can use it.

Without MCP

Reimplement each tool for each framework (LangChain, OpenAI, Anthropic SDK, Claude Code…)
Vendor-locked tool definitions
Auth, rate-limiting, schemas duplicated everywhere
Tool changes require updating N clients

With MCP

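Write a tool server once (stdio, or Streamable HTTP, which superseded the earlier HTTP+SSE transport)
Any MCP-compatible client connects to it
Authorization is specified once at the protocol layer for HTTP transports; caching and observability live in the server instead of in every client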
Tool changes ship to one server, all clients benefit

Python · minimal MCP server

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-server")

@mcp.tool()
def get_weather(location: str, unit: str = "c") -> dict:
    """Get current weather for a location."""
    # your real implementation here
    return {"location": location, "temp": 22, "unit": unit}

if __name__ == "__main__":
    mcp.run()  # stdio transport — Claude Desktop / Code reads this

Once registered (in ~/.claude.json or any client's MCP config), the tool is automatically available to every agent in that client. Schema, types, and docs are inferred from the Python signature and docstring.
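To make the inference step concrete, here is a simplified illustration of how a schema can be derived from a function signature with the standard library. It is not FastMCP's actual implementation; schema_from_signature and JSON_TYPES are hypothetical names, and the output uses the Anthropic-style input_schema key from earlier in this chapter.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number",
              bool: "boolean", dict: "object", list: "array"}

def schema_from_signature(func) -> dict:
    """Derive a tool definition from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unmapped types fall back to "string" in this sketch.
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)   # no default means the argument is required
        else:
            properties[name]["default"] = param.default
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "input_schema": {"type": "object", "properties": properties,
                         "required": required},
    }

def get_weather(location: str, unit: str = "c") -> dict:
    """Get current weather for a location."""
    return {"location": location, "temp": 22, "unit": unit}

schema = schema_from_signature(get_weather)
print(schema)
```

The practical consequence is that type hints, defaults, and docstrings on an MCP tool function are part of its interface: they become what the model reads.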

Designing Good Tools — Heuristics That Work

The model's tool-selection accuracy depends almost entirely on how you describe and shape your tools. From Anthropic's official guidance and accumulated production experience:

Descriptions are documentation for the model, not for you. Write what the tool does, when to use it, when not to use it, and what the output looks like. Models pay attention to all four.
Names should be verbs or noun-verbs. get_weather beats weather. send_email beats email.
Few broad tools beat many narrow ones. A single search(query, filters) outperforms search_users, search_orders, search_products as separate tools — fewer for the model to choose between.
Constrain inputs aggressively. enum for known values, pattern for regex-validated strings, minimum/maximum for numbers. Strict schemas keep the model from inventing.
Return structured output. JSON, never prose. The next model turn parses it; humans read the trace.
Surface errors as content, not exceptions. Return {"error": "rate_limited", "retry_after": 30} with is_error: true. The model can react to it; an exception kills the loop.
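The heuristics above might look like this in a single tool definition. Everything here is illustrative: search_records, the ORD- pattern, and the check_input helper are hypothetical, and check_input is a small hand-rolled check of a few keywords, not a full JSON Schema validator.

```python
import re

search_tool = {
    "name": "search_records",   # verb-noun name, one broad tool
    "description": (
        "Search users, orders, or products by keyword. "
        "Use this when the user asks to find or look up records. "
        "Do not use it to modify records. "
        'Returns JSON: {"results": [...], "total": <int>}.'
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords to match"},
            "entity": {"type": "string", "enum": ["users", "orders", "products"]},
            "order_id": {"type": "string", "pattern": "^ORD-[0-9]{6}$"},
            "limit": {"type": "integer", "minimum": 1, "maximum": 50},
        },
        "required": ["query", "entity"],
    },
}

def check_input(tool: dict, args: dict) -> list:
    """Return constraint violations (required, enum, pattern, minimum/maximum)."""
    schema = tool["input_schema"]
    problems = [f"missing required field: {name}"
                for name in schema["required"] if name not in args]
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown field: {name}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
        if "pattern" in spec and not re.search(spec["pattern"], str(value)):
            problems.append(f"{name} does not match {spec['pattern']}")
        if "minimum" in spec and value < spec["minimum"]:
            problems.append(f"{name} is below {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            problems.append(f"{name} is above {spec['maximum']}")
    return problems

print(check_input(search_tool, {"query": "late", "entity": "invoices", "limit": 500}))
```

Running the same check in your runtime before executing a tool gives you a ready-made, actionable error message to return to the model when it does invent a value.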

Quick check

You give an agent two tools: read_database(query) and send_slack_message(channel, body). Both succeed on the happy path. The agent occasionally sends Slack messages with database row IDs in them. What's the most likely root cause and what's the fix?

Answer

The model is mistaking the database response shape for content suitable for human messages — almost always because the tool's description says nothing about what the response looks like, or because read_database returns raw rows the model treats as the answer. Fix: reshape read_database to return a summary object ({rows: [...], summary: "3 results", display_format: "table"}), and add to send_slack_message's description: "Body must be human-readable text, not raw data structures." The model picks up the cue.
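A minimal sketch of that reshaping step, assuming read_database produces a list of row dictionaries. The helper name shape_db_response is hypothetical; the field names follow the summary object described in the answer.

```python
def shape_db_response(rows: list, display_format: str = "table") -> dict:
    """Wrap raw rows with a human-readable summary the model can quote."""
    count = len(rows)
    noun = "result" if count == 1 else "results"
    return {"rows": rows, "summary": f"{count} {noun}",
            "display_format": display_format}

shaped = shape_db_response([{"id": 101}, {"id": 102}, {"id": 103}])
print(shaped["summary"])
```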

Error Handling — Where Most Agents Break

The single most common production agent failure isn't bad reasoning — it's a tool that errored, returned something unexpected, and the model couldn't recover. Three rules:

Always return — never raise. A raised exception bubbles up and the loop dies. Catch the exception, format it as a tool result with is_error: true, and let the model decide whether to retry, fall back, or give up.
Make errors actionable. {"error": "city_not_found", "suggestion": "Try a more specific name like 'Tokyo, Japan'"} teaches the model how to recover. {"error": "500"} teaches it nothing.
Set per-tool timeouts. A tool that takes 30 seconds can stall an agent with a 2-second latency budget. Wrap every tool call in asyncio.wait_for or equivalent.
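The three rules can be combined in one wrapper. This is a sketch under stated assumptions: tools are async functions, safe_call and the two demo tools are hypothetical, and the error payload fields are illustrative rather than a fixed format.

```python
import asyncio
import json

async def safe_call(tool_use_id: str, func, args: dict, timeout_s: float = 5.0) -> dict:
    """Run one async tool with a timeout and always return a tool_result dict."""
    try:
        result = await asyncio.wait_for(func(**args), timeout=timeout_s)
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": json.dumps(result), "is_error": False}
    except asyncio.TimeoutError:
        payload = {"error": "timeout",
                   "suggestion": f"Tool exceeded {timeout_s}s; retry later or use another tool"}
    except Exception as exc:  # deliberately broad: the loop must survive any tool failure
        payload = {"error": type(exc).__name__, "detail": str(exc)}
    return {"type": "tool_result", "tool_use_id": tool_use_id,
            "content": json.dumps(payload), "is_error": True}

# Hypothetical tools used only to exercise the wrapper.
async def slow_tool() -> dict:
    await asyncio.sleep(1.0)
    return {"ok": True}

async def failing_tool(city: str) -> dict:
    raise LookupError(f"city_not_found: {city}")

async def demo() -> list:
    return await asyncio.gather(
        safe_call("toolu_1", slow_tool, {}, timeout_s=0.05),
        safe_call("toolu_2", failing_tool, {"city": "Tokio"}),
    )

results = asyncio.run(demo())
for r in results:
    print(r["is_error"], r["content"])
```

Both failures come back as ordinary tool results, so the agent loop continues and the model gets to decide what to do next.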

⚠️

Tool-call thrashing

A common failure mode: the model calls a tool, gets an error, calls the same tool with the same arguments, gets the same error, ad infinitum. Anthropic's multi-agent post explicitly calls this out. Mitigation: keep recent tool-call history in context so the model can see the repeated failure, and add a tool description hint like "If this tool fails twice in a row, try a different approach or escalate." The hint is surprisingly effective.
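A runtime-side guard can back up the prompt hint. The sketch below is one possible approach, not something the source prescribes: RetryGuard is a hypothetical class that counts failures per tool name and argument set and, after two identical failures, returns an error result instead of running the tool again.

```python
import json
from collections import Counter

class RetryGuard:
    """Counts failures per (tool name, arguments) pair and blocks repeats."""

    def __init__(self, max_failures: int = 2):
        self.max_failures = max_failures
        self.failures = Counter()

    @staticmethod
    def _key(name: str, args: dict) -> str:
        # sort_keys makes argument order irrelevant to the signature
        return name + ":" + json.dumps(args, sort_keys=True)

    def record_failure(self, name: str, args: dict) -> None:
        self.failures[self._key(name, args)] += 1

    def should_block(self, name: str, args: dict) -> bool:
        return self.failures[self._key(name, args)] >= self.max_failures

    def blocked_result(self, tool_use_id: str) -> dict:
        """A tool_result telling the model to change approach instead of retrying."""
        payload = {"error": "repeated_failure",
                   "suggestion": "This call already failed with the same arguments. "
                                 "Try different arguments or another approach."}
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": json.dumps(payload), "is_error": True}

guard = RetryGuard()
call = ("get_weather", {"location": "Tokio"})
guard.record_failure(*call)
guard.record_failure(*call)
print(guard.should_block(*call))
```

Check should_block before executing each tool call, and call record_failure whenever a result comes back with is_error set.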

🔑

Key takeaways

1) Tool use is JSON in, JSON out — the model never executes code. 2) Cache tool definitions and system prompts with cache_control — token overhead is meaningful at scale. 3) Parallel tool use is default; design for concurrent execution. 4) MCP is the standard tool-server protocol — write tools as MCP servers and they work in every agent stack. 5) Errors as structured content beat exceptions every time.

📚 Further reading

Anthropic: Tool use overview (platform.claude.com)
OpenAI: Function calling guide (platform.openai.com)
OpenAI: Introducing Structured Outputs in the API (openai.com)
Model Context Protocol: Specification (modelcontextprotocol.io)
MCP: Python SDK (FastMCP) (github.com)
