Tool Use & The Model Context Protocol

8 min · Beginner · 1,708 words
From JSON Schema definitions to MCP servers — master the tool-use APIs from Anthropic and OpenAI, design tools that models actually use correctly, and learn the protocol turning every agent into an interoperable system.

What you will learn

01 The Tool-Use Mental Model
02 Anthropic Tool Use API
03 OpenAI Function Calling
04 Model Context Protocol (MCP) — The Tool-Use Standard
05 Designing Good Tools — Heuristics That Work
06 Error Handling — Where Most Agents Break

The moment an LLM can call a function, it stops being a chatbot and starts being a system. Tool use is the foundation of every agent in this course — without it, the model can only talk; with it, the model can act. This chapter covers the production-grade tool-use APIs from Anthropic and OpenAI, the Model Context Protocol that's standardizing the integration layer, and the practical patterns (parallel tools, strict schemas, error handling) that separate a demo from a deployment.

🔑
What you will internalize
1) Tools are JSON Schema definitions the model can choose to invoke. 2) The model emits tool_use blocks; your code executes them and returns tool_result. 3) Parallel tool calls are now default — design for concurrency, not sequence. 4) MCP is the universal tool-server protocol — write tools once, run them in any agent stack.

The Tool-Use Mental Model

A tool is a function the model can request but never invoke directly. The model doesn't run code; it produces structured JSON that says, in effect, "please run get_weather with location = 'Tokyo'." Your code runs the function, gets the answer, and feeds the result back into the next model turn. That round-trip is the entire mechanism.

[Figure: one tool-use turn. Your code (orchestrator + tool runtime) sends ① the user message and tool schemas to the LLM. The LLM picks a tool and arguments and returns ② a tool_use block { name, input }. Your code runs the tool (your function, a DB, an API, a shell) and sends back ③ the tool_result. The LLM replies with ④ final text or the next tool_use. Loop until the model returns a normal text response with no tool_use blocks.]
A single tool-use turn. The model never executes code itself — it always returns a request that your runtime fulfils. This separation is what makes tools sandboxable, auditable, and replaceable.
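
The round-trip can be sketched without any SDK. In this minimal illustration, `fake_model` is a hypothetical stand-in for a real LLM call and `TOOLS` is an assumed registry of local functions; only the control flow is the point.

```python
import json

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS = {
    "get_weather": lambda location, unit="c": {
        "location": location, "temp": 22, "unit": unit,
    },
}

def fake_model(messages):
    """Stand-in for an LLM call (an assumption for illustration only).

    Requests get_weather once, then answers in plain text as soon as a
    tool_result is present in the conversation.
    """
    last = messages[-1]
    if last["role"] == "user" and isinstance(last["content"], str):
        return [{"type": "tool_use", "id": "call_1", "name": "get_weather",
                 "input": {"location": "Tokyo"}}]
    result = json.loads(last["content"][0]["content"])
    return [{"type": "text",
             "text": f"It is {result['temp']} degrees in {result['location']}."}]

def run_agent(user_text, max_turns=5):
    """Loop until the model replies with no tool_use blocks."""
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_turns):
        blocks = fake_model(messages)                      # steps 1 and 2
        messages.append({"role": "assistant", "content": blocks})
        tool_uses = [b for b in blocks if b["type"] == "tool_use"]
        if not tool_uses:                                  # step 4: final text
            return "".join(b["text"] for b in blocks if b["type"] == "text")
        results = []
        for call in tool_uses:                             # your code runs the tool
            output = TOOLS[call["name"]](**call["input"])
            results.append({"type": "tool_result",
                            "tool_use_id": call["id"],
                            "content": json.dumps(output)})
        messages.append({"role": "user", "content": results})  # step 3
    raise RuntimeError("agent did not finish within max_turns")

answer = run_agent("What's the weather in Tokyo?")
```

The `max_turns` cap is a design choice rather than part of any API: it keeps a confused model from looping forever.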

Anthropic Tool Use API

Anthropic's tool use API takes a list of tool definitions, each a JSON Schema. The model returns content blocks of type text, tool_use, or both.

Python · Anthropic tool use
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location.",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City, country"},
            "unit": {"type": "string", "enum": ["c", "f"]},
        },
        "required": ["location"],
    },
}]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)

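# stop_reason == "tool_use" means the model is waiting for tool results
if response.stop_reason == "tool_use":
    for block in response.content:
        if block.type == "tool_use":
            # execute_tool is your own dispatcher (not part of the SDK): it maps
            # block.name to a local function and calls it with block.input
            result = execute_tool(block.name, block.input)
            # Feed the result back in the next user turn as a tool_result
            # block whose tool_use_id equals block.id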
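
The last step, feeding the result back, comes down to plain data construction. The sketch below assumes the Messages API convention of echoing the assistant turn and then sending a user turn made of tool_result blocks. The helper name `build_followup` and the id `toolu_01` are hypothetical.

```python
import json

def build_followup(messages, assistant_blocks, results_by_id):
    """Return a new message list that carries tool results back to the model.

    messages:         conversation so far (list of role/content dicts)
    assistant_blocks: the assistant's content blocks, echoed back unchanged
    results_by_id:    hypothetical mapping of tool_use id -> tool output
    """
    tool_results = [
        {"type": "tool_result",
         "tool_use_id": tool_use_id,          # must match the tool_use block's id
         "content": json.dumps(output)}
        for tool_use_id, output in results_by_id.items()
    ]
    # Build a new list so the caller's history is not mutated.
    return messages + [
        {"role": "assistant", "content": assistant_blocks},
        {"role": "user", "content": tool_results},
    ]

history = [{"role": "user", "content": "What's the weather in Tokyo?"}]
assistant_blocks = [{"type": "tool_use", "id": "toolu_01",
                     "name": "get_weather", "input": {"location": "Tokyo"}}]
followup = build_followup(history, assistant_blocks,
                          {"toolu_01": {"temp": 22, "unit": "c"}})
```

Passing `followup` as `messages` in the next `client.messages.create` call lets the model either answer in text or request another tool.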

Tool choice modes

The tool_choice parameter controls how aggressively the model uses tools:

| Mode | Behavior | System-prompt overhead | Use when |
| --- | --- | --- | --- |
| auto (default) | Model decides whether to call a tool | ~346 tokens | Most cases; let the model judge |
| any | Model must call some tool | ~313 tokens | You know a tool is needed but not which one |
| tool | Model must call a specific named tool | ~313 tokens | Forced extraction; structured output via tool |
| none | No tools available this turn | ~346 tokens | Final summarization step after tool results |
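
On the wire, each mode is a small object passed as `tool_choice`. The helper below is a hypothetical convenience wrapper, not part of the SDK. It only shows the shape each of the four modes takes, with the `tool` mode also carrying the tool's name.

```python
def tool_choice(mode, name=None):
    """Build a tool_choice value for one of the modes listed above."""
    if mode not in {"auto", "any", "tool", "none"}:
        raise ValueError(f"unknown tool_choice mode: {mode}")
    if mode == "tool":
        if not name:
            raise ValueError("mode 'tool' needs the name of the tool to force")
        return {"type": "tool", "name": name}
    return {"type": mode}

forced = tool_choice("tool", name="get_weather")
default = tool_choice("auto")
```

The result would be passed as `tool_choice=forced` alongside `tools=tools` in `client.messages.create`.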
💰
Token economics of tools
The tool-use system prompt overhead is fixed at 346 tokens on Claude 4.x with auto/none, and 313 with any/tool (per Anthropic's tool-use docs). On top of that, every tool definition adds its own JSON schema cost. Cache the system prompt + tool definitions with cache_control: they don't change between turns, so after the first cache write (billed slightly above the normal input price) you pay the cached-token read price (about 10% of the base input price) instead of full price.
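
A minimal sketch of marking tool definitions for caching, assuming Anthropic's prompt-caching convention that a `cache_control` entry of type `ephemeral` on the last tool caches the tool list up to that point. The helper name `with_cache_breakpoint` is hypothetical.

```python
import copy

def with_cache_breakpoint(tools):
    """Return a copy of the tool list with a cache breakpoint on the last tool."""
    if not tools:
        return []
    cached = copy.deepcopy(tools)        # leave the caller's definitions untouched
    cached[-1]["cache_control"] = {"type": "ephemeral"}
    return cached

tools = [
    {"name": "get_weather",
     "description": "Get current weather for a location.",
     "input_schema": {"type": "object",
                      "properties": {"location": {"type": "string"}},
                      "required": ["location"]}},
    {"name": "get_time",                 # hypothetical second tool
     "description": "Get the current local time for a location.",
     "input_schema": {"type": "object",
                      "properties": {"location": {"type": "string"}},
                      "required": ["location"]}},
]

cached_tools = with_cache_breakpoint(tools)
```

Only the last definition carries the marker, so a single breakpoint covers the whole list. Keep the tool order stable between turns, since a changed prefix would miss the cache.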

Parallel tool calls

Claude 4.x can emit multiple tool_use blocks in a single turn when the model judges the calls independent. Your runtime should execute them concurrently where that is safe, and it must return every tool_result, each matched to its tool_use id, in a single user message before the next model turn:

Python · executing parallel tool calls
import asyncio

async def run_tools(tool_uses):
    # Fire all tool calls concurrently
    coros = [execute_tool(t.name, t.input) for t in tool_uses]
    results = await asyncio.gather(*coros, return_exceptions=True)
    return [
        {"type": "tool_result",
         "tool_use_id": t.id,
         "content": str(r),
         "is_error": isinstance(r, Exception)}
        for t, r in zip(tool_uses, results)
    ]

OpenAI Function Calling

OpenAI's function calling API mirrors Anthropic's shape. Three key features to know:

  • strict: true — guarantees the model's output conforms to your schema. Eliminates the "LLM returned invalid JSON" failure mode entirely. Introduced August 2024.
  • parallel_tool_calls: bool — same parallel-tool behavior as Claude. Default true; set false if your tools have side effects that must serialize.
  • Responses API (2025) — replaces Chat Completions for new agent code. Built-in tools (web_search, file_search, computer_use) require no setup.
Python · OpenAI Responses API with strict tool
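from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input="Find recent news about MCP adoption.",
    tools=[
        {"type": "web_search"},   # built-in, runs on OpenAI's side
        # The Responses API takes function tools in a flat shape: name,
        # parameters, and strict sit next to "type" (no nested "function" key)
        {"type": "function",
         "name": "save_summary",
         "description": "Save a titled summary of the findings.",
         "strict": True,        # schema-conformant arguments
         "parameters": {
             "type": "object",
             "properties": {
                 "title": {"type": "string"},
                 "summary": {"type": "string"},
             },
             "required": ["title", "summary"],
             "additionalProperties": False,  # required by strict mode
         }},
    ],
)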
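
Strict mode places requirements on the schema itself. OpenAI's Structured Outputs documentation asks, among other things, that every object set additionalProperties to false and list all of its properties in required. The sketch below checks only those two rules before you send a schema. The function name is hypothetical and the check is not a complete validator.

```python
def strict_schema_problems(schema, path="$"):
    """List places where a JSON Schema breaks two strict-mode rules."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {', '.join(missing)}")
        for name, sub in props.items():          # recurse into nested objects
            problems.extend(strict_schema_problems(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems

loose = {"type": "object",
         "properties": {"title": {"type": "string"},
                        "summary": {"type": "string"}},
         "required": ["title"]}

strict = {"type": "object",
          "properties": {"title": {"type": "string"},
                         "summary": {"type": "string"}},
          "required": ["title", "summary"],
          "additionalProperties": False}

loose_problems = strict_schema_problems(loose)
strict_problems = strict_schema_problems(strict)
```

Running a check like this in CI catches schema drift before the API rejects a request at runtime.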

Model Context Protocol (MCP) — The Tool-Use Standard

The biggest 2024–2026 development in tool use isn't a new model but MCP, an open protocol Anthropic released in November 2024 that has become the de facto standard for exposing tools to LLMs. By mid-2025, major IDEs (VS Code Copilot, Cursor, Zed, JetBrains), the major agent frameworks, and ChatGPT itself had added support for MCP servers. The practical effect: you write a tool once and any MCP-compatible agent can use it.

Without MCP
  • Reimplement each tool for each framework (LangChain, OpenAI, Anthropic SDK, Claude Code…)
  • Vendor-locked tool definitions
  • Auth, rate-limiting, schemas duplicated everywhere
  • Tool changes require updating N clients
With MCP
  • Write a tool server once (stdio or HTTP; newer spec revisions replace HTTP+SSE with Streamable HTTP)
  • Any MCP-compatible client can connect to it
  • Auth + caching + observability handled at protocol layer
  • Tool changes ship to one server, all clients benefit
Python · minimal MCP server
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-server")

@mcp.tool()
def get_weather(location: str, unit: str = "c") -> dict:
    """Get current weather for a location."""
    # your real implementation here
    return {"location": location, "temp": 22, "unit": unit}

if __name__ == "__main__":
    mcp.run()  # stdio transport — Claude Desktop / Code reads this

Once registered (in ~/.claude.json or any client's MCP config), the tool is automatically available to every agent in that client. The schema and types are inferred from the Python signature, and the tool description comes from the docstring.
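
To make the inference step concrete, here is a simplified sketch built on the standard library's `inspect` module. It is not the SDK's actual implementation. `describe_tool` and the type mapping are assumptions for illustration, and the output only mimics the name / description / inputSchema shape of an MCP tool listing.

```python
import inspect

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number",
               bool: "boolean", dict: "object", list: "array"}

def describe_tool(func):
    """Derive a tool descriptor from a function's signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)                 # no default -> required
        else:
            properties[name]["default"] = param.default
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object",
                        "properties": properties,
                        "required": required},
    }

def get_weather(location: str, unit: str = "c") -> dict:
    """Get current weather for a location."""
    return {"location": location, "temp": 22, "unit": unit}

tool = describe_tool(get_weather)
```

Because the descriptor is derived from the function, type hints and docstrings become part of the tool's public interface. A vague docstring turns into a vague tool description.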

Designing Good Tools — Heuristics That Work

The model's tool-selection accuracy depends almost entirely on how you describe and shape your tools. From Anthropic's official guidance and accumulated production experience:

  1. Descriptions are documentation for the model, not for you. Write what the tool does, when to use it, when not to use it, and what the output looks like. Models pay attention to all four.
  2. Names should be verbs or noun-verbs. get_weather beats weather. send_email beats email.
  3. Few broad tools beat many narrow ones. A single search(query, filters) outperforms search_users, search_orders, search_products as separate tools — fewer for the model to choose between.
  4. Constrain inputs aggressively. enum for known values, pattern for regex-validated strings, minimum/maximum for numbers. Strict schemas keep the model from inventing.
  5. Return structured output. JSON, never prose. The next model turn parses it; humans read the trace.
  6. Surface errors as content, not exceptions. Return {"error": "rate_limited", "retry_after": 30} with is_error: true. The model can react to it; an exception kills the loop.
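
Several of these heuristics fit into one definition. The tool below is hypothetical (its name, fields, and limits are invented for illustration), and `input_problems` is a deliberately tiny checker covering only the constraints used here, not a full JSON Schema validator.

```python
import re

search_tool = {
    "name": "search_records",                       # verb-noun name
    "description": (
        "Search users, orders, or products by free-text query. "
        "Use when the user asks about existing records. "
        "Do not use for creating or updating records. "
        'Returns JSON: {"results": [...], "total": <int>}.'
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "collection": {"type": "string",
                           "enum": ["users", "orders", "products"]},
            "query": {"type": "string"},
            "created_after": {"type": "string",
                              "pattern": r"^\d{4}-\d{2}-\d{2}$"},
            "limit": {"type": "integer", "minimum": 1, "maximum": 50},
        },
        "required": ["collection", "query"],
    },
}

def input_problems(schema, args):
    """Report violations of required, enum, pattern, minimum, and maximum."""
    problems = [f"missing required field: {key}"
                for key in schema.get("required", []) if key not in args]
    for key, value in args.items():
        rule = schema["properties"].get(key)
        if rule is None:
            problems.append(f"unknown field: {key}")
            continue
        if "enum" in rule and value not in rule["enum"]:
            problems.append(f"{key}: must be one of {rule['enum']}")
        if "pattern" in rule and not re.search(rule["pattern"], str(value)):
            problems.append(f"{key}: does not match {rule['pattern']}")
        if "minimum" in rule and value < rule["minimum"]:
            problems.append(f"{key}: below minimum {rule['minimum']}")
        if "maximum" in rule and value > rule["maximum"]:
            problems.append(f"{key}: above maximum {rule['maximum']}")
    return problems

schema = search_tool["input_schema"]
ok = input_problems(schema, {"collection": "orders", "query": "late", "limit": 10})
bad = input_problems(schema, {"collection": "invoices", "query": "late", "limit": 500})
```

Validating on your side as well as in the schema gives you an actionable error to return when the model still sends something out of range.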
Quick check
You give an agent two tools: read_database(query) and send_slack_message(channel, body). Both succeed on the happy path. The agent occasionally sends Slack messages with database row IDs in them. What's the most likely root cause and what's the fix?
Answer
The model is mistaking the database response shape for content suitable for human messages — almost always because the tool's description says nothing about what the response looks like, or because read_database returns raw rows the model treats as the answer. Fix: reshape read_database to return a summary object ({rows: [...], summary: "3 results", display_format: "table"}), and add to send_slack_message's description: "Body must be human-readable text, not raw data structures." The model picks up the cue.

Error Handling — Where Most Agents Break

The single most common production agent failure isn't bad reasoning. It's a tool that errored or returned something unexpected, after which the model couldn't recover. Three rules:

  • Always return — never raise. A raised exception bubbles up and the loop dies. Catch the exception, format it as a tool result with is_error: true, and let the model decide whether to retry, fall back, or give up.
  • Make errors actionable. {"error": "city_not_found", "suggestion": "Try a more specific name like 'Tokyo, Japan'"} teaches the model how to recover. {"error": "500"} teaches it nothing.
  • Set per-tool timeouts. A tool that hangs for 30 seconds can stall an agent that is expected to answer in 2. Wrap every tool call in asyncio.wait_for or equivalent.
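
The three rules combine naturally into one wrapper. This is a sketch under stated assumptions: tools are async callables, and `call_tool_safely`, the demo tools, and the error payload fields are hypothetical names chosen for illustration.

```python
import asyncio
import json

async def call_tool_safely(tool_use_id, func, arguments, timeout_s=5.0):
    """Run one async tool with a timeout; always return a tool_result dict."""
    try:
        output = await asyncio.wait_for(func(**arguments), timeout=timeout_s)
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": json.dumps(output), "is_error": False}
    except asyncio.TimeoutError:
        payload = {"error": "timeout",
                   "suggestion": f"Tool exceeded {timeout_s}s; retry later "
                                 "or try a narrower request."}
    except Exception as exc:  # never raise into the agent loop
        payload = {"error": type(exc).__name__, "detail": str(exc)}
    return {"type": "tool_result", "tool_use_id": tool_use_id,
            "content": json.dumps(payload), "is_error": True}

# Hypothetical tools used only to exercise the wrapper.
async def fast_tool():
    return {"temp": 22}

async def slow_tool():
    await asyncio.sleep(1)
    return {"ok": True}

async def broken_tool():
    raise ValueError("city_not_found")

async def demo():
    return await asyncio.gather(
        call_tool_safely("t1", fast_tool, {}),
        call_tool_safely("t2", slow_tool, {}, timeout_s=0.05),
        call_tool_safely("t3", broken_tool, {}),
    )

results = asyncio.run(demo())
```

Each call is wrapped individually, so one slow or failing tool cannot prevent the others from returning their results.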
⚠️
Tool-call thrashing
A common failure mode: the model calls a tool, gets an error, calls the same tool with the same arguments, gets the same error, ad infinitum. Anthropic's multi-agent post explicitly calls this out. Mitigation: include the recent tool-call history in your context-management strategy, and add a tool description hint like "If this tool fails twice in a row, try a different approach or escalate." Surprisingly effective.
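
Alongside the prompt-level hint, the runtime can watch for thrashing itself. The sketch below is one possible approach, not a method taken from Anthropic's post. It counts consecutive failures of identical calls so the loop can refuse a further repeat. The class name `ThrashGuard` and the threshold are assumptions.

```python
import json
from collections import Counter

class ThrashGuard:
    """Track repeated failures of identical tool calls."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self._failures = Counter()

    @staticmethod
    def _key(name, arguments):
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} compare equal
        return (name, json.dumps(arguments, sort_keys=True))

    def record(self, name, arguments, is_error):
        key = self._key(name, arguments)
        if is_error:
            self._failures[key] += 1
        else:
            self._failures.pop(key, None)   # a success resets the streak

    def should_block(self, name, arguments):
        return self._failures[self._key(name, arguments)] >= self.max_repeats

guard = ThrashGuard(max_repeats=2)
call = ("get_weather", {"location": "Tokio"})
guard.record(*call, is_error=True)
blocked_after_one = guard.should_block(*call)
guard.record(*call, is_error=True)
blocked_after_two = guard.should_block(*call)
```

When `should_block` returns true, the loop can skip execution and return an error result telling the model to change its arguments or approach.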
🔑
Key takeaways
1) Tool use is JSON in, JSON out — the model never executes code. 2) Cache tool definitions and system prompts with cache_control — token overhead is meaningful at scale. 3) Parallel tool use is default; design for concurrent execution. 4) MCP is the standard tool-server protocol — write tools as MCP servers and they work in every agent stack. 5) Errors as structured content beat exceptions every time.
