
Agent Frameworks — Choosing Your Stack
Claude Agent SDK, OpenAI Agents SDK, LangGraph, Pydantic AI, Mastra, CrewAI — the six production frameworks that matter, side by side, with a decision rule for picking yours.
What you will learn
You've covered foundations, tool use, loops, memory, multi-agent, hardening, observability, and computer use. The last decision is the one that determines how much of this you build vs. inherit: which agent framework do you actually use? The 2026 landscape has consolidated to about six serious choices. This chapter is a side-by-side comparison and a decision rule to help you pick.
The Six Production Frameworks
1. Claude Agent SDK
Claude Agent SDK (formerly Claude Code SDK) ships the entire Claude Code agent loop as a Python and TypeScript library: built-in tools (Read, Edit, Bash, Glob, Grep, WebSearch), hooks (PreToolUse, PostToolUse, SessionStart), subagents, MCP support, JSONL session resume, and an explicit budget model. Pairs with Managed Agents for hosted-sandbox deployment.
- Strengths: production-tested loop (it's literally Claude Code's runtime); great for code-running agents; minimal glue.
- Weaknesses: Anthropic-only; less flexible for multi-vendor tool routing.
- Pick if: you're committed to Claude and want shippable agents with the least scaffolding.
2. OpenAI Agents SDK
OpenAI Agents SDK replaced the deprecated Swarm in March 2025. Three primitives: agents, handoffs, guardrails. Built-in tracing in OpenAI's dashboard. Strong support for voice agents (via gpt-realtime) and the Responses API's built-in tools.
- Strengths: tight OpenAI integration; voice; clean handoff abstraction.
- Weaknesses: OpenAI-only; less mature checkpoint/durability story than LangGraph.
- Pick if: you're on OpenAI, especially for voice or realtime agents.
3. LangGraph
LangGraph models agents as explicit state graphs with first-class checkpointing, human-in-loop pauses, and durable execution. Provider-agnostic. The de-facto standard for long-running, branching, or human-approval-required workflows.
- Strengths: durable state; HITL; multi-vendor; pairs with LangSmith for tracing.
- Weaknesses: heavier abstraction; more LOC for simple agents.
- Pick if: your agent runs >5 minutes, needs checkpointing, or requires human approval steps.
4. Pydantic AI
Pydantic AI brings the type-safety discipline of Pydantic to agent code. Dependency injection, strict structured outputs, model-agnostic, idiomatic Python. Lightest footprint of the serious frameworks.
- Strengths: strict types; tiny abstraction; great for Python apps that already use Pydantic.
- Weaknesses: newer; smaller ecosystem than LangChain.
- Pick if: you want minimal magic, full IDE support, and your stack is Python + FastAPI.
5. Mastra
Mastra is the TypeScript-native answer: agents, workflows, evals, and memory in one TS-first SDK. Plays well with Next.js, Vercel, and modern JS deployment.
- Strengths: TS ergonomics; integrated workflows + evals; good Next.js integration.
- Weaknesses: JS/TS only; smaller community than LangGraph.
- Pick if: your team writes TypeScript and you want a cohesive agent stack without leaving the JS ecosystem.
6. CrewAI
CrewAI takes an opinionated "agents-as-roles + tasks + crews" approach. Less code-heavy than LangGraph; popular for business-process automation.
- Strengths: approachable mental model; fast to prototype with non-engineers in the loop.
- Weaknesses: less control; the role/task/crew abstraction can feel like a constraint at scale.
- Pick if: you're automating business workflows and want a framework that reads like an org chart.
The Side-By-Side Comparison
| Framework | Language | Vendor | State / checkpointing | Best for |
|---|---|---|---|---|
| Claude Agent SDK | Python · TS | Anthropic | Built-in (sessions, JSONL) | Code-running agents on Claude |
| OpenAI Agents SDK | Python · TS | OpenAI | Tracing-only | OpenAI stack, voice agents |
| LangGraph | Python · JS | Any | First-class checkpoints | Long-running, HITL workflows |
| Pydantic AI | Python | Any | Manual (lightweight) | Type-safe Python apps |
| Mastra | TypeScript | Any | Built-in workflows | JS/TS apps, Next.js |
| CrewAI | Python | Any | Lightweight | Business-process automation |
The Decision Rule
- Is your stack TypeScript? → Mastra (or roll your own; the API surfaces are simple in TS).
- Are you locked into one provider? Anthropic → Claude Agent SDK. OpenAI → OpenAI Agents SDK.
- Does your agent run >5 minutes or need human approvals? → LangGraph. The checkpointing/HITL story is unmatched.
- Do you want minimal magic and full IDE support? → Pydantic AI.
- Are non-engineers configuring the agents? → CrewAI's role/task/crew model reads more like business logic.
- Otherwise: roll your own with the canonical loop from chapter 1. The frameworks save you 100 lines; sometimes 100 lines is the right answer.
The Same Task in Three Frameworks
To make the differences concrete, here's a "research a topic and return a structured report" agent in three frameworks. Same goal, different ergonomics.
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions async with ClaudeSDKClient(options=ClaudeAgentOptions( system_prompt="You research topics and return structured JSON reports.", allowed_tools=["WebSearch", "WebFetch"], max_turns=15, )) as client: await client.query("Research MCP adoption in 2026; return JSON.") async for msg in client.receive_response(): print(msg)
from agents import Agent, Runner from agents.tools import WebSearchTool agent = Agent( name="Researcher", instructions="Research topics; return structured JSON reports.", tools=[WebSearchTool()], output_type=ReportSchema, ) result = await Runner.run(agent, "Research MCP adoption in 2026.")
from langgraph.prebuilt import create_react_agent from langgraph.checkpoint.sqlite import SqliteSaver agent = create_react_agent( model="anthropic:claude-sonnet-4-6", tools=[web_search_tool], response_format=ReportSchema, checkpointer=SqliteSaver.from_conn_string(":memory:"), ) result = await agent.ainvoke( {"messages": ["Research MCP adoption in 2026."]}, config={"configurable": {"thread_id": "abc"}}, )
Notice the convergence: same shape, same primitives, slightly different ergonomics. Once you've internalized the canonical loop from Day 1, every framework reads as a thin layer over it.
Where The Field Is Heading
Looking at the 2025-2026 trajectory:
- MCP wins as the tool-server layer. Every framework now treats MCP as default. Tool definitions outlive the framework that called them.
- Durable execution becomes table-stakes. LangGraph started it; Mastra and Claude Agent SDK have followed. By 2027, an agent framework without checkpointing will look anachronistic.
- Computer use moves from beta to production-default. The OSWorld trajectory says capability will catch human baseline within 12-18 months for narrow tasks.
- Eval-driven development becomes the norm. Braintrust, Langfuse Evals, Inspect AI — the toolchain is converging on "every prompt change is a PR with eval results attached."
- Agent SDKs absorb middleware. Memory tools, context editing, prompt caching — features that lived in third-party libraries in 2024 are now native in the SDKs.
The 7-Day Recap
- Day 1 AM — Agents = LLM in a loop with tools and a budget. Workflows beat agents for most tasks.
- Day 1 PM — Tool use is JSON in, JSON out. MCP is the universal standard.
- Day 2 — ReAct is default; reach for Reflexion / ToT / Plan-and-Execute only when their costs are justified.
- Day 3 — Three memory layers: cached context, vector RAG, structured stores. Combine all three.
- Day 4 — Multi-agent costs ~15× more tokens. Use orchestrator-workers when sub-tasks are genuinely parallel.
- Day 5 — Production = guardrails, retries, caching, sandboxing, escape hatches. Nothing is optional.
- Day 6 — Trace every span. Score every run. Catch regressions before users do.
- Day 7 AM — Computer use is general but expensive; hybrid (Stagehand) wins for stable-DOM tasks.
- Day 7 PM — Pick the framework whose defaults match your stack. Roll your own when 100 lines beats a dependency.
- Claude Agent SDK — Overviewcode.claude.com
- OpenAI Agents SDK — Pythonopenai.github.io
- LangGraph — Stateful Agent Workflowsdocs.langchain.com
- Pydantic AI — Agent Frameworkai.pydantic.dev
- Mastra — TypeScript Agent Frameworkmastra.ai
- CrewAI — Roles, Tasks, Crewsdocs.crewai.com
Finished reading?