
Agent Foundations — Workflows, Agents & The Canonical Loop
Before tools and frameworks, the mental model. Learn what makes a system 'agentic', when not to build one, and the four-line loop that powers every agent in production.
What you will learn
An agent is an LLM that decides — for itself — which actions to take next. The model doesn't follow a fixed script; it reads a goal, picks a tool, observes the result, and decides what to do after. That single shift in control flow is the entire difference between calling an LLM and shipping an agent. Everything else in this course — tool use, memory, multi-agent, evals — is the engineering you need to make that loop reliable.
Workflows vs. Agents — The Anthropic Distinction
The cleanest taxonomy comes from Anthropic's Building Effective Agents (Dec 2024): a workflow is a system where LLMs and tools are orchestrated by predefined code paths. An agent is a system where the LLM dynamically directs its own tool use and control flow. Workflows are predictable and cheap; agents are flexible and expensive.
Anthropic's strongly stated guidance — and the consensus across production teams — is to start with a single LLM call, escalate to a workflow, and only reach for an agent when the task is genuinely open-ended. A workflow that does 80% of jobs reliably will outperform an agent that does 100% of jobs unreliably, every time.
The Canonical Agent Loop
Strip away the frameworks and an agent is four lines of pseudocode:
while not done: response = llm.call(messages, tools=available_tools) if response.tool_use: result = execute_tool(response.tool_use) messages.append(response, result) else: done = True # final answer
Everything sophisticated — ReAct, Reflexion, multi-agent, computer use — is a refinement of this loop. You're either changing what tools are available, changing how the model decides between them, or changing how state survives across iterations.
The Six Patterns from "Building Effective Agents"
Anthropic's essay distills production agent design into six reusable patterns. The first five are workflows — predefined orchestrations of LLM calls. Only the sixth is a true agent. Most teams ship four of the workflows and never need the agent.
| Pattern | Type | What it is | When to reach for it |
|---|---|---|---|
| Prompt chaining | Workflow | Decompose into ordered steps; each step's output feeds the next | Tasks that cleanly split into sub-tasks (outline → draft → polish) |
| Routing | Workflow | Classify the input, then dispatch to a specialized prompt/model | Heterogeneous queries (refund vs. tech-support vs. sales) |
| Parallelization | Workflow | Run sub-tasks concurrently; aggregate (sectioning) or vote (voting) | Independent sub-questions; multiple checks for safety |
| Orchestrator-workers | Workflow | Lead LLM dynamically delegates to worker LLMs | Tasks where sub-tasks aren't known in advance (research) |
| Evaluator-optimizer | Workflow | One LLM produces; another critiques; loop until threshold | Iterative refinement with measurable quality (translation) |
| Autonomous agent | Agent | LLM plans + executes in a loop, no fixed structure | Open-ended tasks; environment is unknown until acted upon |
What Makes Agents Hard
The agent loop looks trivial. It is not. Three forces conspire against you:
1. Compounding error
If each step has 95% reliability, ten steps run at 0.95^10 ≈ 60%. Twenty steps and you're at 36%. Real agents take 5-50 actions per run. A 5% per-step error rate that's unnoticeable in chat becomes a >50% failure rate in a 15-step agent.
2. Cost runaway
Anthropic publicly disclosed in How we built our multi-agent research system (June 2025) that their multi-agent setup uses ~15× more tokens than chat; single-agent ~4×. A naive agent without prompt caching, tool budgets, or loop caps can burn through hundreds of dollars on a single user request.
3. Debuggability
When the loop fails, the failure is non-deterministic — the model picked a wrong tool, then talked itself into a wrong fix, then fell into a reflection loop. You can't printf-debug a stochastic policy. This is why tracing every span (we'll cover this in the Observability chapter) is non-negotiable for production agents.
The Engineer's Checklist Before You Build
- Can a workflow do this? If yes, build the workflow. Save the agent for when the task is provably non-decomposable.
- What's the blast radius? An agent that
SELECTs data is different from one thatDELETEs. Scope tools to the minimum permissions needed; sandbox aggressive ones. - What's the budget? Set hard caps: max iterations (5-20), max tool calls per turn (3-5), max wall-clock (30-300s), max tokens (50k-500k). The model will not enforce these; your code must.
- What's the escape hatch? Every production agent needs a "give up and ask the human" branch. Know exactly when it triggers.
- How do you measure success? Define eval criteria before building. "It seems to work" is not a metric.
What You Will Build in This Course
By Day 7 you will have implemented:
- A tool-using agent against the Anthropic and OpenAI APIs (Day 1 PM)
- ReAct, Reflexion, and Plan-and-Execute loops, with the trade-offs measured (Day 2)
- Three memory architectures: vector RAG, structured memory (Letta-style), temporal graph (Zep-style) (Day 3)
- An orchestrator-workers research system (Day 4)
- A production-hardened agent: guardrails, retries, prompt caching, cost caps (Day 5)
- Full-stack observability: tracing, evals, regression suites (Day 6)
- A computer-use agent and a hybrid Stagehand-style browser agent (Day 7 AM)
- A side-by-side comparison of Claude Agent SDK, OpenAI Agents SDK, and LangGraph for the same task (Day 7 PM)
- Anthropic — Building Effective Agents (Schluntz & Zhang, Dec 2024)anthropic.com
- Lilian Weng — LLM Powered Autonomous Agentslilianweng.github.io
- ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)arxiv.org
- Anthropic — How we built our multi-agent research systemanthropic.com
- Chip Huyen — Agentshuyenchip.com
Finished reading?