Agent Foundations — Workflows, Agents & The Canonical Loop

schedule5 minsignal_cellular_altBeginner1,205 words

Before tools and frameworks, the mental model. Learn what makes a system 'agentic', when not to build one, and the four-line loop that powers every agent in production.

What you will learn

01Workflows vs. Agents — The Anthropic Distinction

02The Canonical Agent Loop

03The Six Patterns from "Building Effective Agents"

04What Makes Agents Hard

05The Engineer's Checklist Before You Build

06What You Will Build in This Course

An agent is an LLM that decides — for itself — which actions to take next. The model doesn't follow a fixed script; it reads a goal, picks a tool, observes the result, and decides what to do after. That single shift in control flow is the entire difference between calling an LLM and shipping an agent. Everything else in this course — tool use, memory, multi-agent, evals — is the engineering you need to make that loop reliable.

🔑

The two questions to answer first

1) Does this task actually need agency? Most production "AI features" are workflows: a fixed sequence of LLM calls and tool invocations. 2) If yes — what bounds the loop? Step count, wall-clock, token budget, blast radius. An agent without bounds is a runaway script with a credit card.

Workflows vs. Agents — The Anthropic Distinction

The cleanest taxonomy comes from Anthropic's Building Effective Agents (Dec 2024): a workflow is a system where LLMs and tools are orchestrated by predefined code paths. An agent is a system where the LLM dynamically directs its own tool use and control flow. Workflows are predictable and cheap; agents are flexible and expensive.

Workflows are graphs you draw. Agents are graphs the LLM draws — at runtime, on every iteration, sometimes recursively.

Anthropic's strongly stated guidance — and the consensus across production teams — is to start with a single LLM call, escalate to a workflow, and only reach for an agent when the task is genuinely open-ended. A workflow that does 80% of jobs reliably will outperform an agent that does 100% of jobs unreliably, every time.

The Canonical Agent Loop

Strip away the frameworks and an agent is four lines of pseudocode:

Python · the canonical loop

while not done:
    response = llm.call(messages, tools=available_tools)
    if response.tool_use:
        result = execute_tool(response.tool_use)
        messages.append(response, result)
    else:
        done = True   # final answer

Everything sophisticated — ReAct, Reflexion, multi-agent, computer use — is a refinement of this loop. You're either changing what tools are available, changing how the model decides between them, or changing how state survives across iterations.

🧠

Lilian Weng's mental model

In her canonical 2023 essay, Lilian Weng decomposes agents into four pillars: LLM (the brain), planning (task decomposition + reflection), memory (short-term context, long-term retrieval), and tool use (function calling). Every chapter of this course maps to one of those pillars.

The Six Patterns from "Building Effective Agents"

Anthropic's essay distills production agent design into six reusable patterns. The first five are workflows — predefined orchestrations of LLM calls. Only the sixth is a true agent. Most teams ship four of the workflows and never need the agent.

Pattern	Type	What it is	When to reach for it
Prompt chaining	Workflow	Decompose into ordered steps; each step's output feeds the next	Tasks that cleanly split into sub-tasks (outline → draft → polish)
Routing	Workflow	Classify the input, then dispatch to a specialized prompt/model	Heterogeneous queries (refund vs. tech-support vs. sales)
Parallelization	Workflow	Run sub-tasks concurrently; aggregate (sectioning) or vote (voting)	Independent sub-questions; multiple checks for safety
Orchestrator-workers	Workflow	Lead LLM dynamically delegates to worker LLMs	Tasks where sub-tasks aren't known in advance (research)
Evaluator-optimizer	Workflow	One LLM produces; another critiques; loop until threshold	Iterative refinement with measurable quality (translation)
Autonomous agent	Agent	LLM plans + executes in a loop, no fixed structure	Open-ended tasks; environment is unknown until acted upon

What Makes Agents Hard

The agent loop looks trivial. It is not. Three forces conspire against you:

1. Compounding error

If each step has 95% reliability, ten steps run at 0.95^10 ≈ 60%. Twenty steps and you're at 36%. Real agents take 5-50 actions per run. A 5% per-step error rate that's unnoticeable in chat becomes a >50% failure rate in a 15-step agent.

2. Cost runaway

Anthropic publicly disclosed in How we built our multi-agent research system (June 2025) that their multi-agent setup uses ~15× more tokens than chat; single-agent ~4×. A naive agent without prompt caching, tool budgets, or loop caps can burn through hundreds of dollars on a single user request.

3. Debuggability

When the loop fails, the failure is non-deterministic — the model picked a wrong tool, then talked itself into a wrong fix, then fell into a reflection loop. You can't printf-debug a stochastic policy. This is why tracing every span (we'll cover this in the Observability chapter) is non-negotiable for production agents.

⚠️

The "demo trap"

Agent demos are misleading. A 70% success rate is impressive in a video and unshippable in production — refunds, support escalations, and churn happen on the 30%. Most real agent products are closer to 95-99% accurate on a narrow scope, achieved by aggressively constraining the agent's freedom (fewer tools, hard limits, escape hatches to humans).

The Engineer's Checklist Before You Build

Can a workflow do this? If yes, build the workflow. Save the agent for when the task is provably non-decomposable.
What's the blast radius? An agent that SELECTs data is different from one that DELETEs. Scope tools to the minimum permissions needed; sandbox aggressive ones.
What's the budget? Set hard caps: max iterations (5-20), max tool calls per turn (3-5), max wall-clock (30-300s), max tokens (50k-500k). The model will not enforce these; your code must.
What's the escape hatch? Every production agent needs a "give up and ask the human" branch. Know exactly when it triggers.
How do you measure success? Define eval criteria before building. "It seems to work" is not a metric.

What You Will Build in This Course

By Day 7 you will have implemented:

A tool-using agent against the Anthropic and OpenAI APIs (Day 1 PM)
ReAct, Reflexion, and Plan-and-Execute loops, with the trade-offs measured (Day 2)
Three memory architectures: vector RAG, structured memory (Letta-style), temporal graph (Zep-style) (Day 3)
An orchestrator-workers research system (Day 4)
A production-hardened agent: guardrails, retries, prompt caching, cost caps (Day 5)
Full-stack observability: tracing, evals, regression suites (Day 6)
A computer-use agent and a hybrid Stagehand-style browser agent (Day 7 AM)
A side-by-side comparison of Claude Agent SDK, OpenAI Agents SDK, and LangGraph for the same task (Day 7 PM)

🔑

Key takeaways

1) An agent is just an LLM in a loop with tools and a budget — start simple. 2) Workflows beat agents for most production tasks; reach for an agent only when the path can't be predetermined. 3) Compounding error and cost runaway are the two killers of agent reliability — bound the loop or the loop will bound your wallet. 4) Anthropic's six patterns cover ~95% of real-world systems; only one of them is a "true" agent.

📚 Further reading

Anthropic — Building Effective Agents (Schluntz & Zhang, Dec 2024)anthropic.com
Lilian Weng — LLM Powered Autonomous Agentslilianweng.github.io
ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)arxiv.org
Anthropic — How we built our multi-agent research systemanthropic.com
Chip Huyen — Agentshuyenchip.com

Finished reading?