The Engineering Codex/Agentic AI with LLM APIs
DAY 1 · AM
01 / 09

Agent Foundations — Workflows, Agents & The Canonical Loop

schedule5 minsignal_cellular_altBeginner1,205 words
Before tools and frameworks, the mental model. Learn what makes a system 'agentic', when not to build one, and the four-line loop that powers every agent in production.

What you will learn

01Workflows vs. Agents — The Anthropic Distinction
02The Canonical Agent Loop
03The Six Patterns from "Building Effective Agents"
04What Makes Agents Hard
05The Engineer's Checklist Before You Build
06What You Will Build in This Course

An agent is an LLM that decides — for itself — which actions to take next. The model doesn't follow a fixed script; it reads a goal, picks a tool, observes the result, and decides what to do after. That single shift in control flow is the entire difference between calling an LLM and shipping an agent. Everything else in this course — tool use, memory, multi-agent, evals — is the engineering you need to make that loop reliable.

🔑
The two questions to answer first
1) Does this task actually need agency? Most production "AI features" are workflows: a fixed sequence of LLM calls and tool invocations. 2) If yes — what bounds the loop? Step count, wall-clock, token budget, blast radius. An agent without bounds is a runaway script with a credit card.

Workflows vs. Agents — The Anthropic Distinction

The cleanest taxonomy comes from Anthropic's Building Effective Agents (Dec 2024): a workflow is a system where LLMs and tools are orchestrated by predefined code paths. An agent is a system where the LLM dynamically directs its own tool use and control flow. Workflows are predictable and cheap; agents are flexible and expensive.

WORKFLOW · code-driven AGENT · LLM-driven Step 1 classify Step 2 extract Step 3 generate Path is fixed. Predictable cost & latency. LLM decides next action tool A tool B tool C Path emerges at runtime. Cost & latency are unbounded unless you bound them.
Workflows are graphs you draw. Agents are graphs the LLM draws — at runtime, on every iteration, sometimes recursively.

Anthropic's strongly stated guidance — and the consensus across production teams — is to start with a single LLM call, escalate to a workflow, and only reach for an agent when the task is genuinely open-ended. A workflow that does 80% of jobs reliably will outperform an agent that does 100% of jobs unreliably, every time.

The Canonical Agent Loop

Strip away the frameworks and an agent is four lines of pseudocode:

Python · the canonical loop
while not done:
    response = llm.call(messages, tools=available_tools)
    if response.tool_use:
        result = execute_tool(response.tool_use)
        messages.append(response, result)
    else:
        done = True   # final answer

Everything sophisticated — ReAct, Reflexion, multi-agent, computer use — is a refinement of this loop. You're either changing what tools are available, changing how the model decides between them, or changing how state survives across iterations.

🧠
Lilian Weng's mental model
In her canonical 2023 essay, Lilian Weng decomposes agents into four pillars: LLM (the brain), planning (task decomposition + reflection), memory (short-term context, long-term retrieval), and tool use (function calling). Every chapter of this course maps to one of those pillars.

The Six Patterns from "Building Effective Agents"

Anthropic's essay distills production agent design into six reusable patterns. The first five are workflows — predefined orchestrations of LLM calls. Only the sixth is a true agent. Most teams ship four of the workflows and never need the agent.

PatternTypeWhat it isWhen to reach for it
Prompt chainingWorkflowDecompose into ordered steps; each step's output feeds the nextTasks that cleanly split into sub-tasks (outline → draft → polish)
RoutingWorkflowClassify the input, then dispatch to a specialized prompt/modelHeterogeneous queries (refund vs. tech-support vs. sales)
ParallelizationWorkflowRun sub-tasks concurrently; aggregate (sectioning) or vote (voting)Independent sub-questions; multiple checks for safety
Orchestrator-workersWorkflowLead LLM dynamically delegates to worker LLMsTasks where sub-tasks aren't known in advance (research)
Evaluator-optimizerWorkflowOne LLM produces; another critiques; loop until thresholdIterative refinement with measurable quality (translation)
Autonomous agentAgentLLM plans + executes in a loop, no fixed structureOpen-ended tasks; environment is unknown until acted upon

What Makes Agents Hard

The agent loop looks trivial. It is not. Three forces conspire against you:

1. Compounding error

If each step has 95% reliability, ten steps run at 0.95^10 ≈ 60%. Twenty steps and you're at 36%. Real agents take 5-50 actions per run. A 5% per-step error rate that's unnoticeable in chat becomes a >50% failure rate in a 15-step agent.

2. Cost runaway

Anthropic publicly disclosed in How we built our multi-agent research system (June 2025) that their multi-agent setup uses ~15× more tokens than chat; single-agent ~4×. A naive agent without prompt caching, tool budgets, or loop caps can burn through hundreds of dollars on a single user request.

3. Debuggability

When the loop fails, the failure is non-deterministic — the model picked a wrong tool, then talked itself into a wrong fix, then fell into a reflection loop. You can't printf-debug a stochastic policy. This is why tracing every span (we'll cover this in the Observability chapter) is non-negotiable for production agents.

⚠️
The "demo trap"
Agent demos are misleading. A 70% success rate is impressive in a video and unshippable in production — refunds, support escalations, and churn happen on the 30%. Most real agent products are closer to 95-99% accurate on a narrow scope, achieved by aggressively constraining the agent's freedom (fewer tools, hard limits, escape hatches to humans).

The Engineer's Checklist Before You Build

  1. Can a workflow do this? If yes, build the workflow. Save the agent for when the task is provably non-decomposable.
  2. What's the blast radius? An agent that SELECTs data is different from one that DELETEs. Scope tools to the minimum permissions needed; sandbox aggressive ones.
  3. What's the budget? Set hard caps: max iterations (5-20), max tool calls per turn (3-5), max wall-clock (30-300s), max tokens (50k-500k). The model will not enforce these; your code must.
  4. What's the escape hatch? Every production agent needs a "give up and ask the human" branch. Know exactly when it triggers.
  5. How do you measure success? Define eval criteria before building. "It seems to work" is not a metric.

What You Will Build in This Course

By Day 7 you will have implemented:

  • A tool-using agent against the Anthropic and OpenAI APIs (Day 1 PM)
  • ReAct, Reflexion, and Plan-and-Execute loops, with the trade-offs measured (Day 2)
  • Three memory architectures: vector RAG, structured memory (Letta-style), temporal graph (Zep-style) (Day 3)
  • An orchestrator-workers research system (Day 4)
  • A production-hardened agent: guardrails, retries, prompt caching, cost caps (Day 5)
  • Full-stack observability: tracing, evals, regression suites (Day 6)
  • A computer-use agent and a hybrid Stagehand-style browser agent (Day 7 AM)
  • A side-by-side comparison of Claude Agent SDK, OpenAI Agents SDK, and LangGraph for the same task (Day 7 PM)
🔑
Key takeaways
1) An agent is just an LLM in a loop with tools and a budget — start simple. 2) Workflows beat agents for most production tasks; reach for an agent only when the path can't be predetermined. 3) Compounding error and cost runaway are the two killers of agent reliability — bound the loop or the loop will bound your wallet. 4) Anthropic's six patterns cover ~95% of real-world systems; only one of them is a "true" agent.

Finished reading?