The Engineering Codex/Backend Engineering for the AI Era
DAY 3
05 / 09

Concurrency, Async Work & Message Queues

schedule12 minsignal_cellular_altAdvanced2,666 words
Most production failures look async. Master the delivery semantics of brokers (at-most-once, at-least-once, exactly-once), the job-queue patterns, the outbox and saga that survive partial failures, and the async architecture LLMs forced into the stack — long-running inference jobs, webhooks, and durable workflows.

What you will learn

01Three Shapes of Asynchronous Work
02Delivery Semantics — Pick Two of Three
03The Outbox Pattern — Atomic Database Write + Reliable Publish
04Sagas — Multi-Step Workflows that May Fail Anywhere
05Durable Workflows — Temporal, Inngest, Restate
06Backpressure — When the Producer Outpaces the Consumer

Anything that can take more than a few hundred milliseconds, anything that can be retried safely, anything that crosses a service boundary — push it out of the request lifecycle and into a queue. The result is a system where requests stay fast, failures stay local, and the slow parts (like a 12-second LLM call) become someone else's problem to retry. The cost is that async systems have more failure modes, not fewer; this chapter is about making the patterns explicit so you can pick the right one.

🔑
Today's async toolbox
1) Job queues vs message brokers vs event streams — three shapes, three mental models. 2) Delivery semantics: at-most-once, at-least-once, exactly-once — only one is real. 3) The outbox pattern — atomic database write + reliable publish. 4) Sagas for distributed workflows. 5) Durable workflows (Temporal, Inngest) — the pattern LLM agents adopted.

Three Shapes of Asynchronous Work

Tools in this space sound similar; they aren't. Drawing the lines saves arguments.

Job queue RQ, Sidekiq, Celery, BullMQ push job, get done Redis-backed delayed / retried FIFO-ish use for send email resize image call slow API batch summarize Message broker RabbitMQ, SQS, NATS, Pub/Sub N producers, M consumers topics / exchanges / fanout dead-letter queues routing keys use for service-to-service command bus work distribution webhooks fan-out Event stream / log Kafka, Kinesis, Redpanda durable, replayable consumers track offset ordered per partition retention in days use for event sourcing analytics ingest CDC pipelines audit trails
Three async shapes. Most teams use a job queue for background work, a broker for cross-service messages, and a stream for the analytics fire-hose.

Job queues

The simplest shape. Producer pushes a job (a small JSON payload describing what to do), worker pops one and runs it. A message lives until acknowledged, with retries, delays, and a dead-letter queue for poison messages. Backed by Redis (Sidekiq, RQ, BullMQ) or a database (Celery on Postgres). Best for in-process "this same app, but later" work — sending emails, generating thumbnails, calling slow third-parties.

Message brokers

Add routing semantics: topics, exchanges, fanout. A single produced message can be delivered to many consumer groups, each tracking its own progress. RabbitMQ excels at flexible routing; SQS and Google Pub/Sub are managed-cloud equivalents; NATS is the lightweight option. The right tool when an event in one service needs to trigger work in three others.

Event streams (logs)

A durable, ordered, replayable log. Producers append; consumers read by offset. Unlike queues, messages are not consumed — many consumers can independently read the same stream, and a new consumer can replay from the beginning. Kafka is the canonical implementation; Kinesis (AWS), Redpanda (drop-in faster Kafka) are alternatives. Best for analytics ingest, change-data-capture (CDC), audit trails, and any workflow where "replay history" matters.

💡
Don't reach for Kafka first
Kafka is the most-misused tool in this space. The operational footprint (brokers, ZooKeeper or KRaft, schema registry, partition rebalances, lag monitoring) is significant, and most workloads don't need replay or stream-processing semantics. Start with a job queue and a managed broker (SQS / Pub/Sub). Move to Kafka when you genuinely need event sourcing, multi-consumer replay, or sustained million-events-per-second throughput. The right answer for the first 95% of teams is not Kafka.

Delivery Semantics — Pick Two of Three

Every async system advertises some delivery guarantee. The honest truth: only one of the three is achievable for normal workloads.

SemanticMeansReality
At-most-onceMessage delivered 0 or 1 times.Easy: send, forget. Loses messages on failure.
At-least-onceMessage delivered ≥ 1 time.The default. Retries on failure can deliver duplicates.
Exactly-onceMessage processed exactly 1 time.End-to-end exactly-once is essentially mythical at the message level. Achievable only by making consumers idempotent.

The architecture pattern that gets you the property of "effectively exactly-once" is: at-least-once delivery + idempotent consumer. The broker may deliver the same message twice; your consumer recognizes the duplicate (by key, transaction, or output state) and processes it once. This is the design every real production system uses.

Idempotency in workers — three patterns

  • By message-id store. Every message has a unique ID. The worker checks a (message_id → status) table; if seen, skip. The simplest. Pair with a TTL so the table doesn't grow forever.
  • By natural key. The work is keyed by something the message carries (an order_id, a charge_id). The worker uses an UPSERT or a SELECT FOR UPDATE on that key, treating duplicates as no-ops. Strong because it survives even if the message_id store is lost.
  • By transactional outcome. The work is atomic with the dedupe — e.g., an INSERT with a unique constraint. The duplicate fails fast.
⚠️
"Exactly-once" claims, decoded
When a vendor advertises "exactly-once," they almost always mean within the broker's own boundaries (e.g., Kafka's exactly-once writes within a single broker transaction). The moment your consumer's side-effect crosses out of the broker — calls a payment API, writes to a different DB — it's at-least-once again. Plan for it.

The Outbox Pattern — Atomic Database Write + Reliable Publish

A subtle distributed-systems trap: you want to update the database and emit an event together. If you do them in two steps, three failure modes appear.

App handlerbegin transaction Postgres (single tx)UPDATE orders …INSERT INTO outbox …COMMIT Outbox publisherpoll OR debeziumforwards to brokermarks row sent Broker (Kafka / SQS)durable, ordered, replayable Downstream consumersidempotent Database write and event emission share one transaction. Publisher reads the table.
The outbox pattern. The DB write and the queued event go into the same transaction; a publisher process drains the outbox table.

The three failure modes without an outbox

  1. UPDATE succeeds, broker publish fails → DB and downstream diverge silently.
  2. UPDATE succeeds, broker publish times out → app retries, gets a successful publish, then re-runs the UPDATE on a different message — duplicate.
  3. UPDATE fails, broker publish accidentally succeeded — downstream acts on data that doesn't exist.

The outbox shape

  • Add an outbox table in the same database. Schema: id, aggregate_id, event_type, payload, created_at, sent_at.
  • In the application transaction, UPDATE the business state and INSERT a row in outbox.
  • A separate process polls (or uses CDC via Debezium) for unsent rows, publishes to the broker, marks sent_at.
  • Failures retry the publish; the consumer is idempotent.

The transaction guarantees the DB and the outbox row commit together or not at all. The publish is at-least-once; the consumer dedupes.

sql — the outbox table
CREATE TABLE outbox (
  id            BIGSERIAL PRIMARY KEY,
  aggregate_id  TEXT NOT NULL,                  -- e.g. order_id
  event_type    TEXT NOT NULL,                  -- 'order.created'
  payload       JSONB NOT NULL,
  created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
  sent_at       TIMESTAMPTZ
);

CREATE INDEX outbox_unsent ON outbox (created_at) WHERE sent_at IS NULL;

Sagas — Multi-Step Workflows that May Fail Anywhere

A traditional database transaction wraps several writes: charge the card, mark the order paid, decrement stock. If one fails, ROLLBACK undoes them all. This is impossible across services — you can't ROLLBACK a Stripe charge from a query plan. The solution is the saga: a sequence of local transactions, each with a defined compensating action that undoes its effect.

1. Reserve stock 2. Charge card 3. Ship 4. Notify 3. failscarrier outage ↩ refund card ↩ release stock Forward steps go top; failure runs the compensations in reverse.
A saga as choreography. Each forward step has a corresponding undo. Failure rolls back through the chain.

Two flavours

  • Choreography — each service publishes events; other services react. Decentralized, no single brain. Hard to debug at scale; loops are easy to introduce accidentally.
  • Orchestration — a single orchestrator service (or workflow engine) tells each step what to do and listens for results. Easier to reason about; the orchestrator becomes a critical path.

Modern teams overwhelmingly use orchestration via durable workflow engines (next section).

Durable Workflows — Temporal, Inngest, Restate

Writing sagas by hand — including "if the worker dies between step 2 and step 3, where do we resume?" — is exhausting and error-prone. Durable workflow engines solve this. The promise: write code that looks synchronous, the engine persists every step, crashes resume where they left off, and time passes via the engine, not real time.

typescript — Temporal-style workflow
// Looks synchronous. Persists state across crashes. Compensations explicit.
async function placeOrderWorkflow(orderId: string) {
  const stockHold = await activities.reserveStock(orderId);
  try {
    const charge = await activities.chargeCard(orderId);
    try {
      await activities.shipOrder(orderId);
      await activities.notifyUser(orderId);
      return { ok: true };
    } catch (err) {
      await activities.refund(charge.id);
      throw err;
    }
  } catch (err) {
    await activities.releaseStock(stockHold.id);
    throw err;
  }
}

What's happening under the hood: the engine records every activities.* call and its return value. If the worker dies in the middle of shipOrder, a new worker picks up the workflow and replays the recorded history to restore in-memory state, then continues from where the original left off. Time is also virtualized — await sleep(24h) doesn't block a worker; the engine wakes the workflow when needed.

This pattern was popularized by Cadence (Uber) → Temporal; Inngest and Restate are newer takes. They've become the default for any multi-step workflow — payment flows, onboarding, and crucially, LLM agent loops, which often involve dozens of steps spanning minutes or hours.

🌱
When to reach for a durable workflow
If your workflow has 3+ steps, runs across services, or has time-based delays ("send a follow-up email after 24h"), put it in a workflow engine from day one. Hand-rolled cron + database-state-machine is the path that works for the first 6 months and then becomes the team's least-favourite codebase.

Backpressure — When the Producer Outpaces the Consumer

A queue without backpressure is a silent failure waiting to happen. The producer happily fills the queue; consumers can't keep up; the queue grows unbounded; broker memory or disk eventually overflows. Three remedies:

  • Bounded queues. The queue has a max size. Producer either blocks, drops, or returns 429 to its caller. The classic token bucket rate limiter implements this.
  • Reactive streams (RxJava, Project Reactor, Server-Sent Events with credit). The consumer signals how many items it can handle next; the producer respects the credit.
  • Per-tenant fairness queues. One noisy tenant cannot starve others. Round-robin or weighted-fair across tenant queues.

For LLM workloads, backpressure is the difference between graceful degradation and a thundering herd. When the upstream LLM provider rate-limits, your consumer must back off — exponential backoff with jitter (Day 5) — or you'll hammer the rate limit harder.

Async Patterns for the AI Era

The async-LLM-call pattern

The recipe for any LLM call that takes more than a few seconds, or any batch operation:

  1. Client POSTs /jobs with the prompt, gets back { job_id, status: "queued" }.
  2. The handler writes the job to the DB and pushes to the queue (outbox-pattern style).
  3. A worker pops the job, calls the LLM, streams or stores the result.
  4. Client polls GET /jobs/{id} or receives a webhook when status flips to done.
  5. For UX, the client may also subscribe to a per-job SSE stream that emits tokens as they arrive.

Most modern LLM platforms (Anthropic batch, OpenAI batch) provide this pattern as a managed offering for non-interactive workloads, with steep cost discounts (~50%) in exchange for the relaxed latency.

Long-running agent workflows

Agents that call multiple tools sequentially can run for tens of minutes. They're a perfect fit for durable workflows: each tool call is an activity, the workflow's state survives crashes, and the agent can sleep awaiting human approval ("reply to this email, but a human reviews first") without holding any resources.

Streaming through a queue

What about streaming tokens to the user when the actual generation is happening on a background worker? The pattern: the worker writes deltas to a Redis stream (or a dedicated low-latency channel) keyed by job_id; the API gateway holds the user's SSE connection open and tails that stream. The user gets sub-second tokens; the worker is decoupled from the connection lifecycle.

Dead-Letter Queues — Where Bad Messages Go

A poison message — malformed JSON, a permanently-failing dependency, code that throws an unrecoverable exception — will retry forever if you let it. The dead-letter queue is the safety valve: after N retries, the message moves to a separate queue for human inspection. Two rules:

  • Always have one. Without a DLQ, retries pile up and starve healthy work.
  • Alert on it. A growing DLQ is a system health signal — watch the count and the age of the oldest message.

Observability for Async Systems

The four metrics you must track per queue:

  • Queue depth (lag) — current backlog. Trend matters more than instantaneous value.
  • Oldest message age — your tail latency for async work. A 10-minute-old message in a "send email" queue is an incident.
  • Throughput in vs out — produce rate vs consume rate. When in > out for sustained periods, you're falling behind.
  • Failure rate & DLQ count — both as absolutes and as fraction of total processed.
Quick check
Your service writes to Postgres and publishes a Kafka event when an order is created. Sometimes downstream services act on events for orders that don't exist in the DB. Sometimes a successful order has no event. Diagnose and propose a fix.
Show answer
Diagnosis: the database write and the Kafka publish are not atomic. If the Kafka publish succeeds but the DB write fails (connection blip after publish but before commit), downstream sees a phantom event. If the DB write succeeds but the Kafka publish fails, the order exists silently. Fix: the outbox pattern. Write the event row to an outbox table in the same transaction as the order. A separate publisher reads unsent rows and forwards them to Kafka, marking them sent. The two are now atomic; downstream sees every committed order, exactly the orders that actually committed. Pair with idempotent consumers because the publisher's at-least-once will occasionally double-publish.
Mnemonic — async checklist
"Queue, Idempotent, Outbox, DLQ, Lag."
  • Queue — the right shape (job/broker/stream).
  • Idempotent — every consumer dedupes by some key.
  • Outbox — DB write and emit are atomic.
  • DLQ — poison messages go somewhere.
  • Lag — monitor depth and oldest-message age.
Flashcard
A team builds an AI agent that does: (1) retrieve, (2) plan, (3) call tool A, (4) reflect, (5) call tool B, (6) summarize, (7) wait for human approval, (8) send. Total elapsed time: minutes to hours. What's the right architectural primitive for this and why?
Click to flip ↻
Answer
A durable workflow engine (Temporal, Inngest, Restate). The reasons: (1) the workflow spans multiple services and external API calls, each of which can fail independently — handcrafting compensations across this is brittle. (2) Step 7 is a long-duration wait — possibly hours — that no synchronous handler should hold open. The engine virtualizes time, so the workflow sleeps without burning resources. (3) Crash safety — if the worker hosting the agent dies mid-step, a new worker resumes from the last completed activity. (4) Observability — the engine ships state-machine views, retries, and timings out of the box. Hand-rolled cron + state machine in the database is the obvious-looking alternative that becomes the team's least-favourite system within six months.
🔑
Key takeaways
1) Job queue, message broker, event stream are different shapes — pick by the questions you'll ask of it. 2) Exactly-once is a property of the consumer, not the broker. Build idempotent consumers and assume at-least-once. 3) The outbox pattern is the canonical fix for "DB write + emit event" atomicity. 4) Sagas with compensations are how you do transactions across services. 5) Durable workflow engines turn multi-step async workflows from a sea of bugs into ordinary code — and are the right home for LLM agent loops. 6) Backpressure, DLQs, and lag monitoring are the operational primitives that keep async systems honest.

Finished reading?