Concurrency, Async Work & Message Queues
Most production failures look async. Master the delivery semantics of brokers (at-most-once, at-least-once, exactly-once), the job-queue patterns, the outbox and saga that survive partial failures, and the async architecture LLMs forced into the stack — long-running inference jobs, webhooks, and durable workflows.
What you will learn
Anything that can take more than a few hundred milliseconds, anything that can be retried safely, anything that crosses a service boundary — push it out of the request lifecycle and into a queue. The result is a system where requests stay fast, failures stay local, and the slow parts (like a 12-second LLM call) become someone else's problem to retry. The cost is that async systems have more failure modes, not fewer; this chapter is about making the patterns explicit so you can pick the right one.
Three Shapes of Asynchronous Work
Tools in this space sound similar; they aren't. Drawing the lines saves arguments.
Job queues
The simplest shape. Producer pushes a job (a small JSON payload describing what to do), worker pops one and runs it. A message lives until acknowledged, with retries, delays, and a dead-letter queue for poison messages. Backed by Redis (Sidekiq, RQ, BullMQ) or a database (Celery on Postgres). Best for in-process "this same app, but later" work — sending emails, generating thumbnails, calling slow third-parties.
Message brokers
Add routing semantics: topics, exchanges, fanout. A single produced message can be delivered to many consumer groups, each tracking its own progress. RabbitMQ excels at flexible routing; SQS and Google Pub/Sub are managed-cloud equivalents; NATS is the lightweight option. The right tool when an event in one service needs to trigger work in three others.
Event streams (logs)
A durable, ordered, replayable log. Producers append; consumers read by offset. Unlike queues, messages are not consumed — many consumers can independently read the same stream, and a new consumer can replay from the beginning. Kafka is the canonical implementation; Kinesis (AWS), Redpanda (drop-in faster Kafka) are alternatives. Best for analytics ingest, change-data-capture (CDC), audit trails, and any workflow where "replay history" matters.
Delivery Semantics — Pick Two of Three
Every async system advertises some delivery guarantee. The honest truth: only one of the three is achievable for normal workloads.
| Semantic | Means | Reality |
|---|---|---|
| At-most-once | Message delivered 0 or 1 times. | Easy: send, forget. Loses messages on failure. |
| At-least-once | Message delivered ≥ 1 time. | The default. Retries on failure can deliver duplicates. |
| Exactly-once | Message processed exactly 1 time. | End-to-end exactly-once is essentially mythical at the message level. Achievable only by making consumers idempotent. |
The architecture pattern that gets you the property of "effectively exactly-once" is: at-least-once delivery + idempotent consumer. The broker may deliver the same message twice; your consumer recognizes the duplicate (by key, transaction, or output state) and processes it once. This is the design every real production system uses.
Idempotency in workers — three patterns
- By message-id store. Every message has a unique ID. The worker checks a (message_id → status) table; if seen, skip. The simplest. Pair with a TTL so the table doesn't grow forever.
- By natural key. The work is keyed by something the message carries (an order_id, a charge_id). The worker uses an UPSERT or a SELECT FOR UPDATE on that key, treating duplicates as no-ops. Strong because it survives even if the message_id store is lost.
- By transactional outcome. The work is atomic with the dedupe — e.g., an INSERT with a unique constraint. The duplicate fails fast.
The Outbox Pattern — Atomic Database Write + Reliable Publish
A subtle distributed-systems trap: you want to update the database and emit an event together. If you do them in two steps, three failure modes appear.
The three failure modes without an outbox
- UPDATE succeeds, broker publish fails → DB and downstream diverge silently.
- UPDATE succeeds, broker publish times out → app retries, gets a successful publish, then re-runs the UPDATE on a different message — duplicate.
- UPDATE fails, broker publish accidentally succeeded — downstream acts on data that doesn't exist.
The outbox shape
- Add an
outboxtable in the same database. Schema: id, aggregate_id, event_type, payload, created_at, sent_at. - In the application transaction,
UPDATEthe business state andINSERTa row inoutbox. - A separate process polls (or uses CDC via Debezium) for unsent rows, publishes to the broker, marks
sent_at. - Failures retry the publish; the consumer is idempotent.
The transaction guarantees the DB and the outbox row commit together or not at all. The publish is at-least-once; the consumer dedupes.
CREATE TABLE outbox ( id BIGSERIAL PRIMARY KEY, aggregate_id TEXT NOT NULL, -- e.g. order_id event_type TEXT NOT NULL, -- 'order.created' payload JSONB NOT NULL, created_at TIMESTAMPTZ NOT NULL DEFAULT now(), sent_at TIMESTAMPTZ ); CREATE INDEX outbox_unsent ON outbox (created_at) WHERE sent_at IS NULL;
Sagas — Multi-Step Workflows that May Fail Anywhere
A traditional database transaction wraps several writes: charge the card, mark the order paid, decrement stock. If one fails, ROLLBACK undoes them all. This is impossible across services — you can't ROLLBACK a Stripe charge from a query plan. The solution is the saga: a sequence of local transactions, each with a defined compensating action that undoes its effect.
Two flavours
- Choreography — each service publishes events; other services react. Decentralized, no single brain. Hard to debug at scale; loops are easy to introduce accidentally.
- Orchestration — a single orchestrator service (or workflow engine) tells each step what to do and listens for results. Easier to reason about; the orchestrator becomes a critical path.
Modern teams overwhelmingly use orchestration via durable workflow engines (next section).
Durable Workflows — Temporal, Inngest, Restate
Writing sagas by hand — including "if the worker dies between step 2 and step 3, where do we resume?" — is exhausting and error-prone. Durable workflow engines solve this. The promise: write code that looks synchronous, the engine persists every step, crashes resume where they left off, and time passes via the engine, not real time.
// Looks synchronous. Persists state across crashes. Compensations explicit.
async function placeOrderWorkflow(orderId: string) {
const stockHold = await activities.reserveStock(orderId);
try {
const charge = await activities.chargeCard(orderId);
try {
await activities.shipOrder(orderId);
await activities.notifyUser(orderId);
return { ok: true };
} catch (err) {
await activities.refund(charge.id);
throw err;
}
} catch (err) {
await activities.releaseStock(stockHold.id);
throw err;
}
}What's happening under the hood: the engine records every activities.* call and its return value. If the worker dies in the middle of shipOrder, a new worker picks up the workflow and replays the recorded history to restore in-memory state, then continues from where the original left off. Time is also virtualized — await sleep(24h) doesn't block a worker; the engine wakes the workflow when needed.
This pattern was popularized by Cadence (Uber) → Temporal; Inngest and Restate are newer takes. They've become the default for any multi-step workflow — payment flows, onboarding, and crucially, LLM agent loops, which often involve dozens of steps spanning minutes or hours.
Backpressure — When the Producer Outpaces the Consumer
A queue without backpressure is a silent failure waiting to happen. The producer happily fills the queue; consumers can't keep up; the queue grows unbounded; broker memory or disk eventually overflows. Three remedies:
- Bounded queues. The queue has a max size. Producer either blocks, drops, or returns 429 to its caller. The classic token bucket rate limiter implements this.
- Reactive streams (RxJava, Project Reactor, Server-Sent Events with credit). The consumer signals how many items it can handle next; the producer respects the credit.
- Per-tenant fairness queues. One noisy tenant cannot starve others. Round-robin or weighted-fair across tenant queues.
For LLM workloads, backpressure is the difference between graceful degradation and a thundering herd. When the upstream LLM provider rate-limits, your consumer must back off — exponential backoff with jitter (Day 5) — or you'll hammer the rate limit harder.
Async Patterns for the AI Era
The async-LLM-call pattern
The recipe for any LLM call that takes more than a few seconds, or any batch operation:
- Client POSTs
/jobswith the prompt, gets back{ job_id, status: "queued" }. - The handler writes the job to the DB and pushes to the queue (outbox-pattern style).
- A worker pops the job, calls the LLM, streams or stores the result.
- Client polls
GET /jobs/{id}or receives a webhook when status flips todone. - For UX, the client may also subscribe to a per-job SSE stream that emits tokens as they arrive.
Most modern LLM platforms (Anthropic batch, OpenAI batch) provide this pattern as a managed offering for non-interactive workloads, with steep cost discounts (~50%) in exchange for the relaxed latency.
Long-running agent workflows
Agents that call multiple tools sequentially can run for tens of minutes. They're a perfect fit for durable workflows: each tool call is an activity, the workflow's state survives crashes, and the agent can sleep awaiting human approval ("reply to this email, but a human reviews first") without holding any resources.
Streaming through a queue
What about streaming tokens to the user when the actual generation is happening on a background worker? The pattern: the worker writes deltas to a Redis stream (or a dedicated low-latency channel) keyed by job_id; the API gateway holds the user's SSE connection open and tails that stream. The user gets sub-second tokens; the worker is decoupled from the connection lifecycle.
Dead-Letter Queues — Where Bad Messages Go
A poison message — malformed JSON, a permanently-failing dependency, code that throws an unrecoverable exception — will retry forever if you let it. The dead-letter queue is the safety valve: after N retries, the message moves to a separate queue for human inspection. Two rules:
- Always have one. Without a DLQ, retries pile up and starve healthy work.
- Alert on it. A growing DLQ is a system health signal — watch the count and the age of the oldest message.
Observability for Async Systems
The four metrics you must track per queue:
- Queue depth (lag) — current backlog. Trend matters more than instantaneous value.
- Oldest message age — your tail latency for async work. A 10-minute-old message in a "send email" queue is an incident.
- Throughput in vs out — produce rate vs consume rate. When in > out for sustained periods, you're falling behind.
- Failure rate & DLQ count — both as absolutes and as fraction of total processed.
Show answer
outbox table in the same transaction as the order. A separate publisher reads unsent rows and forwards them to Kafka, marking them sent. The two are now atomic; downstream sees every committed order, exactly the orders that actually committed. Pair with idempotent consumers because the publisher's at-least-once will occasionally double-publish.- Queue — the right shape (job/broker/stream).
- Idempotent — every consumer dedupes by some key.
- Outbox — DB write and emit are atomic.
- DLQ — poison messages go somewhere.
- Lag — monitor depth and oldest-message age.
- Microservices.io — Transactional Outboxmicroservices.io
- Microservices.io — Sagamicroservices.io
- Temporal — Why durable executiontemporal.io
- Apache Kafka — Design overviewkafka.apache.org
- AWS — Asynchronous architectures patternsaws.amazon.com
- Confluent — Kafka exactly-once, decodedconfluent.io
Finished reading?