The Engineering Codex/From Code to Internet: Deployment & Operations
DAY 5
05 / 07

Systemd, Processes, & Zero-Downtime Deploys

13 min · Intermediate · 2,939 words
Your app runs as a process. The question is: what restarts it when it crashes, what brings it back after a reboot, what feeds it environment variables, and what swaps the running version for a new one without dropping a request? On modern Linux that answer starts with systemd. Today: write production-quality unit files, use environment files and journals, and build the deploy patterns — atomic symlink swap, blue-green, rolling, canary — that turn 'I deployed at 3 AM' into 'I deployed during lunch.'

What you will learn

01 What systemd Actually Does
02 Day-to-Day Operation
03 Graceful Shutdown — From the App Side
04 Environment Files & Secrets
05 The Deploy Question — Four Strategies
06 The Atomic Symlink Swap

The line between a hobby project and production is often the line between node server.js & in a tmux session and a properly supervised service. The first crashes on the next reboot, the next OOM, the next SIGHUP, and you find out from a customer. The second restarts itself, logs to a structured place, runs as a non-privileged user, and survives whatever the host throws at it. Today is about that line, and about the next-most-important question: how do you replace the running version with a new one without anyone noticing?

🔑
Today's outcome
1) Why systemd is the right answer for "keep this running." 2) A production unit file from scratch — type, restart policy, hardening, env files. 3) Logs and metrics via journald. 4) Graceful shutdown from the inside of your app. 5) Four deploy strategies — recreate, rolling, blue-green, canary — and when each fits. 6) The atomic symlink swap pattern that powers most one-box deploys. 7) What can go wrong mid-deploy and how to roll back fast.

What systemd Actually Does

systemd is the init system on every modern Linux distro that matters. Process 1, the parent of everything else, the one that keeps services running and reports when they stop. Older mental models map it to upstart, sysvinit, or supervisord; for our purposes it's a single tool with three relevant capabilities:

  • Process supervision: run a service, restart it on failure, run it on boot.
  • Dependency ordering: "start nginx after the network is up," "start the worker after Postgres is reachable."
  • Log aggregation: stdout/stderr go into the journal, queryable with journalctl.

The unit of deployment is a unit file. A unit file is a small INI-style config that says: how to start the service, how to stop it, when to restart it, who owns it, what it depends on. Here's the smallest production-shaped one you'd write.

ini — /etc/systemd/system/acme.service
[Unit]
Description=Acme web app
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=acme
Group=acme
WorkingDirectory=/opt/acme/current
EnvironmentFile=/opt/acme/shared/env
ExecStart=/usr/bin/node /opt/acme/current/server.js
ExecReload=/bin/kill -HUP $MAINPID
KillSignal=SIGTERM
TimeoutStopSec=30s
Restart=always
RestartSec=5s

# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/acme/shared /var/log/acme
PrivateTmp=true
LockPersonality=true
# MemoryDenyWriteExecute=true is left off: it is incompatible with JIT runtimes such as Node's V8

# Resource limits
LimitNOFILE=65535
MemoryMax=1G
CPUQuota=200%

[Install]
WantedBy=multi-user.target

That's a lot of stanzas. Let's read them in groups.

Type, dependencies, and lifecycle

  • Type=simple — the most common; ExecStart is the main process, alive as long as it runs. Other types: forking (daemon double-forks itself; rare in modern apps), notify (the app sends sd_notify when ready; clean), oneshot (run-and-exit, e.g. a migration).
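  • After=network-online.target — start after the network is up. After= only orders startup; the matching Wants=network-online.target is what pulls that target into the boot.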
  • ExecStart — the actual command. Always use absolute paths; systemd doesn't read your shell PATH.
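  • Restart=always + RestartSec=5s — die, wait 5s, come back. Neither policy restarts a service you stopped with systemctl stop; the difference is that on-failure also leaves the service down after a clean exit (exit code 0), while always restarts it regardless.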
  • TimeoutStopSec=30s — give the app this long to shut down gracefully on stop, then SIGKILL it.

User, working directory, environment

  • User=acme — never run as root unless you must. Create a dedicated user: sudo adduser --system --no-create-home --group acme.
  • WorkingDirectory — the cwd of the process. Useful for relative paths in your code (don't rely on them, but they happen).
  • EnvironmentFile — load env vars from a file. The right home for secrets and config; mode 600, owned by the service user.

Hardening — free for the asking

systemd has a long list of sandboxing options that cost nothing and prevent classes of compromise.

  • NoNewPrivileges=true — process can't gain new caps via setuid binaries.
  • ProtectSystem=strict — most of the filesystem is read-only to this service. Specify ReadWritePaths for the directories it actually needs to write.
  • ProtectHome=true — /home, /root, and /run/user are hidden from the service.
  • PrivateTmp=true — its /tmp is namespaced, not shared with other processes.
  • MemoryDenyWriteExecute, LockPersonality, RestrictNamespaces, SystemCallFilter — increasingly aggressive, increasingly likely to break a particular language runtime. Enable progressively, test thoroughly.
💡
systemd-analyze security
Run systemd-analyze security acme.service for a 0–10 exposure score (lower is better) and a list of mitigations you haven't enabled. Most distros' default unit files score 6 or higher; a handcrafted unit can reach 1 or 2 with a few hours of effort. Returns diminish past 2, but getting from 6 to 3 is a large win.

Day-to-Day Operation

bash — systemctl essentials
sudo systemctl daemon-reload                  # after editing a unit file
sudo systemctl enable --now acme.service      # start now, also start on boot
sudo systemctl status acme.service             # current state + last lines of log
sudo systemctl restart acme.service
sudo systemctl reload  acme.service             # if app supports SIGHUP
sudo systemctl stop    acme.service
sudo systemctl is-active acme.service          # scriptable; "active" or non-zero exit
sudo systemctl is-failed acme.service

sudo journalctl -u acme.service -n 200 --no-pager
sudo journalctl -u acme.service -f              # follow live
sudo journalctl -u acme.service --since "10 min ago" -p err
sudo journalctl -u acme.service --since today --grep 'panic|exception'

Two things to internalise:

  1. daemon-reload is required after any edit to a unit file. Without it, systemd serves the old version.
  2. journalctl is the universal log tool: per-unit (-u), priority (-p err), since/until, follow (-f), grep (--grep), JSON output (-o json) for tooling.
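
As an illustration of the JSON output mentioned above, here is a minimal sketch that counts journal entries per syslog priority. It assumes you have captured the output of journalctl -u acme.service -o json (one JSON object per line) into a string; the function name summarizeJournal is made up for this example. PRIORITY is a standard journal field whose value is a string from "0" to "7", where "3" means err.

```javascript
// Count journal entries per syslog priority from `journalctl -o json` output.
// Input: the raw text, one JSON object per line.
function summarizeJournal(text) {
  const counts = {};
  for (const line of text.split('\n')) {
    if (!line.trim()) continue;            // skip blank lines
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue;                            // skip anything that is not valid JSON
    }
    // Entries without a PRIORITY field are grouped under "unknown".
    const priority = entry.PRIORITY ?? 'unknown';
    counts[priority] = (counts[priority] || 0) + 1;
  }
  return counts;
}
```

A deploy script could call this after a restart and refuse to continue if the count under "3" is rising.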

Graceful Shutdown — From the App Side

systemd's contract on stop is: SIGTERM, wait TimeoutStopSec, SIGKILL. Your app's job is to react to SIGTERM by:

  1. Stop accepting new connections (close the listening socket).
  2. Finish in-flight requests (the existing connections drain).
  3. Close downstream connections (DB pool, queue clients).
  4. Exit 0.

If you don't do this, every restart drops in-flight requests and clients see 502s. Most languages need only a short signal handler.

javascript — graceful shutdown in node
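const http = require('http');
// Assumed to be defined elsewhere in the app: `app` (request handler),
// `db` (pg pool), and `queue` (queue client).
const server = http.createServer(app).listen(3000);

let shuttingDown = false;

function shutdown(signal) {
  if (shuttingDown) return;     // ignore repeated signals
  shuttingDown = true;
  console.log(`Received ${signal}, draining…`);

  // Hard deadline — don't hang past systemd's TimeoutStopSec
  setTimeout(() => {
    console.error('Drain timeout, forcing exit');
    process.exit(1);
  }, 25000).unref();

  // Stop accepting connections; the callback runs once in-flight requests finish
  server.close(async () => {
    try {
      await db.end();           // close pg pool
      await queue.disconnect();
      process.exit(0);
    } catch (err) {
      console.error('Cleanup failed', err);
      process.exit(1);
    }
  });
  // Idle keep-alive sockets can otherwise hold close() open (Node 18.2+)
  server.closeIdleConnections();
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT',  () => shutdown('SIGINT'));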

Set the app's hard timeout (the setTimeout here, 25s) to be under systemd's TimeoutStopSec (30s) so the app exits cleanly before systemd reaches for SIGKILL.

⚠️
Health checks lie during shutdown
During graceful shutdown, your /healthz endpoint should return non-200 before you stop accepting new connections — so the load balancer pulls you out of rotation before clients hit a closed socket. Add a shutting_down flag set on SIGTERM that flips the health check, then sleep one health-check interval before server.close(). Without this, every rolling deploy drops a few requests no matter how careful the rest of the code is.
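
The flag-then-wait sequence described in this callout could look roughly like the sketch below. The names state, createHealthHandler, and beginShutdown are illustrative, and the wait assumes the load balancer removes a target after one failed check; many need several, so size the delay to your own settings and keep it inside TimeoutStopSec.

```javascript
// Shared flag flipped on SIGTERM.
const state = { shuttingDown: false };

// Request handler for /healthz: 200 while serving, 503 once shutdown has begun.
function createHealthHandler(state) {
  return (req, res) => {
    const code = state.shuttingDown ? 503 : 200;
    res.writeHead(code, { 'Content-Type': 'text/plain' });
    res.end(state.shuttingDown ? 'shutting down\n' : 'ok\n');
  };
}

// Flip the flag first, wait so the load balancer can notice the failing
// health check, and only then stop accepting connections.
function beginShutdown(state, server, healthIntervalMs) {
  state.shuttingDown = true;
  return new Promise((resolve) => {
    setTimeout(() => server.close(resolve), healthIntervalMs);
  });
}
```

In the SIGTERM handler you would call beginShutdown(state, server, intervalMs) and close the DB pool and queue clients once the returned promise resolves.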

Environment Files & Secrets

Twelve-factor says "config in env vars." Where do those vars come from?

  • EnvironmentFile in the unit. Plaintext on disk; mode 600, owned by the service user.
  • systemd credentials — newer pattern; loads secrets from disk into a tmpfs accessible only to the service. LoadCredential=stripe.key:/etc/credentials/stripe.key.
  • External secret manager (Vault, AWS Secrets Manager, Doppler) — the app fetches at startup using its IAM role. The most flexible; adds a startup-time dependency.

For a single-box EC2 deployment, an EnvironmentFile populated by your deploy script is fine. For anything multi-host, push toward Secrets Manager — rotation, audit, no plaintext on disk.
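
For the systemd credentials option, the app side is small. With LoadCredential=stripe.key:/etc/credentials/stripe.key in the unit, systemd sets the CREDENTIALS_DIRECTORY environment variable for the service and exposes each credential as a file named by its ID. A minimal sketch, with readCredential as a made-up helper name:

```javascript
const fs = require('fs');
const path = require('path');

// Read a secret that systemd exposed via LoadCredential=.
// `env` is injectable so the function can be exercised outside systemd.
function readCredential(id, env = process.env) {
  const dir = env.CREDENTIALS_DIRECTORY;
  if (!dir) {
    throw new Error('CREDENTIALS_DIRECTORY is not set; is LoadCredential= configured?');
  }
  // Trim the trailing newline most secret files end with.
  return fs.readFileSync(path.join(dir, id), 'utf8').trim();
}
```

At startup the app would call readCredential('stripe.key') once and keep the value in memory; failing fast here gives a clear error in the journal rather than a confusing one on the first API call.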

The Deploy Question — Four Strategies

Given a healthy running service, how do you replace its code with the next version? Four patterns, each trading more complexity for less downtime and a smaller blast radius.

  • Recreate: stop, deploy, start (downtime).
  • Rolling: replace one node at a time.
  • Blue-Green: two stacks, switch traffic.
  • Canary: 5% → 25% → 100%, with metrics gating each promotion.

Cost / complexity: Recreate < Rolling < Blue-Green < Canary
Risk of bad release: Recreate > Rolling > Blue-Green > Canary

Pick the simplest strategy your availability requirement tolerates.
Four deploy strategies on one axis: cost vs blast-radius. Most teams climb this ladder over time.

1. Recreate

Stop the old, deploy, start the new. Downtime equals startup time — usually 5–60 seconds. Fine for internal tools, dev environments, batch workers. Maintenance pages soften the impact for user-facing services.

2. Rolling

With multiple instances behind a load balancer: replace them one at a time, draining each before the swap. No downtime if your app does graceful shutdown. The default in Kubernetes Deployments, ECS services, ALB target groups. Trade-off: old and new versions run simultaneously — your code must tolerate that (DB schema compatibility, contract back/forward compat).

3. Blue-Green

Run two complete environments. "Blue" is live; "green" is the next version, fully provisioned but receiving no traffic. Validate green, then flip the load balancer (or DNS) to point at green. Roll back in seconds by flipping back. Costs 2× infra during the cutover, but the rollback story is unbeatable.

4. Canary

Send a small fraction (1%, 5%) of live traffic to the new version. Measure error rates, latency, business metrics. If healthy, ramp up; if not, roll back having affected only a sliver. Implementations: weighted DNS (slow), weighted upstreams in NGINX/Envoy/Istio (fast), feature flags inside the app (most flexible).
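
For the in-app variant, one common approach is to hash a stable identifier into a bucket so each user consistently sees the same version while the percentage ramps. The sketch below is one way to do that, not a prescribed implementation; bucketFor and inCanary are illustrative names, and the choice of SHA-256 is an assumption (any uniform hash would do).

```javascript
const crypto = require('crypto');

// Map an identifier to a stable bucket in [0, 100).
function bucketFor(id) {
  const digest = crypto.createHash('sha256').update(String(id)).digest();
  return digest.readUInt32BE(0) % 100;
}

// True if this id should be served by the canary at the given rollout percentage.
// Raising `percent` only ever adds ids to the canary; nobody flips back.
function inCanary(id, percent) {
  return bucketFor(id) < percent;
}
```

Because the bucket depends only on the id, ramping from 5 to 25 keeps the original 5% in the canary, and setting the percentage to 0 is an immediate abort.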

🌱
Climb the ladder
A new project should ship with recreate until it has real users, then move to rolling the moment downtime hurts. Blue-green shines when you have stateful migrations or want trivial rollback. Canary earns its complexity once you ship often and have observability to measure "is the new version healthy." Don't start at the top — fancy deploys with no monitoring are theatre.

The Atomic Symlink Swap

The classical Capistrano-style deploy on a single box looks like this:

bash — directory layout
/opt/acme/
├─ releases/
│  ├─ 20260501-152008/      ← older, still kept for rollback
│  ├─ 20260502-101355/      ← previous release
│  └─ 20260503-094502/      ← new release just unpacked
├─ shared/
│  ├─ env
│  └─ logs/
└─ current → releases/20260503-094502/   ← symlink, the live version

Deploy = unpack the new release into releases/<timestamp>, run migrations, then atomically swap the current symlink:

bash — the cutover
set -euo pipefail
NEW=/opt/acme/releases/$(date -u +%Y%m%d-%H%M%S)

# 1. Build & ship the artifact
rsync -a --delete ./build/ acme@host:$NEW/
ssh acme@host "cd $NEW && npm ci --omit=dev"

# 2. Run migrations (idempotent, backward-compatible — see warning)
ssh acme@host "cd $NEW && npm run migrate"

# 3. Atomic swap: ln -nfs is non-atomic; use mv on a temp link to be truly atomic
ssh acme@host "ln -nfs $NEW /opt/acme/current.new && mv -Tf /opt/acme/current.new /opt/acme/current"

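# 4. Restart the service so a fresh process starts from the new symlink target
ssh acme@host "sudo systemctl restart acme.service"
# (reload only sends SIGHUP via ExecReload; use it only if the app handles SIGHUP
#  by reloading its code, since an unhandled SIGHUP terminates Node)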

# 5. Prune old releases — keep last 5 for rollback
ssh acme@host "ls -1dt /opt/acme/releases/* | tail -n +6 | xargs -r rm -rf"

Rollback is one symlink swap to a previous timestamp plus a service restart. Cheap, fast, requires no rebuild.
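
Because release directories are named with UTC timestamps, choosing a rollback target or a prune list is plain string sorting. A small sketch of that selection logic follows; previousRelease and releasesToPrune are hypothetical helper names, and the input is assumed to be the directory names under releases/.

```javascript
// Names look like YYYYMMDD-HHMMSS, so a plain string sort is chronological.

// Return the release immediately older than `current`, or null if none exists.
function previousRelease(releases, current) {
  const sorted = [...releases].sort();
  const idx = sorted.indexOf(current);
  if (idx === -1) throw new Error(`unknown release: ${current}`);
  return idx > 0 ? sorted[idx - 1] : null;
}

// Return the releases to delete: everything beyond the newest `keep`,
// never including the one that is currently live.
function releasesToPrune(releases, current, keep = 5) {
  const newestFirst = [...releases].sort().reverse();
  return newestFirst.slice(keep).filter((name) => name !== current);
}
```

Excluding the live release from pruning matters after a rollback, when current may point at something older than the newest five.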

🚨
Schema migrations are the rollback killer
A code rollback is easy. A schema rollback (you ran ALTER TABLE DROP COLUMN in step 2) is hard or impossible. Discipline: every migration must be backward compatible with the previous version of the code. "Add column" is safe; "drop column" requires a deploy that stops referencing it, then a follow-up deploy that drops it. "Rename column" requires a transitional state where both names work. Treat the schema as a public API of the old code.

Connecting Strategy + Symlink + systemd

The pieces fit together for a single-box rolling-style deploy on one EC2 instance:

  1. Run two systemd services on different ports: acme@blue.service on 3000, acme@green.service on 3001 (systemd template units make this clean).
  2. NGINX upstream pool has both, marked equal weight. Both versions serve traffic when both are running.
  3. Deploy procedure:
    1. Swap the current symlink to the new release. Already-running processes are unaffected until they are restarted.
    2. Restart acme@blue.service so it starts from the new code; green keeps serving in the meantime.
    3. Wait for blue's /healthz to return 200.
    4. Repeat for acme@green.service.
  4. If blue fails its health check, green is still running the old code; revert the symlink, restart blue, and no traffic was harmed.

For multi-box, the load balancer (ALB or your own NGINX in front) does the rolling automatically — drain a target, deploy, return it to service, repeat.
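
The one-box procedure above (restart an instance, wait for health, move on) can be sketched as a small loop. The restart and isHealthy functions are injected here as assumptions: in practice they might shell out to systemctl restart and poll /healthz, but those details are left to the caller.

```javascript
// Restart instances one at a time and stop at the first one that never
// becomes healthy, leaving the remaining instances on the old code.
async function rollingRestart(instances, { restart, isHealthy, attempts = 30, delayMs = 1000 }) {
  const updated = [];
  for (const name of instances) {
    await restart(name);
    let healthy = false;
    for (let i = 0; i < attempts && !healthy; i++) {
      healthy = await isHealthy(name);
      // Pause between polls only while the instance is still unhealthy.
      if (!healthy) await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
    if (!healthy) {
      return { ok: false, failed: name, updated };
    }
    updated.push(name);
  }
  return { ok: true, failed: null, updated };
}
```

A caller would run rollingRestart(['acme@blue.service', 'acme@green.service'], {...}) and, on ok: false, revert the symlink and restart only the failed instance.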

Templated Units — Many Instances of One App

systemd has a built-in template syntax: a unit file with @ in the name accepts an instance argument. acme@.service with %i in its body becomes acme@blue, acme@green, acme@worker-1, etc.

ini — /etc/systemd/system/acme@.service
[Unit]
Description=Acme web app — instance %i

[Service]
Type=simple
User=acme
EnvironmentFile=/opt/acme/shared/env.%i
ExecStart=/usr/bin/node /opt/acme/current/server.js
Restart=always

[Install]
WantedBy=multi-user.target

# Manage like:
# sudo systemctl enable --now acme@blue.service
# sudo systemctl enable --now acme@green.service

Each instance reads its own env.blue or env.green file, listens on its assigned port (set via env var), and is supervised independently.
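
On the app side, reading the per-instance port might look like the sketch below. PORT is an assumed variable name (for example PORT=3000 in env.blue and PORT=3001 in env.green), and resolvePort is an illustrative helper.

```javascript
// Read and validate the listening port from the environment.
// Throwing on a bad value makes the unit fail fast with a clear journal message.
function resolvePort(env = process.env) {
  const port = Number(env.PORT);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer in 1-65535, got: ${env.PORT}`);
  }
  return port;
}
```

The server would then call listen(resolvePort()) in place of a hard-coded 3000.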

Timers — The Modern Cron

systemd timers replace cron for scheduled work, with three advantages: full systemd context (logs, failure handling, dependencies, sandboxing), a human-readable schedule, and an easy ad-hoc trigger.

ini — backup.timer + backup.service
# /etc/systemd/system/backup.service
[Unit]
Description=Daily database backup

[Service]
Type=oneshot
User=acme
ExecStart=/opt/acme/scripts/backup.sh

# /etc/systemd/system/backup.timer
[Unit]
Description=Run daily database backup

[Timer]
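# Comments must sit on their own lines; systemd reads trailing text as part of the value.
# Every day at 03:30 local time
OnCalendar=*-*-* 03:30:00
# Spread start times to avoid a stampede across many hosts
RandomizedDelaySec=15m
# If the host was off at 03:30, run on next boot
Persistent=true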

[Install]
WantedBy=timers.target

# Enable: sudo systemctl enable --now backup.timer
# List:   systemctl list-timers --all
# Run now: sudo systemctl start backup.service

What Goes Wrong Mid-Deploy

  1. Service won't start, no useful error. journalctl -u acme.service -n 200 first. systemd-analyze verify acme.service for syntax issues. strace as a last resort.
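  2. EnvironmentFile doesn't load. Check that the path exists (a missing file fails the start unless the path is prefixed with -), mode 600 plus owner=service user, a trailing newline, and plain KEY=value lines. There is no shell expansion: FOO=$BAR is literal in an EnvironmentFile, not expanded.
  3. Migrations time out and the service starts on a half-migrated DB. Use a oneshot migration unit that runs before the app: give it Before=acme.service and ExecStart=migrate, and add Requires= and After= for that unit to acme.service. Before= alone only orders the two; with Requires=, the app will fail to start if migrations fail.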
  4. Old releases pile up, fill disk. Prune in your deploy script (see above) or with a daily timer.
  5. Symlink swap not atomic. ln -nfs deletes-then-creates with a tiny window; use ln + mv -T for true atomicity.
  6. Health check returns 200 before app is ready. Watch out for frameworks that 200 on the listening port before the worker pool is warm. The health endpoint should explicitly check downstream readiness.
  7. Forgot daemon-reload after editing the unit. systemd is using the old version. Always reload.
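
Item 2 is easy to catch before a deploy. The sketch below is a rough heuristic, not a parser for the full EnvironmentFile syntax: it flags values that look like they expect shell expansion. The function name findUnexpandedRefs is made up, and a secret that legitimately contains a dollar sign would be flagged too.

```javascript
// Flag EnvironmentFile lines whose value contains something like $VAR or ${VAR}.
// systemd passes such values through literally, which is rarely what was intended.
function findUnexpandedRefs(text) {
  const findings = [];
  text.split('\n').forEach((line, i) => {
    const trimmed = line.trim();
    // Blank lines and lines starting with # or ; are ignored by systemd.
    if (!trimmed || trimmed.startsWith('#') || trimmed.startsWith(';')) return;
    const eq = trimmed.indexOf('=');
    if (eq === -1) return;
    const value = trimmed.slice(eq + 1);
    if (/\$\{?[A-Za-z_]/.test(value)) {
      findings.push({ line: i + 1, key: trimmed.slice(0, eq), value });
    }
  });
  return findings;
}
```

A deploy script could run this over the generated env file and abort if the returned list is non-empty.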

Rolling Back Fast

Speed matters more than pride. Three patterns:

  • Symlink rollback. Repoint current at <old-release> (ln -nfs to a temp link, then mv -T) and restart. Seconds.
  • Blue-green flip. Repoint the load balancer at the old environment. Seconds.
  • Canary abort. Set the canary weight to 0%. The new version is still running but receives no traffic. Seconds.

What you do not want during a rollback: a fresh git clone, npm install, build, push. "How fast can I roll back" is a property of your deploy system, not your skill at typing commands.

Quick check
A teammate adds a feature: when the app shuts down it pushes a final "goodbye" event to a queue, then exits. Two days later the on-call engineer reports that systemd is force-killing the app on every restart, leaving zombie connections. What's likely happening, and what are two ways to fix it?
Answer
What's happening: the queue push is hanging or timing out, so the app's shutdown takes longer than systemd's TimeoutStopSec (default 90s but often shortened to 30s). systemd waits the timeout, then SIGKILLs — bypassing the rest of cleanup, including DB pool close, leaving connections showing as idle in transaction on the database. Fix 1: wrap the goodbye push in a tight timeout (2–5s) and treat its failure as recoverable. The shutdown path must always finish in under TimeoutStopSec regardless of downstream health. Fix 2: raise TimeoutStopSec if the work is genuinely worth waiting for, and ensure the load balancer has already drained connections (so the queue push isn't fighting for time with in-flight requests). The general principle: shutdown is on a budget, and every step in it must respect that budget.
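Fix 1 could be sketched as follows. withTimeout is a generic helper written for this example, queue.push is a hypothetical client method standing in for whatever your queue library provides, and the 3000 ms budget is an arbitrary value inside the 2–5s range suggested above.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Shutdown step: try to send the goodbye event, but never let it block exit.
async function sayGoodbye(queue) {
  try {
    await withTimeout(queue.push({ type: 'goodbye' }), 3000, 'goodbye push');
    return true;
  } catch (err) {
    // Recoverable: log it and carry on with the rest of the shutdown.
    console.error(`Skipping goodbye event: ${err.message}`);
    return false;
  }
}
```

The rest of the shutdown path (closing the DB pool, exiting) runs whether sayGoodbye returns true or false.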
Mnemonic — make a service production-grade
"User. Restart. Hardening. Logs. Reload."
  • User — never root.
  • Restart — Restart=always with a RestartSec delay.
  • Hardening — NoNewPrivileges + ProtectSystem + others.
  • Logs — to journald, queryable.
  • Reload — graceful via SIGHUP or zero-downtime swap.
Flashcard
Why is "add a NOT NULL column with no default" a deploy-breaker even though the migration itself succeeds, and what's the canonical fix?
Answer
Why it breaks: for any deploy strategy except recreate-with-downtime, the old code and new code run simultaneously for some interval. Old code doesn't know about the new column and inserts rows without it; the database rejects those inserts because the column is NOT NULL. Result: errors during the rollout, often as 500s on POST endpoints. Canonical fix — three deploys: (1) add the column as nullable; deploy the new code that writes it; backfill existing rows. (2) once all live data is populated and old code is gone, deploy a migration that adds NOT NULL and (optionally) a default. (3) clean up. The rule generalises: every schema change must be backward compatible with the previous code; one-shot "big bang" migrations work only when you accept downtime.
🔑
Key takeaways
1) systemd is the right answer for "keep this running" — supervision, dependencies, journals, hardening, all built in. 2) A production unit file names a user, defines a restart policy, hardens the sandbox, loads env from a file. 3) Graceful shutdown is the app's job; SIGTERM → drain → exit, with a hard deadline under systemd's stop timeout. 4) Climb the deploy ladder: recreate → rolling → blue-green → canary; the right one is the simplest your availability tolerates. 5) The atomic symlink swap with timestamped releases gives you fast deploys and free rollbacks; always design for backward-compatible schema changes.
