The Engineering Codex/From Code to Internet: Deployment & Operations
DAY 6

Containers & Docker — and When to Graduate

13 min · Intermediate · 2,871 words
A container is a process with its own filesystem, network, and PID namespace: the same Linux primitives the kernel has had for well over a decade, packaged with a sane UX. Docker wrapped them; today most modern deploys start with a Dockerfile. Master images, layers, multi-stage builds, networking, compose, registries, and the moment you outgrow 'docker run on a single box' and graduate to ECS, Fargate, or Kubernetes.

What you will learn

01 What a Container Actually Is
02 The Image — Layered, Cacheable, Reproducible
03 Multi-Stage Builds — Small Images for Compiled Apps
04 Image Tags — Don't Use latest in Production
05 Networking — How Containers Talk
06 Volumes and Persistence

Containers solve a very specific problem: the gap between "runs on my machine" and "runs on the server." The way they solve it — packaging the application plus its OS-level dependencies into one shippable artifact — turned out to be useful for so many other things (reproducible builds, dev environments, CI runners, isolation between services on one box) that the format became universal. Today, even teams that don't run Kubernetes ship containers; the Docker image is the lingua franca of deployment. This chapter covers the bits you'll touch every week: writing efficient Dockerfiles, understanding what a layer is and why caching matters, networking and volumes, docker compose for local stacks, and the signals that tell you it's time to leave a single box.

🔑
Today's outcome
1) What a container is — namespaces and cgroups, no magic. 2) Dockerfiles, layers, and the cache — small images, fast rebuilds. 3) Multi-stage builds for compiled languages and asset pipelines. 4) Networking, volumes, secrets — the runtime concerns. 5) docker compose for local dev stacks. 6) Registries and tags — versioning, signing, scanning. 7) The upgrade path — when single-box Docker stops fitting, and which platform fits next.

What a Container Actually Is

A container is a Linux process. The illusion that it's an isolated machine is built from kernel features that pre-date Docker by years:

  • Namespaces isolate what a process can see. Six common ones: pid (its own PID space), net (its own network stack), mnt (its own filesystem view), uts (its own hostname), ipc (its own SysV IPC), user (its own UID mapping).
  • Cgroups limit and account for resource use: CPU shares, memory caps, IO bandwidth.
  • Capabilities split root's privileges into fine-grained units that can be dropped one by one: a container that doesn't need CAP_NET_RAW doesn't get it.
  • seccomp / AppArmor / SELinux further restrict syscalls.
  • OverlayFS (or btrfs) stacks read-only filesystem layers under a writable top layer. That's how a container starts in milliseconds: nothing is copied, everything is mounted.

Docker's contribution was packaging this into one CLI plus an image format that anyone can build, push, and pull. Container = process + namespaces + cgroups + an image layered on overlayfs. No virtualization in the VMware sense; the kernel is shared.
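
To see that these are ordinary kernel objects rather than Docker magic, the sketch below lists the namespaces a process belongs to. It assumes a Linux host with /proc mounted; list_namespaces is a helper name made up for this example, not a Docker command. Every process has namespaces, containerized or not; a container simply gets fresh ones.

```shell
# List the namespace types a process belongs to (Linux only).
# list_namespaces is a hypothetical helper name, not part of Docker.
list_namespaces() {
  pid="${1:-$$}"   # default: the current shell
  # Each entry in /proc/<pid>/ns is a symlink such as "pid:[4026531836]";
  # two processes share a namespace when the bracketed inode numbers match.
  for ns in /proc/"$pid"/ns/*; do
    printf '%s -> %s\n' "$(basename "$ns")" "$(readlink "$ns")"
  done
}

list_namespaces
```

Running the same listing for PID 1 inside a container that ships a shell should show different inode numbers for each namespace the runtime created for it.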

[Figure: Virtual machine stack (App, Libs, Guest kernel, Hypervisor, Host kernel + hardware) beside container stack (App, Libs, Container runtime (containerd, runc), Shared host kernel + hardware); isolation via namespaces & cgroups]
VMs ship a guest kernel; containers share the host kernel. That's why a container starts in milliseconds while a VM takes seconds to minutes, and why a kernel exploit is more serious in containers: one vulnerable kernel is shared by every container on the host.

The Image — Layered, Cacheable, Reproducible

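A Docker image is a stack of read-only layers. Each filesystem-changing Dockerfile instruction (RUN, COPY, ADD) produces one; the other instructions only record metadata. When you build, Docker hashes each instruction's inputs and reuses the cached layer when nothing changed; once one layer misses the cache, every layer after it is rebuilt. Layer order is performance.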

A Dockerfile worth keeping

dockerfile — Node app, single-stage
FROM node:20-alpine

# Set workdir before copying so relative paths resolve under /app
WORKDIR /app

# 1) Copy only the dependency manifest first — invalidate cache only when deps change
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# 2) Then copy the source — invalidates only the source layers
COPY . .

# 3) Run as a non-root user — node:alpine ships one
USER node

# 4) Document the port, declare a healthcheck, and define the entrypoint
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -qO- http://127.0.0.1:3000/healthz || exit 1

CMD ["node", "server.js"]

The crucial trick is the order of COPY instructions. Source code changes far more often than package.json does. By copying just package.json and package-lock.json first and running npm ci, the slow dependency install stays cached across rebuilds where you only changed your code. Reverse the order, and every code change reruns npm ci; a rebuild that took 30 seconds can stretch to 5 minutes.
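
The caching rule can be modeled in a few lines. The sketch below is a simplified model, not Docker's actual implementation: it assumes a layer's cache key is a hash of the parent layer's key, the instruction text, and (for COPY) a checksum of the copied files. The names layer_key and build_keys and the sample checksums are hypothetical, and sha256sum from coreutils is assumed to be installed.

```shell
# Simplified model of the build cache: key = sha256(parent key + instruction + file checksum).
# layer_key and build_keys are hypothetical helpers for illustration only.
layer_key() {
  printf '%s\n%s\n%s\n' "$1" "$2" "$3" | sha256sum | cut -d' ' -f1
}

# build_keys <manifest checksum> <source checksum>
# Prints the key of the dependency-install layer, then the key of the source layer.
build_keys() {
  base=$(layer_key "" "FROM node:20-alpine" "")
  manifest=$(layer_key "$base" "COPY package.json package-lock.json ./" "$1")
  deps=$(layer_key "$manifest" "RUN npm ci --omit=dev" "")
  src=$(layer_key "$deps" "COPY . ." "$2")
  printf '%s\n%s\n' "$deps" "$src"
}

build_keys manifest-v1 source-v1   # first build
build_keys manifest-v1 source-v2   # code-only change: first key repeats, second differs
build_keys manifest-v2 source-v2   # dependency change: both keys differ
```

Because each key includes its parent's key, a change early in the file invalidates everything after it, which is why the rarely changing manifest copy goes first.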

⚠️
.dockerignore is mandatory
Every COPY . . sucks in node_modules, .git, build/, .env, your laptop's .DS_Store. Without a .dockerignore, your image bloats by hundreds of megabytes and may bake secrets straight in. .dockerignore mirrors .gitignore for builds — make it the first file you write alongside the Dockerfile.

Multi-Stage Builds — Small Images for Compiled Apps

For compiled languages (Go, Rust, TypeScript with a build step), the build environment is huge — compilers, dev headers, source code, intermediate artifacts — but the runtime needs almost none of that. Multi-stage Dockerfiles let you build in one image and copy only the artifacts to a smaller final image.

dockerfile — multi-stage Go service
# --- build stage ---
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /out/app ./cmd/server

# --- runtime stage ---
FROM gcr.io/distroless/static-debian12
COPY --from=build /out/app /app
USER 65532:65532
EXPOSE 8080
ENTRYPOINT ["/app"]

The first stage is ~700 MB; the final image is ~10 MB — just the static Go binary on a distroless base. Smaller images mean faster pulls, less attack surface, and cheaper registry storage. The same pattern works for Rust (cargo build --release in a build stage, copy to scratch or distroless), C++, and JS asset pipelines.

Base image choice

Base | Size | Trade-off
node:20 (Debian) | ~1 GB | Familiar, has every tool, slow to pull
node:20-slim | ~250 MB | Debian without the kitchen sink; usually the right default
node:20-alpine | ~150 MB | musl libc; occasionally trips native modules, otherwise great
distroless/nodejs20 | ~120 MB | No shell, no package manager; minimal attack surface
scratch | 0 bytes | Static binaries only (Go, Rust). Smallest possible image

Default to slim or alpine for interpreted languages, distroless or scratch for compiled binaries. The handful of native-module compatibility headaches with Alpine (musl) are worth knowing about; node:20-slim is the safer general default.

Image Tags — Don't Use latest in Production

An image is identified by registry + name + tag (or digest). Tags are mutable; digests are not.

text — naming an image
123456789.dkr.ecr.us-east-1.amazonaws.com/acme:1.4.2-abc1234
│                                          │     │
│                                          │     └ tag (mutable)
│                                          └ image name
└ registry

# Or pinned by digest (immutable):
123456789.dkr.ecr.us-east-1.amazonaws.com/acme@sha256:7a8d…

Three tagging conventions worth following:

  • Semver + git SHA: 1.4.2-abc1234. Human-readable, traceable to a commit.
  • Never deploy :latest. It points wherever it last pointed; rolling back means re-tagging, not re-deploying. Reserve :latest for dev convenience or omit entirely.
  • Pin by digest in production. image@sha256:… ensures the exact bytes you tested are what runs. Tags can be reused; digests can't.
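
A minimal sketch of the first convention as it might appear in a CI script. The helper name image_tag is made up for this example; the version would come from your release process and the SHA from git rev-parse HEAD.

```shell
# Build a "<semver>-<short sha>" tag such as 1.4.2-abc1234.
# image_tag is a hypothetical helper name.
image_tag() {
  version="$1"
  sha="$2"
  # %.7s truncates the commit SHA to its first seven characters.
  printf '%s-%.7s\n' "$version" "$sha"
}

image_tag 1.4.2 abc1234def5678   # prints 1.4.2-abc1234
```

For the digest convention, once the image has been pushed, docker inspect --format '{{index .RepoDigests 0}}' on the tagged image prints the name@sha256:… reference to put in the deploy config.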

Networking — How Containers Talk

Each container gets a network namespace. Docker's default bridge network puts containers on a virtual switch with private addresses; the daemon NATs outbound traffic and exposes inbound via published ports.

  • docker run -p 8080:80 nginx — publishes container port 80 on host port 8080. Anyone can hit the host on 8080.
  • -p 127.0.0.1:8080:80 — bind to localhost only. NGINX in front, container behind. The right default.
  • User-defined networks let containers reach each other by name. docker network create app; docker run --network app --name db postgres; another container on the same network resolves db to its IP.
  • Host networking (--network host) skips the network namespace entirely: the container shares the host's interfaces and ports. Slightly faster (no NAT), but it loses network isolation; use sparingly.
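
The user-defined network and loopback-publish bullets can be tried end to end with the sketch below. It only prints the commands when DRY_RUN=1 is set. The names run and user_network_demo, the placeholder password, and the acme:dev image are assumptions for illustration.

```shell
# Print each command; skip execution when DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

# user_network_demo is a hypothetical name; "app" and "db" follow the bullets above.
user_network_demo() {
  run docker network create app
  # The postgres image refuses to start without a password; "example" is a placeholder.
  run docker run -d --network app --name db -e POSTGRES_PASSWORD=example postgres:16-alpine
  # A second container on the same network resolves "db" by name.
  run docker run --rm --network app postgres:16-alpine pg_isready -h db
  # Publish the app on loopback only, so a reverse proxy on the host fronts it.
  run docker run -d --network app --name web -p 127.0.0.1:8080:3000 acme:dev
}
```

DRY_RUN=1 user_network_demo previews the four commands; without the variable they run against the local Docker daemon. The pg_isready check may report "no response" for the first second or two while Postgres initializes.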

Volumes and Persistence

Container filesystems are ephemeral; everything written goes to the writable layer and vanishes when the container is removed. For state that has to persist, two options:

  • Bind mounts (-v /host/path:/container/path): map a host directory in. Fast, useful for dev (your editor saves a file, the container sees it).
  • Named volumes (-v acme_data:/var/lib/postgresql/data): Docker manages the storage location, so nothing depends on a host path. Easier to back up, but with the default local driver the data still lives on that one host.

For production state (databases), prefer a managed service (RDS, ElastiCache) over a database in a container. Volumes are fine for caches and scratch data you can afford to lose.

docker compose — The Local Stack

One Dockerfile gives you one image. Real systems are several images talking to each other. docker compose lets you describe the whole stack in one YAML file, bring it up with one command, and tear it down the same way.

yaml — compose.yaml for a small app
services:
  app:
    build: .
    image: acme:dev
    ports:
      - "127.0.0.1:8080:3000"
    environment:
      DATABASE_URL: postgres://acme:acme@db:5432/acme
      REDIS_URL:    redis://cache:6379
    depends_on:
      db:    { condition: service_healthy }
      cache: { condition: service_started }
    volumes:
      - .:/app:delegated         # bind-mount source for hot reload (dev only)

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: acme
      POSTGRES_USER:     acme
    volumes:
      - db_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U acme"]
      interval: 5s
      retries: 10

  cache:
    image: redis:7-alpine

volumes:
  db_data:

One docker compose up away from a Postgres + Redis + app stack on localhost. The same file is the basis for testing in CI (different env vars, no source bind-mount, real built image). Compose isn't a production orchestrator — but it is the most useful local-dev tool of the last decade.

Registries — Where Images Live Between Build and Run

You build images on CI, push them to a registry, and pull them on the host that runs them. The big choices:

Registry | Notes
Docker Hub | The original. Free public, paid private. Fine for OSS images.
GitHub Container Registry (ghcr.io) | Free, integrates with GitHub Actions, sensible permissions. Good default for OSS + small teams.
Amazon ECR | Built into AWS, IAM-authenticated, pulls fast inside AWS, scanning included. Default for AWS-native deploys.
Google Artifact Registry / Azure ACR | The respective cloud-native equivalents.
Self-hosted (Harbor, Nexus) | For air-gapped or compliance-heavy environments.

Push, scan, sign

bash — typical CI publish step
# $REPO is the registry host, e.g. <account-id>.dkr.ecr.us-east-1.amazonaws.com
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin "$REPO"

docker build --platform linux/amd64 -t "$REPO/acme:$VERSION-$SHA" .
docker push "$REPO/acme:$VERSION-$SHA"

# Scan for vulnerabilities (ECR can scan on push, natively or via Amazon Inspector; or run Trivy yourself)
trivy image --exit-code 1 --severity HIGH,CRITICAL "$REPO/acme:$VERSION-$SHA"

# Sign for supply-chain integrity (Sigstore cosign)
cosign sign --key "$COSIGN_KEY" "$REPO/acme:$VERSION-$SHA"

Vulnerability scanning is table stakes; signing (cosign + Sigstore, SLSA) becomes table stakes as soon as your customers ask about supply-chain security. Both fit naturally into a CI pipeline (Day 7).

💡
Build for the right architecture
If your laptop is an Apple Silicon Mac and your servers are x86, docker build by default builds an arm64 image, which won't run on x86. Always pass --platform linux/amd64 in CI, or build cross-architecture with docker buildx (the full command is docker buildx build --platform linux/amd64,linux/arm64 -t $TAG --push . run from the build context). The latter publishes a manifest list so each runtime pulls the matching arch.
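
A small sketch for checking which platform a build host would target by default. The helper name docker_platform is made up for this example; it only translates the machine name reported by uname -m into the value that --platform expects.

```shell
# Map a machine name (as printed by `uname -m`) to a Docker --platform value.
# docker_platform is a hypothetical helper name.
docker_platform() {
  machine="${1:-$(uname -m)}"
  case "$machine" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *) echo "unsupported machine: $machine" >&2; return 1 ;;
  esac
}

docker_platform x86_64   # prints linux/amd64
docker_platform arm64    # prints linux/arm64 (what an Apple Silicon Mac reports)
```

After a build, docker image inspect --format '{{.Os}}/{{.Architecture}}' on the tag reports what the image was actually built for, which is a cheap assertion to add before pushing.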

Running Containers in Production — The Single-Box Path

The simplest production setup is the EC2 from Day 2 with Docker installed, a systemd unit (Day 5) running docker compose up or a single docker run, NGINX in front (Day 3), and TLS automated (Day 4). For a small project, this is plenty.

ini — /etc/systemd/system/acme-stack.service
[Unit]
Description=Acme stack via docker compose
After=docker.service network-online.target
Wants=docker.service network-online.target

[Service]
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/acme
EnvironmentFile=/opt/acme/.env
ExecStart=/usr/bin/docker compose pull
ExecStart=/usr/bin/docker compose up -d --remove-orphans
ExecStop=/usr/bin/docker compose down
TimeoutStartSec=180

[Install]
WantedBy=multi-user.target

Deploys = git pull + docker compose pull + docker compose up -d. Compose recreates each changed container in place, so expect a brief gap while the new one starts; deploy.update_config is honored by Swarm (docker stack deploy), not by docker compose up. For a real rolling deploy you usually move to a higher-level orchestrator.
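
The three deploy steps can be wrapped in a small script. This is a sketch under stated assumptions: the stack lives in /opt/acme as in the unit file above, the compose file is named compose.yaml, and the helper names run and deploy are made up for this example. DRY_RUN=1 prints the commands without executing them.

```shell
# Print each command; skip execution when DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

# deploy [stack dir]; stops at the first failing step.
deploy() {
  dir="${1:-/opt/acme}"
  run git -C "$dir" pull --ff-only &&
  run docker compose -f "$dir/compose.yaml" pull &&
  run docker compose -f "$dir/compose.yaml" up -d --remove-orphans
}
```

DRY_RUN=1 deploy /opt/acme previews the sequence; plain deploy runs it. Chaining with && means a failed git pull or image pull leaves the running containers untouched.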

When to Graduate

Single-box Docker is fantastic until it isn't. The signs:

  • You need more than one box. An ALB, two EC2s, two compose stacks — workable but increasingly fiddly. Time for orchestration.
  • Autoscaling. Compose doesn't autoscale; you've outgrown it the moment traffic spikes drive your ops time.
  • Many services, many teams. The deploy concurrency (and the blast radius of one bad change) demands platform tooling.
  • Stateful complexity. StatefulSets, persistent volume claims, leader election — the stuff Kubernetes does well.
  • Compliance and isolation. Per-team namespaces, fine-grained IAM, network policies, image-signing enforcement.

The platform options, in increasing complexity

Platform | Sweet spot | Cost-of-ops
App Runner / Fly.io / Railway | One service, deploy from a repo, autoscaling included | Tiny: they manage everything
ECS Fargate | A handful of services, no servers to patch | Low: task definitions, services, ALB target groups
ECS on EC2 | Cost-conscious teams running many tasks per host | Medium: you manage the EC2s
EKS / GKE / AKS (managed K8s) | Many teams, many services, platform team in the org | High: full Kubernetes complexity, but managed control plane
Self-managed Kubernetes | Specific compliance or perf requirements | Very high: only attempt with platform staff
Nomad | K8s-tired teams; multi-region simpler | Medium: leaner than K8s but smaller ecosystem

🌱
Don't start at Kubernetes
A common mistake on a new project: "we'll need K8s eventually, let's start there." The first six months of K8s for a small team are not productive engineering — they're learning Kubernetes. Starting with App Runner, Fargate, or even the EC2-and-compose pattern lets you ship a product. Migrate when the business need is real and the team has bandwidth. The image is portable; the runtime is replaceable.

Container Pitfalls

  1. PID 1 zombie reaping. Most application runtimes don't reap orphaned child processes when they run as PID 1. Use docker run --init (Docker injects tini, a tiny init that reaps zombies and forwards signals), or bake tini or a similar init into the image as the entrypoint.
  2. Logs to files. Docker collects logs from stdout/stderr; anything written to /var/log/app.log is invisible to docker logs and disappears when the container is removed. Make your app log to stdout.
  3. Writing inside the image. The image's layers are read-only; the writable layer is per-container. Anything you need across restarts goes in a volume.
  4. Running as root. Default if you don't set USER. A container escape from root inside is much worse than from a non-root UID. Always set a non-root user.
  5. Latest tag in production. Covered above; the source of "it works on dev, fails in prod" mysteries.
  6. Resource limits unset. A container with no memory limit can OOM the host. Set --memory and --cpus (or resources.limits in compose/k8s).
  7. Time inside the container. Time zones, NTP — usually fine, but a host with skewed clock affects every container on it.
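  8. SIGTERM ignored. Use exec form (CMD ["node", "server.js"]) so your process receives signals directly. Shell form (CMD node server.js) wraps it in /bin/sh -c, and the shell typically does not forward SIGTERM, so docker stop waits out its grace period (10 seconds by default) and then sends SIGKILL. Always use exec form.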
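
When a container needs setup before the app starts, pitfall 8 reappears inside the entrypoint script itself. A minimal sketch of the usual remedy, assuming a hypothetical docker-entrypoint.sh: finish with exec so the app replaces the shell instead of running as its child. The function wrapper and its name are for illustration; in a real script the body would sit at the top level.

```shell
# docker-entrypoint.sh (hypothetical): do one-time setup, then hand over to the real command.
entrypoint() {
  # Setup steps would go here (render config, wait for a dependency, and so on).
  echo "entrypoint: starting $1" >&2
  # exec replaces this shell with the command, so inside a container the app
  # keeps PID 1 and receives SIGTERM from `docker stop` directly.
  exec "$@"
}

# Hand over only when a command was passed, as Docker does with CMD arguments.
if [ "$#" -gt 0 ]; then entrypoint "$@"; fi
```

In the Dockerfile this would pair with ENTRYPOINT ["/docker-entrypoint.sh"] and CMD ["node", "server.js"], both in exec form.
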
Quick check
A teammate's Dockerfile starts with FROM node:20 (~1 GB) and the resulting image is 1.6 GB. Their CI pipeline takes 8 minutes to build because every code change reruns npm install. Outline three changes that together drop image size to under 200 MB and rebuild time to under 60 seconds, in priority order.
Answer
1) Reorder COPY instructions and use npm ci — copy package.json + package-lock.json first, run npm ci --omit=dev, then copy the rest. Dependency install is now a cached layer for any commit that doesn't change deps. Rebuild time drops from minutes to ~10 seconds for code-only changes. 2) Switch base to node:20-slim or node:20-alpine — drops the base from ~1 GB to ~250 MB or ~150 MB. 3) Use a multi-stage build: build stage compiles assets and installs all deps, runtime stage starts from a minimal base and copies only node_modules + the build output + server.js. With node:20-alpine + multi-stage + tight COPY scope, the image is ~120-180 MB. Other useful additions: a .dockerignore excluding node_modules, .git, build/; setting USER node; setting NODE_ENV=production so npm prunes dev deps automatically. The cumulative effect is a tighter image, faster CI, and a smaller attack surface.
Mnemonic — a good Dockerfile
"Slim base. Cache deps. Multi-stage. Non-root. Exec form."
  • Slim base — alpine, slim, distroless.
  • Cache deps — copy manifest first, install, then copy source.
  • Multi-stage — build big, ship small.
  • Non-root — set USER explicitly.
  • Exec form — CMD ["node","server.js"], not the shell form.
Flashcard
Why does docker build on an Apple Silicon laptop produce an image that fails on x86 EC2 with "exec format error," and what's the cleanest fix in CI?
Answer
What's happening: Docker builds for the host's architecture by default. An M-series Mac is arm64; EC2 (unless you picked a Graviton instance type) is amd64. The image's binaries — your interpreter, native modules — are the wrong machine code for the runtime. The kernel's binfmt loader rejects them: "exec format error." Fix in CI: always build with an explicit platform: docker buildx build --platform linux/amd64 -t $TAG --push . if you target x86, or build a multi-arch manifest with --platform linux/amd64,linux/arm64 so the right one is picked at pull time. Local dev on Apple silicon then runs the arm64 variant; production EC2 pulls the amd64 variant. The longer-term move is to run on Graviton (arm64) servers — they're cheaper and the local dev arch matches.
🔑
Key takeaways
1) A container is just a process with namespaces and cgroups: same kernel, different view. 2) Layer order is performance; cache dep installs by copying manifests first. 3) Multi-stage builds + small base images = small attack surface and fast pulls. 4) compose is the local dev sweet spot; production usually graduates to ECS, App Runner, or K8s. 5) Pin images by digest, scan for vulns, and never run as root. 6) Don't start at Kubernetes; climb the platform ladder when the business pain demands it.
