Containers & Docker — and When to Graduate
A container is a process with its own filesystem, network, and PID namespace: the same Linux primitives the kernel has had for well over a decade, packaged with a sane UX. Docker wrapped them; today most modern deploys start with a Dockerfile. Master images, layers, multi-stage builds, networking, compose, registries, and the moment you outgrow 'docker run on a single box' and graduate to ECS, Fargate, or Kubernetes.
What you will learn
Containers solve a very specific problem: the gap between "runs on my machine" and "runs on the server." The way they solve it — packaging the application plus its OS-level dependencies into one shippable artifact — turned out to be useful for so many other things (reproducible builds, dev environments, CI runners, isolation between services on one box) that the format became universal. Today, even teams that don't run Kubernetes ship containers; the Docker image is the lingua franca of deployment. This chapter covers the bits you'll touch every week: writing efficient Dockerfiles, understanding what a layer is and why caching matters, networking and volumes, docker compose for local stacks, and the signals that tell you it's time to leave a single box.
What a Container Actually Is
A container is a Linux process. The illusion that it's an isolated machine is built from kernel features that pre-date Docker by years:
- Namespaces isolate what a process can see. Six common ones: pid (its own PID space), net (its own network stack), mnt (its own filesystem view), uts (its own hostname), ipc (its own SysV IPC), user (its own UID mapping).
- Cgroups limit and account for resource use: CPU shares, memory caps, IO bandwidth.
- Capabilities split root's power into fine-grained privileges that can be dropped individually: a container that doesn't need CAP_NET_RAW doesn't get it.
- seccomp / AppArmor / SELinux further restrict what the process may do: seccomp filters syscalls, while AppArmor and SELinux enforce mandatory access-control policies.
- OverlayFS / btrfs stack read-only filesystem layers under a writable top layer. That's how a container starts in milliseconds: nothing is copied, everything is mounted.
Docker's contribution was packaging this into one CLI plus an image format that anyone can build, push, and pull. Container = process + namespaces + cgroups + an image layered on overlayfs. No virtualization in the VMware sense; the kernel is shared.
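To make the "a container is just a process" claim concrete, the sketch below lists the namespaces the current process belongs to. It is written in Python, assumes a Linux host with /proc mounted, and the helper name current_namespaces is made up for illustration; on other systems it simply returns an empty dict.

```python
import os


def current_namespaces(proc_dir="/proc/self/ns"):
    """Return {namespace_name: identifier} for the calling process.

    On Linux each entry in /proc/self/ns is a symlink such as
    'pid:[4026531836]'. Two processes share a namespace exactly when
    these identifiers match. Returns {} where /proc is unavailable.
    """
    namespaces = {}
    try:
        names = os.listdir(proc_dir)
    except OSError:
        return namespaces  # not Linux, or /proc not mounted
    for name in sorted(names):
        try:
            namespaces[name] = os.readlink(os.path.join(proc_dir, name))
        except OSError:
            continue  # entry not readable; skip it
    return namespaces


for name, ident in current_namespaces().items():
    print(f"{name:20} {ident}")
```

Running it once on the host and once inside a container should show different identifiers for entries such as pid, net, and mnt; those differences are the isolation described above.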
The Image — Layered, Cacheable, Reproducible
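A Docker image is a stack of read-only layers. Each instruction that changes the filesystem (RUN, COPY, ADD) produces one layer; the others only record metadata. When you build, Docker hashes each instruction's inputs and reuses cached layers when nothing changed. Once one layer's cache is invalidated, every layer after it is rebuilt, so instruction order determines build speed.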
A Dockerfile worth keeping
    FROM node:20-alpine

    # Set workdir before copying so absolute paths work
    WORKDIR /app

    # 1) Copy only the dependency manifest first; invalidate cache only when deps change
    COPY package.json package-lock.json ./
    RUN npm ci --omit=dev

    # 2) Then copy the source; invalidates only the source layers
    COPY . .

    # 3) Run as a non-root user; node:alpine ships one
    USER node

    # 4) Document the port, declare a healthcheck, and define the entrypoint
    EXPOSE 3000
    HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
      CMD wget -qO- http://127.0.0.1:3000/healthz || exit 1
    CMD ["node", "server.js"]
The crucial trick is the order of COPY instructions. Source code changes far more often than package.json does. By copying just package.json and package-lock.json first and running npm ci, the slow dependency install is cached across rebuilds where you only changed your code. Reverse the order, and every code change reruns npm ci: your CI goes from 30s to 5 minutes.
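The effect of instruction order can be modeled in a few lines. The Python sketch below is a deliberately simplified model, not Docker's actual cache-key algorithm: it assumes each layer's key is a hash of the parent key, the instruction text, and the bytes of any copied files. The function names and file contents are invented for illustration.

```python
import hashlib


def layer_keys(instructions, file_contents):
    """Compute a chained cache key for each Dockerfile-like instruction.

    instructions: list of (text, copied_paths) tuples.
    file_contents: {path: bytes} standing in for the build context.
    Each key covers the parent key, the instruction text, and the bytes
    of any files the instruction copies, so one change invalidates that
    layer and every layer after it.
    """
    keys = []
    parent = b""
    for text, copied_paths in instructions:
        h = hashlib.sha256()
        h.update(parent)
        h.update(text.encode())
        for path in sorted(copied_paths):
            h.update(path.encode())
            h.update(file_contents[path])
        parent = h.digest()
        keys.append(h.hexdigest())
    return keys


def reused_layers(before, after):
    """Count how many leading layers keep the same key."""
    count = 0
    for old, new in zip(before, after):
        if old != new:
            break
        count += 1
    return count


MANIFESTS = ["package.json", "package-lock.json"]
ALL_FILES = MANIFESTS + ["server.js"]

good_order = [
    ("FROM node:20-alpine", []),
    ("COPY package.json package-lock.json ./", MANIFESTS),
    ("RUN npm ci --omit=dev", []),
    ("COPY . .", ALL_FILES),
]
bad_order = [
    ("FROM node:20-alpine", []),
    ("COPY . .", ALL_FILES),
    ("RUN npm ci --omit=dev", []),
]

context_v1 = {"package.json": b"{}", "package-lock.json": b"{}", "server.js": b"v1"}
context_v2 = {**context_v1, "server.js": b"v2"}  # code-only change

good_reused = reused_layers(layer_keys(good_order, context_v1),
                            layer_keys(good_order, context_v2))
bad_reused = reused_layers(layer_keys(bad_order, context_v1),
                           layer_keys(bad_order, context_v2))
print(f"manifest-first: {good_reused} of {len(good_order)} layers reused")
print(f"copy-all-first: {bad_reused} of {len(bad_order)} layers reused")
```

In this model a code-only change keeps the npm ci layer cached when the manifests are copied first, and invalidates it when everything is copied up front.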
COPY . . sucks in node_modules, .git, build/, .env, your laptop's .DS_Store. Without a .dockerignore, your image bloats by hundreds of megabytes and may bake secrets straight in. .dockerignore mirrors .gitignore for builds; make it the first file you write alongside the Dockerfile.
Multi-Stage Builds — Small Images for Compiled Apps
For compiled languages (Go, Rust, TypeScript with a build step), the build environment is huge — compilers, dev headers, source code, intermediate artifacts — but the runtime needs almost none of that. Multi-stage Dockerfiles let you build in one image and copy only the artifacts to a smaller final image.
    # --- build stage ---
    FROM golang:1.22-alpine AS build
    WORKDIR /src
    COPY go.mod go.sum ./
    RUN go mod download
    COPY . .
    RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /out/app ./cmd/server

    # --- runtime stage ---
    FROM gcr.io/distroless/static-debian12
    COPY --from=build /out/app /app
    USER 65532:65532
    EXPOSE 8080
    ENTRYPOINT ["/app"]
The first stage is ~700 MB; the final image is ~10 MB — just the static Go binary on a distroless base. Smaller images mean faster pulls, less attack surface, and cheaper registry storage. The same pattern works for Rust (cargo build --release in a build stage, copy to scratch or distroless), C++, and JS asset pipelines.
Base image choice
| Base | Size | Trade-off |
|---|---|---|
| node:20 (Debian) | ~1 GB | Familiar, has every tool, slow to pull |
| node:20-slim | ~250 MB | Debian without the kitchen sink; usually the right default |
| node:20-alpine | ~150 MB | musl libc occasionally trips native modules; otherwise great |
| distroless/nodejs20 | ~120 MB | No shell, no package manager; minimal attack surface |
| scratch | 0 bytes | Static binaries only (Go, Rust). Smallest possible image |
Default to slim or alpine for interpreted languages, distroless or scratch for compiled binaries. The handful of native-module compatibility headaches with Alpine (musl) are worth knowing about; node:20-slim is the safer general default.
Image Tags — Don't Use latest in Production
An image is identified by registry + name + tag (or digest). Tags are mutable; digests are not.
    123456789.dkr.ecr.us-east-1.amazonaws.com/acme:1.4.2-abc1234
    │                                         │    │
    │                                         │    └ tag (mutable)
    │                                         └ image name
    └ registry

    # Or pinned by digest (immutable):
    123456789.dkr.ecr.us-east-1.amazonaws.com/acme@sha256:7a8d…
Three tagging conventions worth following:
- Semver + git SHA: 1.4.2-abc1234. Human-readable, traceable to a commit.
- Never deploy :latest. It points wherever it last pointed; rolling back means re-tagging, not re-deploying. Reserve :latest for dev convenience or omit entirely.
- Pin by digest in production. image@sha256:… ensures the exact bytes you tested are what runs. Tags can be reused; digests can't.
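The registry / name / tag / digest anatomy can be expressed as a small parser. This is a simplified Python sketch, not the reference grammar Docker itself uses; ImageRef, parse_image_ref, and is_immutable are hypothetical names, and the all-zero digest in the usage note is a placeholder.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: Optional[str]
    name: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'registry/name:tag' or 'registry/name@digest' into parts.

    Simplified: the first path component counts as a registry only if it
    contains '.' or ':' or equals 'localhost'.
    """
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)

    tag = None
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    if last_colon > last_slash:  # a colon before the last '/' is a registry port
        ref, tag = ref[:last_colon], ref[last_colon + 1:]

    registry = None
    first, sep, rest = ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, ref = first, rest

    return ImageRef(registry, ref, tag, digest)


def is_immutable(ref: str) -> bool:
    """Only a digest pins exact bytes; a tag can be moved later."""
    return parse_image_ref(ref).digest is not None


tagged = parse_image_ref("123456789.dkr.ecr.us-east-1.amazonaws.com/acme:1.4.2-abc1234")
print(tagged)
print("immutable:", is_immutable("123456789.dkr.ecr.us-east-1.amazonaws.com/acme:1.4.2-abc1234"))
```

A deploy script could use a check like is_immutable to refuse tag-only references in production, for example rejecting acme:latest while accepting "acme@sha256:" followed by a full 64-character digest.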
Networking — How Containers Talk
Each container gets a network namespace. Docker's default bridge network puts containers on a virtual switch with private addresses; the daemon NATs outbound traffic and exposes inbound via published ports.
- docker run -p 8080:80 nginx: publishes container port 80 on host port 8080. Anyone who can reach the host can hit port 8080.
- -p 127.0.0.1:8080:80: binds to localhost only. NGINX in front, container behind. The right default.
- User-defined networks let containers reach each other by name. docker network create app; docker run --network app --name db postgres; another container on the same network resolves db to its IP.
- Host networking (--network host) skips the network namespace entirely. Faster but loses network isolation; use sparingly.
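The difference between the two -p forms above comes down to which host address the port is bound to. The Python sketch below parses the two forms used in the list and flags bindings that listen beyond loopback. It assumes IPv4 only, ignores protocol suffixes and port ranges, and assumes the usual default of 0.0.0.0 when no address is given; PortBinding, parse_publish, and listens_beyond_loopback are illustrative names, not part of any Docker API.

```python
from typing import NamedTuple


class PortBinding(NamedTuple):
    host_ip: str
    host_port: int
    container_port: int


def parse_publish(spec: str) -> PortBinding:
    """Parse 'HOST_PORT:CONTAINER_PORT' or 'IP:HOST_PORT:CONTAINER_PORT'.

    IPv4 only; protocol suffixes such as '/udp' and port ranges are out of scope.
    """
    parts = spec.split(":")
    if len(parts) == 2:
        host_ip = "0.0.0.0"  # no address given: listen on every interface
        host_port, container_port = parts
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    return PortBinding(host_ip, int(host_port), int(container_port))


def listens_beyond_loopback(binding: PortBinding) -> bool:
    """True unless the binding is restricted to a loopback address."""
    return not binding.host_ip.startswith("127.")


for spec in ("8080:80", "127.0.0.1:8080:80"):
    binding = parse_publish(spec)
    scope = "all interfaces" if listens_beyond_loopback(binding) else "loopback only"
    print(f"{spec:20} -> {binding} ({scope})")
```

A check like this could run in CI against compose files to catch ports that were published to every interface by accident.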
Volumes and Persistence
Container filesystems are ephemeral; everything written goes to the writable layer and vanishes when the container is removed. For state that has to persist, two options:
- Bind mounts (-v /host/path:/container/path): map a host directory in. Fast, useful for dev (your editor saves a file, the container sees it).
- Named volumes (-v acme_data:/var/lib/postgresql/data): Docker manages the storage location, so they are easier to back up and don't depend on a particular host path.
For production state (databases), prefer a managed service (RDS, ElastiCache) over a database in a container. Volumes are fine for stateless caches and ephemeral scratch.
docker compose — The Local Stack
One Dockerfile gives you one image. Real systems are several images talking to each other. docker compose lets you describe the whole stack in one YAML file, bring it up with one command, and tear it down the same way.
services:
app:
build: .
image: acme:dev
ports:
- "127.0.0.1:8080:3000"
environment:
DATABASE_URL: postgres://acme:acme@db:5432/acme
REDIS_URL: redis://cache:6379
depends_on:
db: { condition: service_healthy }
cache: { condition: service_started }
volumes:
- .:/app:delegated # bind-mount source for hot reload (dev only)
db:
image: postgres:16-alpine
environment:
POSTGRES_PASSWORD: acme
POSTGRES_USER: acme
volumes:
- db_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U acme"]
interval: 5s
retries: 10
cache:
image: redis:7-alpine
volumes:
db_data:
You are one docker compose up away from a Postgres + Redis + app stack on localhost. The same file is the basis for testing in CI (different env vars, no source bind-mount, real built image). Compose isn't a production orchestrator, but it is the most useful local-dev tool of the last decade.
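The depends_on entries in the compose file above form a small dependency graph, and start order is a topological sort of it. The Python sketch below illustrates that idea with the standard library; it is not how Compose is implemented, and it does not model the service_healthy condition (waiting for the healthcheck), only the ordering. start_order is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Mirrors the depends_on edges of the compose file: service -> prerequisites.
depends_on = {
    "app": {"db", "cache"},
    "db": set(),
    "cache": set(),
}


def start_order(graph):
    """Return services ordered so every prerequisite comes first."""
    return list(TopologicalSorter(graph).static_order())


order = start_order(depends_on)
print(order)
```

Whatever the relative order of db and cache, app always comes last; a circular depends_on would raise graphlib.CycleError here.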
Registries — Where Images Live Between Build and Run
You build images on CI, push them to a registry, and pull them on the host that runs them. The big choices:
| Registry | Notes |
|---|---|
| Docker Hub | The original. Free public, paid private. Fine for OSS images. |
| GitHub Container Registry (ghcr.io) | Free, integrates with GitHub Actions, sensible permissions. Good default for OSS + small teams. |
| Amazon ECR | Built into AWS, IAM-authenticated, pulls fast inside AWS, scanning included. Default for AWS-native deploys. |
| Google Artifact Registry / Azure ACR | The respective cloud-native equivalents. |
| Self-hosted (Harbor, Nexus) | For air-gapped or compliance-heavy environments. |
Push, scan, sign
    aws ecr get-login-password --region us-east-1 | \
      docker login --username AWS --password-stdin "$REPO"

    docker build --platform linux/amd64 -t "$REPO/acme:$VERSION-$SHA" .
    docker push "$REPO/acme:$VERSION-$SHA"

    # Scan for vulnerabilities (ECR can scan on push with its built-in scanning
    # or Amazon Inspector; or run Trivy yourself as shown here)
    trivy image --exit-code 1 --severity HIGH,CRITICAL "$REPO/acme:$VERSION-$SHA"

    # Sign for supply-chain integrity (Sigstore cosign)
    cosign sign --key "$COSIGN_KEY" "$REPO/acme:$VERSION-$SHA"
Vulnerability scanning is table stakes; signing (cosign + Sigstore, SLSA) is table stakes if your customers ask about supply-chain security. Both fit naturally into a CI pipeline (Day 7).
On an Apple Silicon laptop, docker build by default builds an arm64 image, which won't run on x86. Always pass --platform linux/amd64 in CI, or build cross-architecture with docker buildx: docker buildx build --platform linux/amd64,linux/arm64 -t $TAG --push . (the trailing dot is the build context). The latter publishes a manifest list so each runtime pulls the matching arch.
Running Containers in Production — The Single-Box Path
The simplest production setup is the EC2 from Day 2 with Docker installed, a systemd unit (Day 5) running docker compose up or a single docker run, NGINX in front (Day 3), and TLS automated (Day 4). For a small project, this is plenty.
    [Unit]
    Description=Acme stack via docker compose
    After=docker.service network-online.target
    Wants=docker.service network-online.target

    [Service]
    Type=oneshot
    RemainAfterExit=true
    WorkingDirectory=/opt/acme
    EnvironmentFile=/opt/acme/.env
    ExecStart=/usr/bin/docker compose pull
    ExecStart=/usr/bin/docker compose up -d --remove-orphans
    ExecStop=/usr/bin/docker compose down
    TimeoutStartSec=180

    [Install]
    WantedBy=multi-user.target
Deploys = git pull + docker compose pull + docker compose up -d. Plain docker compose up -d recreates changed containers in place, so expect a brief gap per service; deploy.update_config is honored by Docker Swarm (docker stack deploy), not by docker compose up. For a real rolling deploy you usually move to a higher-level orchestrator.
When to Graduate
Single-box Docker is fantastic until it isn't. The signs:
- You need more than one box. An ALB, two EC2s, two compose stacks — workable but increasingly fiddly. Time for orchestration.
- Autoscaling. Compose doesn't autoscale; you've outgrown it the moment traffic spikes drive your ops time.
- Many services, many teams. The deploy concurrency (and the blast radius of one bad change) demands platform tooling.
- Stateful complexity. StatefulSets, persistent volume claims, leader election — the stuff Kubernetes does well.
- Compliance and isolation. Per-team namespaces, fine-grained IAM, network policies, image-signing enforcement.
The platform options, in increasing complexity
| Platform | Sweet spot | Cost-of-ops |
|---|---|---|
| App Runner / Fly.io / Railway | One service, deploy from a repo, autoscaling included | Tiny — they manage everything |
| ECS Fargate | A handful of services, no servers to patch | Low — task definitions, services, ALB target groups |
| ECS on EC2 | Cost-conscious teams running many tasks per host | Medium — you manage the EC2s |
| EKS / GKE / AKS (managed K8s) | Many teams, many services, platform team in the org | High — full Kubernetes complexity, but managed control plane |
| Self-managed Kubernetes | Specific compliance or perf requirements | Very high — only attempt with platform staff |
| Nomad | K8s-tired teams; multi-region simpler | Medium — leaner than K8s but smaller ecosystem |
Container Pitfalls
- PID 1 zombie reaping. Many language runtimes don't reap child processes when run as PID 1. Use --init (Docker injects tini, a tiny init that reaps), or bake an init such as tini into the image as the entrypoint.
- Logs to files. Containers expect logs on stdout/stderr; writing to /var/log/app.log means logs disappear when the container is replaced. Make your app log to stdout.
- Writing inside the image. The image's layers are read-only; the writable layer is per-container and is discarded with it. Anything that must outlive the container goes in a volume.
- Running as root. Default if you don't set USER. A container escape from root inside is much worse than from a non-root UID. Always set a non-root user.
- Latest tag in production. Covered above; the source of "it works on dev, fails in prod" mysteries.
- Resource limits unset. A container with no memory limit can OOM the host. Set --memory and --cpus (or deploy.resources.limits in compose, resources.limits in k8s).
- Time inside the container. Time zones, NTP: usually fine, but a host with a skewed clock affects every container on it.
- SIGTERM ignored. Use exec form (CMD ["node", "server.js"]) so your process receives signals directly; shell form (CMD node server.js) wraps it in /bin/sh -c, which does not forward SIGTERM. Always use exec form.
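Exec form only gets SIGTERM delivered to the process; the application still has to act on it before docker stop's grace period ends and SIGKILL follows. The Python sketch below shows one common shape for that: a handler that sets a flag and a main loop that checks it. The serve loop is a stand-in for real server code, and max_polls exists only so the sketch terminates on its own; all names are illustrative.

```python
import signal
import threading

shutdown_requested = threading.Event()


def _handle_stop(signum, frame):
    """Record the stop request; the main loop does the actual cleanup."""
    shutdown_requested.set()


def install_stop_handlers():
    """Route SIGTERM (docker stop) and SIGINT (Ctrl-C) to the same handler."""
    signal.signal(signal.SIGTERM, _handle_stop)
    signal.signal(signal.SIGINT, _handle_stop)


def serve(poll_seconds=0.5, max_polls=None):
    """Stand-in for a server loop: wake up periodically until asked to stop.

    max_polls exists only so the sketch can be exercised without a signal.
    Returns the number of polls completed.
    """
    polls = 0
    while not shutdown_requested.is_set():
        if max_polls is not None and polls >= max_polls:
            break
        shutdown_requested.wait(poll_seconds)  # real work would go here
        polls += 1
    return polls


install_stop_handlers()
print("polls before exit:", serve(poll_seconds=0.01, max_polls=3))
```

In a real service, serve() would run without max_polls and the code after the loop would close connections and flush buffers before exiting.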
Exercise: A team's Dockerfile starts FROM node:20 (~1 GB) and the resulting image is 1.6 GB. Their CI pipeline takes 8 minutes to build because every code change reruns npm install. Outline three changes that together drop image size to under 200 MB and rebuild time to under 60 seconds, in priority order.
Answer: 1) Make npm ci cacheable: copy package.json + package-lock.json first, run npm ci --omit=dev, then copy the rest. Dependency install is now a cached layer for any commit that doesn't change deps. Rebuild time drops from minutes to ~10 seconds for code-only changes. 2) Switch base to node:20-slim or node:20-alpine, which drops the base from ~1 GB to ~250 MB or ~150 MB. 3) Use a multi-stage build: the build stage compiles assets and installs all deps, the runtime stage starts from a minimal base and copies only node_modules + the build output + server.js. With node:20-alpine + multi-stage + tight COPY scope, the image is ~120-180 MB. Other useful additions: a .dockerignore excluding node_modules, .git, build/; setting USER node; setting NODE_ENV=production so npm skips dev dependencies on install. The cumulative effect is a tighter image, faster CI, and a smaller attack surface.
- Slim base: alpine, slim, distroless.
- Cache deps: copy manifest first, install, then copy source.
- Multi-stage: build big, ship small.
- Non-root: set USER explicitly.
- Exec form: CMD ["node","server.js"], not the shell form.
Exercise: Why does docker build on an Apple Silicon laptop produce an image that fails on x86 EC2 with "exec format error," and what's the cleanest fix in CI?
Answer: docker build targets the architecture of the machine it runs on, so the laptop produces an arm64 image that an x86 host cannot execute. In CI, run docker buildx build --platform linux/amd64 -t $TAG --push . if you target x86, or build a multi-arch manifest with --platform linux/amd64,linux/arm64 so the right one is picked at pull time. Local dev on Apple Silicon then runs the arm64 variant; production EC2 pulls the amd64 variant. The longer-term move is to run on Graviton (arm64) servers: they're typically cheaper and the local dev arch matches.