TLS & Certificate Lifecycle — Padlocks That Don't Expire on Christmas
TLS turned the web from a plaintext gossip channel into a private one. Today every browser shows users a giant warning if your site doesn't have it — and your certificate has a 90-day expiry counting down whether you remember or not. Master the handshake, the chain of trust, Let's Encrypt and ACME, automatic renewal, HSTS, OCSP stapling, and the boring discipline that keeps the padlock green at 3 AM on a holiday.
What you will learn
TLS is the layer that lets you give a credit card to a website without the coffee shop on the corner reading it. It's also the layer most likely to wake you up on a holiday: certs expire, ACME challenges fail, intermediate CAs change, browsers tighten their requirements every year. The good news: the operational work is mostly automatable, and a handful of patterns turn TLS from a chore into a thing that just works. The path here is to first understand the handshake and the chain of trust, then learn the Let's Encrypt + certbot loop that has become the default, then layer on the production niceties: HSTS, OCSP stapling, modern cipher selection, and the renewal monitoring that catches the one box that forgot.
What TLS Actually Does
Three properties, in order of how often they're broken in a careless deployment:
- Encryption — bytes on the wire are unreadable to anyone but the two endpoints.
- Integrity — bytes can't be modified in flight without detection.
- Authentication of the server — the client knows it's actually talking to
acme.com, not someone pretending. (Mutual TLS adds client authentication too.)
Encryption alone is parlour-trick simple — a static shared key would do it. The hard part is the third bullet: how does your browser, having never met acme.com before, verify on the first byte that it's the real one? The answer is the chain of trust, and TLS spends most of its complexity here.
The Handshake — Five Steps in TLS 1.3
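TLS 1.3 (RFC 8446, 2018) collapsed the older 2-RTT handshake into one round trip. Here's what happens before any HTTP byte flows:
- Step 1, ClientHello: the client sends the TLS versions and cipher suites it supports, its Diffie-Hellman key share, and the hostname it wants (SNI).
- Step 2, ServerHello: the server picks a version and cipher suite and replies with its own key share. Both sides can now derive the handshake keys.
- Step 3, server proof: encrypted under those keys, the server sends its certificate chain, a CertificateVerify signature made with its private key, and a Finished message.
- Step 4, client check: the client validates the chain, the hostname, and the signature, then sends its own Finished.
- Step 5, application data: HTTP requests flow, one round trip after the ClientHello.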
The crucial mechanism is ephemeral key exchange. Both sides contribute random key shares and combine them (Diffie-Hellman) into a session secret that nobody else can compute, not even an attacker who later steals the server's long-term private key. This property, forward secrecy, means a future key compromise can't decrypt recorded past traffic. TLS 1.3 makes it mandatory for full handshakes by removing static RSA and static Diffie-Hellman key exchange.
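The key-share exchange described above can be sketched with textbook finite-field Diffie-Hellman. This is a toy, not TLS. The prime P = 2**127 - 1 and base G = 3 are illustrative assumptions, far too small for real security. A bare SHA-256 also stands in for TLS 1.3's HKDF key schedule. Real handshakes use X25519 or P-256 ECDHE through a vetted library.

```python
import hashlib
import secrets
from typing import Tuple

# Toy public parameters (assumption): a 127-bit Mersenne prime and a small base.
# Far too small for real use; TLS 1.3 uses X25519 or P-256 ECDHE instead.
P = 2**127 - 1
G = 3


def keypair() -> Tuple[int, int]:
    """Generate a fresh (ephemeral) private exponent and its public key share."""
    private = secrets.randbelow(P - 3) + 2   # uniform in [2, P-2]
    public = pow(G, private, P)
    return private, public


def shared_secret(my_private: int, their_public: int) -> bytes:
    """Combine my private value with the peer's key share, then hash it.
    Real TLS 1.3 feeds the raw secret into HKDF, not a bare SHA-256."""
    z = pow(their_public, my_private, P)
    return hashlib.sha256(z.to_bytes(16, "big")).digest()


# One handshake: each side makes a throwaway key pair and sends only the public half.
client_priv, client_share = keypair()
server_priv, server_share = keypair()

client_key = shared_secret(client_priv, server_share)
server_key = shared_secret(server_priv, client_share)
assert client_key == server_key

# Forward secrecy: after the handshake both sides drop the private values
# (a real TLS stack also wipes them from memory).
del client_priv, server_priv
```

Each connection generates fresh private values and then throws them away. Stealing the server's certificate key later therefore gives an attacker nothing from which to recompute `client_key`.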
SNI — many sites on one IP
Step 1 includes the Server Name Indication (SNI) extension: the hostname the client is trying to reach, sent before the server picks a certificate. Without SNI, every TLS host needed its own IP. With SNI, NGINX can serve acme.com and example.org on the same IP, picking the right cert based on the SNI value. SNI is unencrypted in TLS 1.3 by default; Encrypted Client Hello (ECH) is the proposed fix and is rolling out gradually.
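Certificate selection by SNI boils down to a lookup keyed on the hostname in the ClientHello. Here is a simplified sketch under stated assumptions. The `CERTS` table and its paths are hypothetical, and real NGINX `server_name` matching also handles regexes, trailing wildcards, and `default_server`. In Python's own `ssl` module, a function like this would be called from an `ssl.SSLContext.sni_callback` hook that swaps in the matching context.

```python
from typing import Dict, Optional

# Hypothetical SNI table: hostname pattern -> certificate path.
CERTS: Dict[str, str] = {
    "acme.com": "/etc/letsencrypt/live/acme.com/fullchain.pem",
    "example.org": "/etc/letsencrypt/live/example.org/fullchain.pem",
    "*.example.org": "/etc/letsencrypt/live/example.org-wildcard/fullchain.pem",
}
DEFAULT = "/etc/letsencrypt/live/acme.com/fullchain.pem"


def pick_cert(sni: Optional[str], table: Dict[str, str] = CERTS, default: str = DEFAULT) -> str:
    """Pick a certificate from the SNI hostname: exact match first, then a
    single-label wildcard, else the default (clients sending no SNI land here)."""
    if not sni:
        return default
    host = sni.lower().rstrip(".")
    if host in table:
        return table[host]
    _, _, parent = host.partition(".")
    if parent and "*." + parent in table:
        return table["*." + parent]
    return default
```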
The Chain of Trust
Authentication hinges on a public-key infrastructure (PKI). Your browser ships with a list of ~140 root CAs it trusts (the trust store, baked into the OS or browser). A certificate signed by one of those — directly or transitively — is trusted; one that isn't, isn't.
What the browser actually checks
- Signature chain. Leaf signed by intermediate; intermediate signed by a root the browser trusts.
- Validity dates. Each cert in the chain is currently in its not before / not after window.
- Hostname match. The hostname in the URL matches one of the leaf's subjectAltName entries (the CN hasn't been the source of truth in browsers since ~2017).
- Revocation. The cert hasn't been revoked (via OCSP or CRLs; see below).
- CT logs. The cert appears in public Certificate Transparency logs (Chrome has enforced this since 2018).
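Two of those checks, hostname match and validity window, are simple enough to spell out. This is a simplified sketch. It ignores IP-address SANs, internationalised names, and partial-label wildcards. In practice you'd let the TLS library do all of this: Python's `ssl.create_default_context()` enables chain (including dates) and hostname verification by default.

```python
from datetime import datetime
from typing import Iterable


def hostname_matches(hostname: str, san_dns_names: Iterable[str]) -> bool:
    """Simplified RFC 6125-style check: exact match, or a wildcard covering
    exactly one leftmost label (*.acme.com matches www.acme.com, but not
    acme.com or a.b.acme.com)."""
    host = hostname.lower().rstrip(".")
    for name in san_dns_names:
        name = name.lower().rstrip(".")
        if name == host:
            return True
        if name.startswith("*."):
            label, _, rest = host.partition(".")
            if label and rest == name[2:]:
                return True
    return False


def within_validity(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    """True if `now` falls inside the certificate's notBefore/notAfter window."""
    return not_before <= now <= not_after
```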
Point ssl_certificate at fullchain.pem (leaf + intermediates), never at the leaf alone. Many clients carry only root certificates, so the server must send the intermediates. Test publicly with ssllabs.com; "Chain issues: incomplete" is the giveaway.
ACME and Let's Encrypt
Before 2016, getting a TLS cert involved a credit card, a CSR you didn't fully understand, and a phone call. Then Let's Encrypt and the ACME protocol made it free, automated, and 90 days at a time. Today, ~half the public internet's certs come through ACME.
How ACME works
- You generate a key pair for your account; ACME uses the public key as your identity.
- You ask for a cert for a list of names (acme.com, www.acme.com).
- The CA challenges you to prove you control each name. Three challenge types:
  - HTTP-01: serve a key authorization (the CA's token plus a fingerprint of your account key) at http://acme.com/.well-known/acme-challenge/<token>. The CA fetches it over port 80. The default for web servers.
  - DNS-01: publish a TXT record at _acme-challenge.acme.com with a value derived from that key authorization. Required for wildcards. Works behind any firewall.
  - TLS-ALPN-01: answer a TLS handshake on port 443 that negotiates the special acme-tls/1 ALPN protocol, presenting a self-signed cert that carries the challenge value. Useful when port 80 is closed.
- You publish the proof at the location/value the CA specified.
- The CA verifies, then issues the cert, signed by its intermediate.
- You install the cert and reload your server.
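The proofs in the challenge list above are derived from a key authorization. RFC 8555 defines it as the token, a dot, and the base64url SHA-256 thumbprint (RFC 7638) of your account's public key. certbot computes this for you; the sketch below only shows the derivation. Any JWK you pass in is assumed to be your account's public key in JSON Web Key form. The helper names are hypothetical.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as ACME and JOSE use it."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint: SHA-256 over only the required members,
    keys sorted, no whitespace."""
    required = {"RSA": ("e", "kty", "n"), "EC": ("crv", "kty", "x", "y")}[jwk["kty"]]
    canonical = json.dumps({k: jwk[k] for k in required}, separators=(",", ":"), sort_keys=True)
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    """RFC 8555 key authorization: token "." thumbprint(account public key)."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


def http01_body(token: str, account_jwk: dict) -> str:
    """Response body to serve at /.well-known/acme-challenge/<token>."""
    return key_authorization(token, account_jwk)


def dns01_txt(token: str, account_jwk: dict) -> str:
    """Value to publish in the _acme-challenge TXT record."""
    digest = hashlib.sha256(key_authorization(token, account_jwk).encode("ascii")).digest()
    return b64url(digest)
```

Because the thumbprint covers only the key's required members in canonical order, extra fields such as `kid` and dictionary ordering don't change the proof.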
certbot — the canonical client
certbot is the EFF-maintained ACME client; it does all of the above and ships with hooks for popular web servers.
sudo apt install certbot python3-certbot-nginx
# Interactive: detects nginx server blocks, modifies them, reloads
sudo certbot --nginx -d acme.com -d www.acme.com
# Or, manage cert + nginx config separately (cleaner for IaC)
sudo certbot certonly --webroot -w /var/www/html -d acme.com -d www.acme.com
# Cert lands at /etc/letsencrypt/live/acme.com/{fullchain,privkey,chain,cert}.pem
# Wildcard (*.acme.com) requires DNS-01 (this needs the python3-certbot-dns-cloudflare plugin)
sudo certbot certonly --dns-cloudflare \
  --dns-cloudflare-credentials ~/.secrets/cloudflare.ini \
  -d acme.com -d '*.acme.com'
certbot installs a systemd timer (systemctl status certbot.timer) that runs certbot renew twice a day. It only renews certs in their last 30 days of validity, so you get plenty of room to catch failures before they bite.
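The "plenty of room" claim is easy to quantify. This sketch assumes the 30-day window and twice-daily timer described above. certbot's real decision also reads per-lineage settings such as `renew_before_expiry` in `/etc/letsencrypt/renewal/*.conf`, and the function names here are hypothetical.

```python
from datetime import datetime, timedelta

RENEW_BEFORE = timedelta(days=30)   # renewal window assumed from the text
CHECK_EVERY = timedelta(hours=12)   # the timer fires twice a day


def should_renew(not_after: datetime, now: datetime, window: timedelta = RENEW_BEFORE) -> bool:
    """A renew run only acts once the cert is inside its renewal window."""
    return not_after - now <= window


def attempts_before_expiry(window: timedelta = RENEW_BEFORE, interval: timedelta = CHECK_EVERY) -> int:
    """How many timer runs fall inside the window, i.e. how many chances a
    broken renewal gets before the cert actually expires."""
    return int(window / interval)
```

`attempts_before_expiry()` gives 60. That is 60 runs, and 60 chances for someone to notice a failing job before users do.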
Wiring TLS Into NGINX
The Day 3 starter config already references certbot's output. The settings worth knowing:
server {
listen 443 ssl http2;   # nginx 1.25.1+ deprecates this form: use "listen 443 ssl;" plus "http2 on;"
server_name acme.com;
# Cert + chain (fullchain.pem = leaf + intermediates)
ssl_certificate /etc/letsencrypt/live/acme.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/acme.com/privkey.pem;
# Protocols: drop TLS 1.0/1.1 (deprecated since 2020)
ssl_protocols TLSv1.2 TLSv1.3;
# Cipher suites — Mozilla "intermediate" profile, copy/paste
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305;
ssl_prefer_server_ciphers off; # honour client preference (better with mobile)
# Session reuse — saves the handshake on repeat visits
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;
ssl_session_tickets off;   # tickets are encrypted with a long-lived key, which undermines forward secrecy
# OCSP stapling — server fetches revocation status for the client
ssl_stapling on;
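ssl_stapling_verify on;
ssl_trusted_certificate /etc/letsencrypt/live/acme.com/chain.pem;   # issuer chain NGINX uses to verify the OCSP response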
resolver 1.1.1.1 8.8.8.8 valid=300s;
resolver_timeout 5s;
# HTTP Strict Transport Security — "never come back over http"
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
# The rest of your config ↓
}
For the cipher list specifically, don't roll your own. Use ssl-config.mozilla.org — it generates a config matching Mozilla's three profiles (modern, intermediate, old) and is updated as TLS evolves.
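Sometimes a Python service terminates TLS itself rather than sitting behind NGINX. The same policy maps roughly onto the standard `ssl` module. This sketch assumes Python 3.7+ linked against OpenSSL 1.1.1 or later. The function name is hypothetical, and loading the certificate (`ctx.load_cert_chain`) is left to the caller.

```python
import ssl

# Mozilla "intermediate" TLS 1.2 suites, the same list as the ssl_ciphers line above.
MOZILLA_INTERMEDIATE_TLS12 = ":".join([
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-ECDSA-CHACHA20-POLY1305",
    "ECDHE-RSA-CHACHA20-POLY1305",
])


def intermediate_server_context() -> ssl.SSLContext:
    """Server-side context roughly matching the NGINX TLS settings above."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # ssl_protocols TLSv1.2 TLSv1.3
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # ssl_ciphers: affects TLS 1.2 only; OpenSSL's TLS 1.3 suites stay enabled
    ctx.set_ciphers(MOZILLA_INTERMEDIATE_TLS12)
    # ssl_prefer_server_ciphers off: let the client's preference order win
    ctx.options = int(ctx.options) & ~int(ssl.OP_CIPHER_SERVER_PREFERENCE)
    # ssl_session_tickets off
    ctx.options |= ssl.OP_NO_TICKET
    return ctx
```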
HSTS — The Browser Lock
HSTS (HTTP Strict Transport Security) tells the browser: "never visit this site over plain HTTP again — for the next N seconds, upgrade everything to HTTPS automatically." Once a browser has received the header from your domain, even a typed-in http://acme.com goes over HTTPS without an opportunity for an attacker to downgrade.
- max-age: usually 1 year (31536000) or 2 years (63072000).
- includeSubDomains: applies the policy to foo.acme.com as well. Don't enable this until you're sure every subdomain has TLS.
- preload: opt in to the browser's hardcoded HSTS preload list. Once on the list, browsers never even ask; they refuse plain HTTP from the first visit. Submit at hstspreload.org. Hard to undo; commit only when you're sure.
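A rollout script or CI check can parse the header and refuse to ship `preload` until the policy qualifies. In this sketch the parser is deliberately lenient. The preload test covers only the header requirements hstspreload.org lists: max-age of at least 31536000, includeSubDomains, and preload. The site must also redirect HTTP to HTTPS and serve every subdomain over HTTPS. The function names are hypothetical.

```python
def parse_hsts(header: str) -> dict:
    """Parse a Strict-Transport-Security value into its three directives."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for part in header.split(";"):
        name, _, value = part.strip().partition("=")
        name = name.strip().lower()
        if name == "max-age":
            policy["max_age"] = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


def preload_ready(header: str) -> bool:
    """Header-level preload requirements: max-age >= 1 year, includeSubDomains, preload."""
    p = parse_hsts(header)
    return (p["max_age"] is not None and p["max_age"] >= 31536000
            and p["include_subdomains"] and p["preload"])
```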
HSTS is sticky. Send a year-long max-age once, even for a few minutes during an experiment, and the policy embeds itself in every visitor's browser. For the rest of that year, those browsers will reach your site only over HTTPS. Setting max-age to 0 later clears the policy only for visitors who come back and receive the new header over working HTTPS. If you can't keep TLS up, you can't turn HSTS off for users who already cached it. Roll out with a small max-age first (a day, then a week, then a year), and only enable preload when you're committed to HTTPS forever.
OCSP Stapling — Revocation Without the Round-Trip
If a private key is stolen, you revoke the cert. The CA publishes the revocation; clients learn about it via OCSP (Online Certificate Status Protocol) or CRLs (Certificate Revocation Lists). Both have problems: OCSP queries leak which sites you visit to the CA; CRLs are large and slow to refresh.
OCSP stapling fixes this. The server periodically fetches the CA-signed OCSP response for its own cert and "staples" it to the TLS handshake. Because the CA's signature covers the response, the server can't forge it, and the client gets revocation status without contacting the CA itself. Cheaper, more private, faster.
echo | openssl s_client -connect acme.com:443 -status 2>/dev/null \
  | grep -E 'OCSP [Rr]esponse|Cert Status'
# Look for: "OCSP Response Status: successful" + "Cert Status: good"
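Stapling can fail silently. The usual causes are an outdated chain.pem, a missing ssl_trusted_certificate, or a blocked outbound connection from your server to the OCSP responder. TLS still works, but clients fall back to their own revocation checks, or none at all. Watch for it in error.log: OCSP responder query failed. Also check that your CA still runs OCSP at all: openssl x509 -noout -ocsp_uri -in cert.pem prints the responder URL. Let's Encrypt announced in 2024 that it is phasing OCSP out in favour of CRLs, after which there is nothing to staple.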
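To monitor stapling rather than eyeball it, classify the openssl output in a script. This sketch matches on the same strings the check above looks for. Exact wording can vary between OpenSSL versions, so treat the markers as assumptions to verify against your own output. The function name is hypothetical.

```python
def stapling_status(s_client_output: str) -> str:
    """Classify `openssl s_client -status` output as 'good', 'missing' or 'bad'."""
    if "OCSP Response Status:" not in s_client_output:
        # Covers "OCSP response: no response sent", i.e. nothing was stapled.
        return "missing"
    if ("OCSP Response Status: successful" in s_client_output
            and "Cert Status: good" in s_client_output):
        return "good"
    return "bad"   # e.g. a responder error, or Cert Status: revoked / unknown
```

Run it over the output of the openssl command from a cron job or exporter, and alert when the result changes.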
CAA Records — Bound to Day 1
Day 1 introduced CAA records — DNS records that name the CAs allowed to issue certs for your domain. Set them now, before you have an incident:
acme.com.  86400  CAA  0 issue      "letsencrypt.org"
acme.com.  86400  CAA  0 issuewild  ";"      ; explicitly forbid wildcards (omit if you issue *.acme.com)
acme.com.  86400  CAA  0 iodef      "mailto:secops@acme.com"
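CAA narrows who can mis-issue for you. Publicly trusted CAs have been required to check it since 2017. Suppose an attacker passes domain validation at some other CA, say by hijacking a web path for HTTP-01; that CA must still refuse. CAA won't stop an attacker who controls your DNS, since they can rewrite the CAA record too, and it depends on the CA honouring it. The cost is one DNS edit, forever.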
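The decision a compliant CA makes for the records above can be modelled in a few lines. This is a simplified sketch of RFC 8659. It takes the records already found for the name, whereas real CAs climb from the FQDN toward the root to find them. It also ignores the critical flag and parameters such as `validationmethods`, and treats a bare `";"` as naming no CA. The function names are hypothetical.

```python
from typing import List, Tuple

# A CAA record as (tag, value), e.g. ("issue", "letsencrypt.org").
CaaRecord = Tuple[str, str]


def _named_cas(records: List[CaaRecord], tag: str) -> List[str]:
    """CA domains named by records with this tag. Parameters after ';' are
    dropped, so a bare ';' yields '' (names no CA)."""
    return [value.split(";", 1)[0].strip().lower() for t, value in records if t.lower() == tag]


def ca_may_issue(records: List[CaaRecord], ca_domain: str, wildcard: bool = False) -> bool:
    """Simplified RFC 8659 decision for the CAA records found for one name."""
    issue = _named_cas(records, "issue")
    issuewild = _named_cas(records, "issuewild")
    if wildcard and issuewild:
        allowed = issuewild          # issuewild overrides issue for wildcard requests
    elif issue:
        allowed = issue
    else:
        return True                  # nothing here restricts this kind of request
    return ca_domain.lower() in allowed
```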
RSA vs ECDSA — The 2-Key Setup
Modern Let's Encrypt issues both RSA and ECDSA leaf certs. ECDSA is faster, smaller, and supported by 99%+ of clients in the wild. RSA is still the universal fallback. The current best practice on a busy server is to serve both — NGINX picks based on the client's capability:
ssl_certificate     /etc/letsencrypt/live/acme.com-ecdsa/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/acme.com-ecdsa/privkey.pem;
ssl_certificate     /etc/letsencrypt/live/acme.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/acme.com/privkey.pem;
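Get the second cert by adding --key-type ecdsa --cert-name acme.com-ecdsa to the certbot certonly command; --cert-name is what puts it under live/acme.com-ecdsa/. certbot 2.0 and later default to ECDSA keys, so on a recent install it's the RSA lineage that needs --key-type rsa. Most clients now negotiate ECDSA, which trims handshake CPU substantially under load.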
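The selection that NGINX delegates to OpenSSL roughly amounts to this: use the ECDSA cert if the client says it can verify ECDSA signatures, else fall back to RSA. This is a simplified model, not OpenSSL's actual algorithm, which also weighs the negotiated cipher suite and curves. The signature-scheme names follow TLS 1.3's `signature_algorithms` naming, and the `CERT_PATHS` mapping is a hypothetical mirror of the config above.

```python
from typing import Dict, Iterable

# Hypothetical mapping mirroring the two ssl_certificate lines above.
CERT_PATHS: Dict[str, str] = {
    "ecdsa": "/etc/letsencrypt/live/acme.com-ecdsa/fullchain.pem",
    "rsa": "/etc/letsencrypt/live/acme.com/fullchain.pem",
}


def pick_key_type(client_signature_algorithms: Iterable[str]) -> str:
    """Simplified model: prefer the ECDSA cert when the client advertises any
    ECDSA signature scheme, otherwise fall back to RSA."""
    if any(alg.lower().startswith("ecdsa_") for alg in client_signature_algorithms):
        return "ecdsa"
    return "rsa"
```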
The Renewal Discipline
Certs expire. Browsers refuse expired certs. Every TLS outage post-mortem has the same shape: "the renewal job had been failing for 60 days but nobody noticed." Three layers of defence:
- Automated renewal — certbot's timer or a CI job. Run twice a day; the operation is idempotent.
- Renewal hooks — auto-reload NGINX when a cert is renewed:
bash — /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh
#!/usr/bin/env bash
set -euo pipefail
# Runs after each successful renewal; certbot skips hooks that aren't executable (chmod +x).
systemctl reload nginx
- External monitoring. Don't trust the renewal job to tell you it's broken — it might not run at all. Probe from outside:
bash — days remaining on the live cert
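echo | openssl s_client -servername acme.com -connect acme.com:443 2>/dev/null \
  | openssl x509 -noout -dates
# Wire that into a Prometheus exporter or a UptimeRobot SSL check; alert at 14 days.
# For a scriptable pass/fail, swap -dates for -checkend 1209600 (14 days in seconds):
# openssl exits non-zero if the cert expires within that window.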
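An external probe needs a number to alert on, not two dates. This sketch turns the `notAfter=` line from the openssl command above into days remaining. It uses the standard library's `ssl.cert_time_to_seconds` to parse OpenSSL's date format. The 14-day threshold matches the text, and the function names are hypothetical.

```python
import ssl
import time
from typing import Optional


def days_remaining(x509_dates_output: str, now: Optional[float] = None) -> float:
    """Parse the notAfter line printed by `openssl x509 -noout -dates`
    and return days left (negative once the cert has expired)."""
    for line in x509_dates_output.splitlines():
        if line.startswith("notAfter="):
            expires = ssl.cert_time_to_seconds(line.split("=", 1)[1].strip())
            now = time.time() if now is None else now
            return (expires - now) / 86400
    raise ValueError("no notAfter= line found")


def should_alert(days: float, threshold: float = 14) -> bool:
    """Page someone once fewer than `threshold` days remain."""
    return days < threshold
```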
mTLS — When the Client Has a Cert Too
Mutual TLS extends the handshake: the server requests a cert from the client, and the connection only succeeds if the client presents a valid one signed by an authority the server trusts. Use cases:
- Service-to-service auth in a service mesh (Istio, Linkerd), where every internal call presents a cert, or at an access proxy such as Cloudflare Access.
- IoT and partner APIs — your customers install a client cert; only those certs can call your API.
- Bastion or admin endpoints — stronger than passwords, less hassle than VPN for small teams.
server {
listen 443 ssl;
ssl_certificate /etc/letsencrypt/live/api.acme.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/api.acme.com/privkey.pem;
ssl_client_certificate /etc/nginx/ssl/partner-ca.pem; # CA that signs partner clients
ssl_verify_client on; # require, fail otherwise
ssl_verify_depth 2;
location /webhook/ {
proxy_set_header X-Client-DN $ssl_client_s_dn;
proxy_pass http://app_pool;
}
}
The DN of the verified client is exposed as $ssl_client_s_dn; pass it to your app as a header for authorization decisions. This pattern replaces shared API tokens for high-trust integrations.
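On the application side, the X-Client-DN header can drive the authorization decision. This sketch uses a hypothetical `ALLOWED_PARTNERS` allowlist. It assumes the RFC 2253-style DN that NGINX 1.11.6 and later put in `$ssl_client_s_dn` (for example `CN=...,O=...,C=...`), and it does not handle escaped commas. Trust the header only if the app is reachable solely through this NGINX, whose `proxy_set_header` replaces any client-supplied value.

```python
from typing import Dict

# Hypothetical allowlist: partner CNs permitted to call /webhook/.
ALLOWED_PARTNERS = {"billing.partner-a.example", "events.partner-b.example"}


def parse_dn(dn: str) -> Dict[str, str]:
    """Split an RFC 2253-style DN such as 'CN=foo,O=Bar,C=US' into a dict.
    Simplified: no escaped commas, no multi-valued RDNs."""
    fields: Dict[str, str] = {}
    for rdn in dn.split(","):
        key, sep, value = rdn.partition("=")
        if sep:
            fields[key.strip().upper()] = value.strip()
    return fields


def is_authorized(x_client_dn: str) -> bool:
    """Authorize on the CN that NGINX forwarded in X-Client-DN."""
    return parse_dn(x_client_dn).get("CN") in ALLOWED_PARTNERS
```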
What Goes Wrong (and How You Notice)
- Cert expired. All clients see warnings. Cause: renewal job has been failing silently. Defence: external monitor (cert expiry alerts at 14 days).
- Chain incomplete. Some clients fine, others see warnings. Cause: ssl_certificate points at cert.pem instead of fullchain.pem. Test with ssllabs.com.
- Hostname mismatch. User typed www.acme.com and your cert is only for acme.com. Defence: include both names; redirect www→apex (or vice versa).
- HSTS sticky after a misstep. A misconfigured server sent HSTS for the wrong hostname; users can't reach the site over plain HTTP. The server can clear it only by serving max-age=0 over working HTTPS; otherwise users have to clear HSTS state in their browser. Roll HSTS out gradually.
- Port 80 blocked. The ACME HTTP-01 challenge fails because the CA's validation servers can't reach your port 80, so renewal fails. Either keep port 80 open (redirecting everything else to HTTPS), or use DNS-01.
- Mixed content. Page loads over HTTPS but pulls a script over HTTP, and browsers block it. Cause: hardcoded http:// URLs in templates. Fix at the source; serve everything as https:// or relative.
- Cipher mismatch. An ancient client (a payment terminal, an old phone) can't negotiate. Mozilla's "intermediate" profile is the right balance for 2026. "Modern" (TLS 1.3 only) locks out enough older phones that it's risky for consumer-facing apps.
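A first pass at "fix at the source" can scan rendered pages or templates for subresources hardcoded to http://. This sketch uses Python's standard `html.parser`. The tag/attribute list is an assumption covering common cases. Link tags that browsers don't actually load (such as `rel=canonical`) will also be flagged, so treat hits as leads.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Tags and the attribute that loads a subresource (a representative subset).
SUBRESOURCE_ATTRS = {
    "script": "src", "img": "src", "iframe": "src", "link": "href",
    "source": "src", "audio": "src", "video": "src",
}


class MixedContentFinder(HTMLParser):
    """Collect (tag, url) pairs where a subresource is loaded over plain http://."""

    def __init__(self) -> None:
        super().__init__()
        self.insecure: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        wanted = SUBRESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_mixed_content(html: str) -> List[Tuple[str, str]]:
    finder = MixedContentFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure
```

Plain `<a href="http://...">` links are not reported, because navigating to an HTTP page isn't mixed content.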
Quiz: browsers report that acme.com's certificate has expired, yet certbot renew --dry-run exits 0. What are the two most likely causes, and how do you fix each?
Answer: 1) The cert was renewed, but NGINX is still serving the old one. certbot renew succeeded weeks ago, but the deploy hook script either didn't exist or had a permission problem. Check /etc/letsencrypt/renewal-hooks/deploy/; verify systemctl reload nginx is wired in; reload manually now. 2) The cert was renewed, but a CDN or load balancer in front of NGINX is still serving its own old one. If Cloudflare, CloudFront, or an AWS ALB terminates TLS, it has its own cert lifecycle (Cloudflare's edge certs, ACM on AWS). That lifecycle is independent of your renewal and may not be hooked into it. Pull the cert at the edge from outside your network with echo | openssl s_client -connect acme.com:443 -servername acme.com 2>/dev/null | openssl x509 -noout -dates and look at the actual expiry. The fix depends on which layer is stale. The lesson: monitor cert expiry from outside your perimeter, not by trusting the renewal job's exit code.
Recap
- Handshake — one RTT in TLS 1.3, ephemeral keys for forward secrecy.
- Chain — leaf + intermediate sent; root in the trust store.
- ACME — automate via certbot, HTTP-01 for hosts with port 80, DNS-01 for wildcards.
- Renew — twice-daily timer, deploy hook reloads nginx.
- Monitor — externally, days-remaining alert at 14.
Further reading
- Let's Encrypt: How it works (letsencrypt.org)
- RFC 8446: TLS 1.3 (datatracker.ietf.org)
- Mozilla SSL Configuration Generator (ssl-config.mozilla.org)
- Qualys SSL Labs: server test (ssllabs.com)
- HSTS preload list submission (hstspreload.org)
- certbot user guide (eff-certbot.readthedocs.io)
- smallstep: Everything you should know about certificates and PKI (smallstep.com)