DNS & The Routing of a Request
Before your code runs in production, a request has to find it. That hunt starts with DNS, the planet-scale phonebook that turns 'api.acme.com' into an IP address. Learn the hierarchy, the record types you'll actually use, why TTLs are the most important number in deployment, and how a single typo in a five-line zone file can break the internet for your customers.
What you will learn
Deployment begins one layer above your code: at the address. Before a single byte of your service runs, somebody's browser had to find it. That finding is DNS — the Domain Name System — and for most engineers it sits in a box marked 'someone else's problem' until the day it isn't. The day a domain doesn't resolve, or resolves to the wrong place, or resolves correctly for you but not for half your users, is the day DNS becomes the most important system in your stack. This chapter installs the model so that day is short.
What Actually Happens When You Type a URL
You type api.acme.com and hit Enter. Before any HTTP, before any TLS, the browser does not know where to send the packet. It needs an IP address. The work it does to find one is DNS resolution, and stripped of its lore, it's a chain of caches in front of one authoritative source.
Step by step, the first time
- Browser cache. Chrome holds a small DNS cache for about a minute; chrome://net-internals/#dns reveals it.
- OS resolver. The operating system's resolver service (systemd-resolved, mDNSResponder, or similar) caches answers from lookups made by any process on the same machine. scutil --dns on macOS and resolvectl status on Linux show how it is configured.
- Recursive resolver. Configured by DHCP or manually: 1.1.1.1 (Cloudflare), 8.8.8.8 (Google), or your ISP's. It holds the largest cache because thousands of clients share it.
- Root → TLD → authoritative. If nothing is cached, the recursive resolver walks the hierarchy. It asks a root server who handles .com, asks the .com TLD server who handles acme.com, then asks acme.com's authoritative server for the actual record.
- Cache on the way back. Every layer caches the answer for the record's TTL.
The first query for a brand-new domain might take 80–150 ms. The millionth query, served entirely from the recursive resolver's cache, takes 1 ms. The cache is the design.
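To make "the cache is the design" concrete, here is a minimal sketch of the layered lookup. The AUTHORITATIVE table and the resolve() helper are hypothetical stand-ins, not a real resolver API. The sketch also simplifies one detail: every layer honors the record's TTL, whereas Chrome, for instance, caps its own cache at about a minute.

```python
from dataclasses import dataclass, field


@dataclass
class TTLCache:
    """One caching layer (browser, OS resolver, or recursive resolver)."""
    name: str
    # hostname -> (ip, absolute expiry time in seconds)
    entries: dict = field(default_factory=dict)

    def get(self, host, now):
        """Return (ip, expires_at) if a fresh entry exists, else None."""
        entry = self.entries.get(host)
        if entry is not None and entry[1] > now:
            return entry
        return None


# Hypothetical authoritative data: hostname -> (ip, TTL in seconds).
AUTHORITATIVE = {"api.acme.com": ("203.0.113.42", 300)}


def resolve(host, layers, now):
    """Try each cache from nearest to farthest, then the authoritative server.

    Returns (ip, name of whichever layer answered).
    """
    for i, layer in enumerate(layers):
        hit = layer.get(host, now)
        if hit is not None:
            ip, expires_at = hit
            source = layer.name
            break
    else:
        ip, ttl = AUTHORITATIVE[host]
        expires_at = now + ttl
        source = "authoritative"
        i = len(layers)
    # Cache on the way back: every layer nearer the client than the one that
    # answered stores the answer until the same expiry (the remaining TTL).
    for layer in layers[:i]:
        layer.entries[host] = (ip, expires_at)
    return ip, source
```

Here is how the cache behaves over time with layers = [TTLCache("browser"), TTLCache("os"), TTLCache("recursive")]:

- The first call at now=0 falls through to the authoritative server.
- A call at now=10 is answered by the browser cache.
- A call at now=400, past the 300-second TTL, finds every layer expired and goes back to the source.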
The Hierarchy — and Why It's Shaped Like This
Read a domain right-to-left, and you read its administrative tree:
```
api . acme . com .
 ↑     ↑      ↑  ↑
host  SLD    TLD root (implicit dot)
```
The trailing dot is real but rarely typed. It marks the root zone, the topmost authority. Underneath the root sit ~1500 top-level domains:

- legacy generics (.com, .net, .org);
- country-code TLDs (.uk, .de, .in, plus .io and .ai, which are technically country codes but are marketed like generics);
- the newer generic long tail (.dev, .app).

Each TLD is run by a registry (Verisign for .com, Public Interest Registry for .org). The registry sells registrations through accredited registrars (Namecheap, Cloudflare Registrar, Route53), who sell them to you.
Your acme.com is one entry in the .com registry. You set NS records with the registrar, and those records point at your authoritative nameservers (Cloudflare, Route53, NS1, or your own BIND installation). From that point onward, your nameserver is the source of truth for everything under acme.com.
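Reading a name right-to-left is mechanical enough to sketch. delegation_chain() is a hypothetical helper that lists the names a resolver steps through from the root down. In practice, api.acme.com is usually just a record inside the acme.com zone rather than a zone of its own, so the last step is a record lookup, not a further delegation.

```python
def delegation_chain(hostname: str) -> list[str]:
    """Return the names walked from the root down for hostname.

    Reads labels right-to-left; every entry is fully qualified (trailing dot).
    """
    labels = hostname.rstrip(".").split(".")
    chain = ["."]  # the root zone
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(delegation_chain("api.acme.com"))
```

This prints ['.', 'com.', 'acme.com.', 'api.acme.com.'], which is the same order dig +trace follows.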
Run dig +trace acme.com to watch this delegation chain resolve step by step.
The Records You'll Actually Use
DNS is a record store. Each record has a name, a type, a TTL, and data. You'll meet maybe 30 record types over a career; in deployment you'll touch the handful below.
| Type | Purpose | Example |
|---|---|---|
| A | Hostname → IPv4 address | api.acme.com 300 A 203.0.113.42 |
| AAAA | Hostname → IPv6 address | api.acme.com 300 AAAA 2001:db8::42 |
| CNAME | Alias for another hostname | www.acme.com 300 CNAME acme.com. |
| MX | Mail exchange | acme.com 3600 MX 10 aspmx.l.google.com. |
| TXT | Free-form text (SPF, DKIM, verification) | acme.com 300 TXT "v=spf1 include:_spf.google.com ~all" |
| NS | Delegate a zone to nameservers | acme.com 86400 NS ns1.cloudflare.com. |
| SRV | Service location with port (XMPP, SIP, MS services) | _sip._tcp.acme.com 3600 SRV 10 60 5060 sip.acme.com. |
| CAA | Which CAs may issue certs for this name | acme.com 86400 CAA 0 issue "letsencrypt.org" |
| PTR | Reverse: IP → name (mostly for mail) | 42.113.0.203.in-addr.arpa PTR mail.acme.com. |
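Every row in the table follows the same four-part shape. This minimal parser is a sketch that assumes the table's "name TTL TYPE data" layout. Real zone files usually include an IN class field and may omit the TTL; this sketch handles neither, so the PTR row, which has no TTL, would not parse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    ttl: int      # seconds caches may keep this answer
    rtype: str    # A, AAAA, CNAME, MX, TXT, ...
    data: str     # everything after the type, kept verbatim


def parse_record(line: str) -> Record:
    """Parse 'name TTL TYPE data' (assumes TTL present, no class field)."""
    name, ttl, rtype, data = line.split(maxsplit=3)
    return Record(name, int(ttl), rtype.upper(), data)


print(parse_record("acme.com 3600 MX 10 aspmx.l.google.com."))
```

Because of maxsplit=3, the data field keeps any internal spaces, such as the MX priority or a quoted SPF string.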
The CNAME trap
A CNAME says "to find www.acme.com, look up acme.com." Two rules govern it and both surprise people:
- A name with a CNAME may not have any other records. Not an MX, not a TXT, nothing. The CNAME is the whole answer for that name.
- The zone apex (the bare domain, like acme.com) cannot be a CNAME. The apex must carry SOA and NS records (and usually MX and TXT as well), so rule one makes a CNAME there illegal per RFC 1034.
This is why many DNS providers invented a non-standard ALIAS or ANAME record (Route53 calls it an alias record; Cloudflare calls its version CNAME flattening). It lets the apex point at a hostname like a load balancer's elb-1234.us-east-1.elb.amazonaws.com. The provider does the second lookup at query time and returns the resolved IPs. If you've ever wondered why pointing acme.com straight at an AWS ALB needs Route53's alias records (or a provider with an equivalent feature), it's this.
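Both CNAME rules are easy to check before you publish a zone. This is a minimal sketch over hypothetical (name, type) pairs. It deliberately ignores the DNSSEC record types (such as RRSIG and NSEC) that are allowed to sit beside a CNAME.

```python
from collections import defaultdict


def cname_violations(records, apex):
    """Report breaches of the two CNAME rules for one zone.

    records: iterable of (name, rtype) pairs; apex: e.g. "acme.com".
    Names compare case-insensitively, ignoring a trailing dot.
    Simplification: DNSSEC types that may legally coexist are not exempted.
    """
    def norm(name):
        return name.rstrip(".").lower()

    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[norm(name)].add(rtype.upper())

    problems = []
    for name, types in sorted(types_by_name.items()):
        if "CNAME" not in types:
            continue
        if name == norm(apex):
            problems.append(f"{name}: CNAME at the zone apex")
        others = sorted(types - {"CNAME"})
        if others:  # rule one: a CNAME must be the only record at its name
            problems.append(f"{name}: CNAME alongside {', '.join(others)}")
    return problems
```

Run against a zone that puts a CNAME on acme.com next to its SOA and NS records, it flags both rules at once. A www CNAME on its own passes.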
TTL — The Most Important Number You Set
Every record has a TTL — Time To Live, in seconds — which tells caches how long to keep the answer. It's a number, but it's a policy: how quickly can I change my mind?
TTL playbook
- Stable production records: 1 hour to 24 hours (3600–86400). Cheap, snappy.
- Records you'll change soon (migration, IP rotation, blue-green): drop to 60–300 seconds at least one full old-TTL window before the change. Caches that fetched under the old TTL keep that answer until it expires. If you lower a 24-hour TTL only an hour before the change, some caches still hold a 24-hour answer.
- Email/SPF/DKIM: 1 hour. You don't want a flaky DKIM rotation breaking deliverability for a day.
- NS records: long (24–48 hours). They almost never change, and long TTLs reduce query load on the parent TLD servers.
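The migration rule is plain time arithmetic. This is a minimal sketch; the date in the example is hypothetical. In practice you would add some slack on top of the deadline it computes.

```python
from datetime import datetime, timedelta


def migration_plan(change_at: datetime, old_ttl: int, new_ttl: int) -> dict:
    """Timeline for changing a record safely.

    old_ttl: TTL currently published (seconds).
    new_ttl: short TTL you drop to before the change (e.g. 60-300).
    """
    return {
        # Caches may hold the old long-TTL answer for up to old_ttl seconds,
        # so the short TTL must be live at least that long before the change.
        "lower_ttl_by": change_at - timedelta(seconds=old_ttl),
        # After the change, the last cached copy of the old answer expires
        # at most new_ttl seconds later.
        "old_answer_gone_by": change_at + timedelta(seconds=new_ttl),
    }


plan = migration_plan(datetime(2024, 5, 2, 9, 0), old_ttl=3600, new_ttl=60)
```

With a 09:00 change and a current TTL of 3600, the short TTL must be published by 08:00. One minute after the change, no cache should still hold the old IP.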
Propagation Is a Lie — Cache Expiration Is the Truth
People say "DNS is propagating" as if records flood outward like water. They don't. DNS is pull, not push. Each cache holds whatever it cached at the moment it asked, until that record's TTL expires and it asks again.
This means "propagation" depends entirely on (1) which TTL was active when each cache last asked, and (2) when that cache last asked. There's no atomic switch; for the duration of the longest cached TTL, some users will see the old answer and some the new one. That's not a bug in DNS — it's the design.
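"Pull, not push" can be modeled as one rule per cache. The sketch below is a simplified, stateless model. answer_seen() is a hypothetical helper: it assumes a cache refetches exactly at the moment it is asked after its entry has expired.

```python
def answer_seen(last_fetch: float, ttl: int, change_at: float, now: float,
                old: str, new: str) -> str:
    """What one cache returns at time `now` under pull-based caching.

    The cache fetched at `last_fetch` and keeps that answer for `ttl` seconds.
    """
    if now < last_fetch + ttl:
        # Still fresh: serve whatever was live when this cache last asked.
        return old if last_fetch < change_at else new
    # Expired: the cache asks again now and gets whatever is live now.
    return old if now < change_at else new


# Three resolvers that last asked at different moments before a change at t=0.
fetches = [-3000, -600, -10]
print([answer_seen(f, 3600, 0, 1000, "old-ip", "new-ip") for f in fetches])
```

At t=1000, the resolver that asked earliest has already moved to the new IP, while the other two still serve the old one. By t=3600, all three have expired and refetched.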
Tools for diagnosing
```shell
# Ask the authoritative server directly (bypass caches)
dig api.acme.com @ns1.cloudflare.com

# Trace the full delegation chain (root → TLD → authoritative)
dig +trace api.acme.com

# See what Google Public DNS returns (JSON over HTTPS)
curl 'https://dns.google/resolve?name=api.acme.com&type=A'

# See what resolvers around the world return (browser tool):
#   https://www.whatsmydns.net/#A/api.acme.com

# Resolve through your local OS resolver (systemd-resolved, Linux)
resolvectl query api.acme.com

# Force-flush local caches
sudo killall -HUP mDNSResponder   # macOS
sudo resolvectl flush-caches      # systemd-resolved (older: systemd-resolve --flush-caches)
ipconfig /flushdns                # Windows
```
The single most useful command on the list is dig +trace. It walks every step of the hierarchy and shows you exactly where the answer comes from. That is invaluable when a customer says "your site is down" and it isn't for you.
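The curl against dns.google can also be scripted. This sketch reads Google Public DNS's JSON response, which carries a Status code and an Answer list whose entries have name, type, TTL and data fields. Only parse_doh_answer() runs offline; query_google_dns() needs network access. Both helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def parse_doh_answer(body: str) -> list[tuple[str, int, str]]:
    """Extract (name, TTL, data) triples from a dns.google JSON response."""
    doc = json.loads(body)
    if doc.get("Status") != 0:  # 0 = NOERROR; 3 = NXDOMAIN
        raise LookupError(f"DNS status {doc.get('Status')}")
    return [(a["name"], a["TTL"], a["data"]) for a in doc.get("Answer", [])]


def query_google_dns(name: str, rtype: str = "A") -> list[tuple[str, int, str]]:
    """Ask Google Public DNS over HTTPS (requires network access)."""
    qs = urllib.parse.urlencode({"name": name, "type": rtype})
    with urllib.request.urlopen(f"https://dns.google/resolve?{qs}", timeout=5) as resp:
        return parse_doh_answer(resp.read().decode("utf-8"))
```

The TTL you get back from a recursive resolver is what remains in its cache. Querying twice a few seconds apart usually shows it counting down, which is caching made visible.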
How a Domain Becomes a Live Service
Putting it together: from buying a domain to getting traffic on day one looks like this.
- Register the domain at a registrar (Cloudflare, Namecheap, Route53). The registrar publishes your domain's existence in the TLD registry.
- Choose a DNS provider (often the same company; sometimes split, so you register at one and host DNS at another). Get the authoritative nameserver hostnames they assign you (typically two to four).
- At the registrar, set the NS records for your domain to those hostnames. The TLD usually publishes the new delegation within an hour or two, but resolvers that cached the old NS set keep using it until that cache expires.
- At the DNS provider, create your records: A or AAAA pointing to your server (or an alias record for an ALB, or Cloudflare's proxy), MX for email, CAA to lock down certificate issuance, TXT for verification.
- Verify with dig +trace. You should see your authoritative nameservers responding with the records you set.
- Point your service at it: configure your server to answer for that hostname, request a TLS cert (Day 4), serve traffic.
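The verification step boils down to one comparison: does the NS set the TLD hands out match the set your provider assigned? This is a minimal sketch. The nameserver hostnames in the example are hypothetical, and in real use parent_ns would be copied from the TLD step of dig +trace output.

```python
def normalize(host: str) -> str:
    """Lower-case and drop the trailing root dot so names compare cleanly."""
    return host.strip().rstrip(".").lower()


def delegation_mismatch(parent_ns, assigned_ns):
    """Compare the TLD's NS referral with the nameservers your provider assigned.

    Returns (missing_at_parent, stale_at_parent); both empty means in sync.
    """
    parent = {normalize(h) for h in parent_ns}
    assigned = {normalize(h) for h in assigned_ns}
    return sorted(assigned - parent), sorted(parent - assigned)


missing, stale = delegation_mismatch(
    ["NS1.OLD-DNS.EXAMPLE.", "ns2.old-dns.example."],   # what the TLD says
    ["ada.ns.cloudflare.com", "bob.ns.cloudflare.com"],  # what you were given
)
```

A non-empty stale list is the "NS records out of sync" failure: the world is still being sent to the old provider.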
Smart DNS — Health Checks, GeoDNS, Failover
Once a service is global, plain A records aren't enough. Modern DNS providers extend the protocol with answer-time logic. Every answer is still one of the ordinary record types above, just with a brain behind the choice.
- Health checks. The DNS provider periodically probes your endpoint (for example with an HTTP or TCP check); if it's down, the provider stops returning that IP in answers. Cloudflare load balancers, Route53 health checks, and NS1 monitors all work this way.
- GeoDNS. Return different IPs based on where the resolver is on the planet. A user in Frankfurt gets the EU IP; a user in São Paulo gets the SA IP.
- Latency-based routing. Return the IP whose datacenter has the lowest RTT to the resolver.
- Weighted routing. Split traffic 90/10 between two backends — the textbook way to canary a new release at the DNS layer.
- Failover. Primary IP normally; if the health check fails, swap to the secondary.
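Health checks, weighted routing and failover all combine in the per-query choice a provider makes. This is a simplified sketch of that choice; pick_answer() and answer_with_failover() are hypothetical helpers, not any provider's API.

```python
import random


def pick_answer(pool, rng=random):
    """Choose which IP to return for one query.

    pool: list of (ip, weight, healthy) tuples, e.g. a 90/10 canary split.
    Unhealthy backends are skipped (health checks); the remaining weights
    set the split (weighted routing). Returns None if nothing is healthy.
    """
    live = [(ip, w) for ip, w, healthy in pool if healthy and w > 0]
    if not live:
        return None
    ips, weights = zip(*live)
    return rng.choices(ips, weights=weights, k=1)[0]


def answer_with_failover(primary_pool, secondary_ip, rng=random):
    """Failover: use the primary pool normally, the secondary if all are down."""
    return pick_answer(primary_pool, rng) or secondary_ip
```

Recursive resolvers cache each answer for its TTL, so the split applies per resolver lookup rather than per user. Real traffic therefore only approximates 90/10, and the approximation is closer with short TTLs.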
The Records That Aren't About Routing
Three classes of records exist purely to prove things to the rest of the internet — and they bite hard when wrong.
Email auth — SPF, DKIM, DMARC
If your domain sends email, three TXT records keep it out of spam folders.
- SPF: a TXT at the apex listing which servers may send mail as your domain, e.g. v=spf1 include:_spf.google.com ~all.
- DKIM: a TXT at selector._domainkey.acme.com holding the public key that receivers use to verify the signatures your mail provider adds to outbound mail.
- DMARC: a TXT at _dmarc.acme.com telling receivers what to do with mail that fails SPF/DKIM, e.g. v=DMARC1; p=quarantine; rua=mailto:postmaster@acme.com.
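Two small helpers show where these records live and how their tag=value format reads. This is a sketch: the selector "google" in the example is illustrative, since your mail provider tells you the real one. parse_dmarc() handles only the simple "tag=value; tag=value" shape.

```python
def dkim_record_name(selector: str, domain: str) -> str:
    """Name receivers query for the DKIM public key of a given selector."""
    return f"{selector}._domainkey.{domain}"


def parse_dmarc(txt: str) -> dict[str, str]:
    """Split a DMARC TXT value ("tag=value; tag=value") into a dict."""
    tags = {}
    for part in txt.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")  # split on the first '=' only
        tags[key.strip()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return tags


policy = parse_dmarc("v=DMARC1; p=quarantine; rua=mailto:postmaster@acme.com")
```

policy["p"] is the instruction receivers act on. Here it is "quarantine", meaning failing mail goes to spam rather than being rejected outright.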
Domain ownership verification
SaaS tools (Google Workspace, GitHub Pages, Stripe) often ask you to add a TXT record with a unique token before they'll trust you with the domain. The pattern: they generate a token, you publish it, they re-query, see it, mark you verified.
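Both halves of that handshake are sketched here from the service's side. The "<service>-verification=" prefix and the helper names are hypothetical; each real service defines its own token format. The TXT strings passed to is_verified() stand in for the result of a real DNS query.

```python
import secrets


def issue_token(service: str) -> str:
    """Generate the TXT value the customer must publish (hypothetical format)."""
    return f"{service}-verification={secrets.token_urlsafe(16)}"


def is_verified(expected: str, published_txt: list[str]) -> bool:
    """After re-querying the domain's TXT records, look for the exact token."""
    return any(value.strip('"') == expected for value in published_txt)


token = issue_token("example-saas")
```

The token is random and unguessable, so seeing it in your zone proves you control the zone. The service never has to trust your claim.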
CAA — who's allowed to issue certificates
CAA records tell certificate authorities which CAs you authorize to issue TLS certs for your domain. Without them, any CA in the world's trust store could issue a cert for acme.com if it were tricked into believing the requester controls the domain. With them, only the CAs you list may issue, because publicly trusted CAs are required to check CAA before issuing.
```
acme.com.  86400  CAA  0 issue "letsencrypt.org"
acme.com.  86400  CAA  0 issue "amazon.com"
acme.com.  86400  CAA  0 issuewild ";"               ; no wildcards
acme.com.  86400  CAA  0 iodef "mailto:secops@acme.com"
```
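The check a CA performs against those records can be sketched following the issue/issuewild rules of RFC 8659:

- for a wildcard name, issuewild takes precedence over issue when it is present;
- if no applicable property exists, CAA does not restrict issuance;
- a value of ";" names no CA at all.

This is a simplified sketch. It ignores the critical flag, unknown tags, and the walk up the name tree that a CA does when the name itself has no CAA records.

```python
def caa_allows(records, ca_domain, wildcard=False):
    """Decide whether ca_domain may issue for a name with these CAA records.

    records: list of (flags, tag, value), e.g. (0, "issue", "letsencrypt.org").
    Simplified: ignores the critical flag, unknown tags and tree climbing.
    """
    issue = [v for _, tag, v in records if tag.lower() == "issue"]
    issuewild = [v for _, tag, v in records if tag.lower() == "issuewild"]
    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        return True  # no applicable property: CAA does not restrict issuance
    for value in relevant:
        # Value is an issuer domain, optionally followed by "; parameters".
        issuer = value.split(";", 1)[0].strip().lower()
        if issuer and issuer == ca_domain.lower():
            return True
    return False  # includes the ";" case, which authorizes no CA


caa = [(0, "issue", "letsencrypt.org"), (0, "issue", "amazon.com"),
       (0, "issuewild", ";"), (0, "iodef", "mailto:secops@acme.com")]
```

With the records above, Let's Encrypt and Amazon may issue for acme.com, any other CA is refused, and no CA may issue a wildcard.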
Things That Will Bite You
The DNS war stories that show up in postmortems are mostly the same five mistakes:
- Forgetting to lower TTLs before a migration. Half your users see the old IP for 24 hours. (Covered above.)
- NS records out of sync. You moved DNS providers but didn't update NS at the registrar. The world still queries the old one.
- CNAME at the apex. You set a CNAME on acme.com; some resolvers tolerate it, others return SERVFAIL. Use ALIAS/ANAME or A.
- Missing reverse DNS for outgoing email. Your IP has no PTR pointing to your sender domain; receivers reject the mail.
- Recursive resolver outages. A bad config push at a major resolver (it has happened to 1.1.1.1, 8.8.8.8, and ISPs) breaks your domain for everyone behind that resolver, and there's nothing you can do but wait. The related outage you can defend against is your own authoritative provider going down. Nameservers spread across two independent providers keep you resolvable, whereas adding more records at the same provider doesn't help if that provider is down.
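The missing-PTR failure is easier to debug once you know which name a receiver actually queries. Python's standard ipaddress module computes that name directly. forward_confirmed() is a hypothetical sketch of the extra check many receivers perform: IP → PTR name → back to the same IP. It uses plain dicts in place of live DNS lookups.

```python
import ipaddress


def ptr_name(ip: str) -> str:
    """The reverse-DNS name queried for an IP (IPv4 or IPv6)."""
    return ipaddress.ip_address(ip).reverse_pointer


def forward_confirmed(ip, ptr_lookup, a_lookup):
    """Forward-confirmed reverse DNS using dicts as stand-ins for real queries."""
    name = ptr_lookup.get(ptr_name(ip))
    return name is not None and ip in a_lookup.get(name, [])


print(ptr_name("203.0.113.42"))
```

This prints 42.113.0.203.in-addr.arpa, the same name as in the PTR row of the record table. The PTR itself is set by whoever owns the IP block, usually your cloud or hosting provider, not in your own zone.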
You need to move api.acme.com from one server to another at 09:00 tomorrow. The current TTL is 3600 (1 hour). What do you do today, and what do you do tomorrow?
Where DNS Sits in the Stack You're Building
The rest of this course is about the things DNS points to: the EC2 box (Day 2), the NGINX in front (Day 3), the TLS cert that proves you own the name (Day 4), the deploy process that swaps what's behind the IP (Day 5), the container the process runs in (Day 6), and the pipeline that automates all of it (Day 7). DNS is the entry point; treat it carefully and the rest gets easier. Treat it carelessly and you'll spend a lot of nights with dig +trace.
- Hierarchy — root → TLD → your zone.
- Records — A, AAAA, CNAME, MX, TXT, NS, CAA.
- TTL — controls how fast you can change your mind.
- Cache — every step of the resolution path holds one.
- Authority — your nameserver is the truth; the registrar's NS records say so.
Why can't you put a CNAME on acme.com (the apex), and what do you use instead when you want the apex to point at a hostname like an AWS load balancer?
Run dig +trace when something feels wrong; it answers nine out of ten DNS mysteries.
- RFC 1034: Domain names, concepts and facilities (datatracker.ietf.org)
- Cloudflare: DNS learning center (cloudflare.com)
- How DNS Works: illustrated comic (howdns.works)
- AWS Route 53 developer guide (docs.aws.amazon.com)
- ICANN: Root server operators (icann.org)
- RFC 8659: DNS Certification Authority Authorization (CAA) (datatracker.ietf.org)