
Cloud Network Security — VPC, Routing & Endpoints
The cloud network stack is software-defined, but the rules of subnetting, routing, and egress control are the same as ever. Master VPC layout, NACLs vs security groups, PrivateLink and VPC endpoints, transit hubs, and the egress controls that actually catch data exfiltration attempts.
What you will learn
Identity is the perimeter. The network is still the second wall — and the first one if your identity story has a hole. This chapter is about how a modern cloud network is laid out, what each control actually enforces, and how to design for failure modes the on-prem network never had: SSRF reaching the metadata service, lateral movement through a flat overlay, and exfiltration through a misconfigured S3 endpoint.
VPC Topology — A Subnet Is Just a Routing Decision
The vocabulary leads developers astray: a subnet labelled public is not magically reachable, and a subnet labelled private is not magically safe. What makes a subnet public is its route table: specifically, a default route (0.0.0.0/0) pointing to an Internet Gateway. A subnet whose default route points to a NAT Gateway is private with egress; a subnet with no default route at all is isolated.
The standard three-tier layout
- Public subnets hold load balancers, NAT gateways, and bastions. Default route is the Internet Gateway. Nothing else lives here.
- Private (egress) subnets hold app servers, workers, queue consumers. Default route is the NAT gateway in the public subnet of the same AZ. Returning traffic from outbound calls works; inbound from the internet does not.
- Isolated subnets hold databases, caches, internal-only services. No default route. Egress to managed services happens through VPC endpoints; cross-AZ replicas live in the matching isolated subnets of the other AZs.
- One of each per AZ — three AZs is the typical compromise between cost and zonal-failure resilience.
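As a minimal sketch of the three tiers above, the route tables below differ only in their default route. All resource names are hypothetical, and `aws_vpc.main`, `aws_internet_gateway.main`, `aws_nat_gateway.az_a` and `aws_subnet.db_az_a` are assumed to be defined elsewhere.

```hcl
# Public tier: default route to the Internet Gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

# Private (egress) tier: default route to the NAT gateway in the same AZ.
resource "aws_route_table" "private_az_a" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.az_a.id
  }
}

# Isolated tier: no default route; only the implicit local route exists.
resource "aws_route_table" "isolated" {
  vpc_id = aws_vpc.main.id
}

# The association, not the subnet's name, decides which tier a subnet is in.
resource "aws_route_table_association" "db_az_a" {
  subnet_id      = aws_subnet.db_az_a.id
  route_table_id = aws_route_table.isolated.id
}
```

Re-associating a subnet with a different route table changes its tier without touching the subnet itself, which is why route table associations deserve the same review as security group changes.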
10.0.0.0/8 divided per region per environment), even if you start with one VPC. Future-you will not be sorry.
Security Groups and NACLs — Two Different Tools
Both filter packets, but they live at different layers and obey different rules. Use both, but for different reasons.
| | Security Group | Network ACL |
|---|---|---|
| Layer | Around an ENI / instance / endpoint | Around a subnet |
| State | Stateful — return traffic auto-allowed | Stateless — must allow both directions |
| Effect types | Allow only (implicit deny) | Allow and Deny (numbered rules) |
| Reference targets | Other SG IDs (security-group-as-source!) | CIDR only |
| Default | Deny in / allow all out (start by removing the all-out) | Allow in & out (default NACL; a custom NACL starts as deny-all) |
| Best at | Workload-to-workload micro-segmentation | Coarse subnet guardrails, blocking known-bad CIDRs |
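To illustrate the stateless and numbered-rule rows of the table, here is a rough sketch of a NACL that blocks one CIDR and allows outbound HTTPS. The names are hypothetical, `aws_vpc.main` and `aws_subnet.app_az_a` are assumed to exist, and 203.0.113.0/24 is a documentation range standing in for a known-bad network.

```hcl
resource "aws_network_acl" "private" {
  vpc_id     = aws_vpc.main.id
  subnet_ids = [aws_subnet.app_az_a.id]
}

# Explicit deny; lower rule numbers are evaluated first, so this wins over rule 100.
resource "aws_network_acl_rule" "deny_bad_cidr_out" {
  network_acl_id = aws_network_acl.private.id
  rule_number    = 50
  egress         = true
  protocol       = "-1"
  rule_action    = "deny"
  cidr_block     = "203.0.113.0/24"
}

# Outbound HTTPS requests from the subnet.
resource "aws_network_acl_rule" "https_out" {
  network_acl_id = aws_network_acl.private.id
  rule_number    = 100
  egress         = true
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
  from_port      = 443
  to_port        = 443
}

# Stateless: replies must be allowed back in explicitly, on ephemeral ports.
resource "aws_network_acl_rule" "replies_in" {
  network_acl_id = aws_network_acl.private.id
  rule_number    = 100
  egress         = false
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
  from_port      = 1024
  to_port        = 65535
}
```

Without the `replies_in` rule the outbound connections would hang, which is the most common surprise when moving from security groups to NACLs.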
Security-group-as-source — the underused power tool
Hard-coding IP ranges in security groups is a maintenance nightmare in autoscaled fleets, where instance addresses change constantly. Instead, reference the source by security group ID: "the database SG accepts 5432 from the app-tier SG." Any ENI that joins the app SG is automatically allowed; any ENI removed from it loses access. This makes SG rules identity-based at the network layer, and is the closest you get to micro-segmentation without a service mesh.
```hcl
resource "aws_security_group" "app" {
  name   = "app-tier"
  vpc_id = aws_vpc.main.id
}

resource "aws_security_group" "db" {
  name   = "db-tier"
  vpc_id = aws_vpc.main.id
}

# DB accepts Postgres only from anything in the app SG
resource "aws_vpc_security_group_ingress_rule" "db_from_app" {
  security_group_id            = aws_security_group.db.id
  referenced_security_group_id = aws_security_group.app.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
}
```
attacker.com on 443. Service-mesh teams can do this at L7; for plain VMs, SG egress is your best bet.
VPC Endpoints — PrivateLink Is the Shape of Modern Connectivity
Two kinds of VPC endpoint exist; they are not interchangeable.
Gateway endpoints
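Used for S3 and DynamoDB. They add a route to your route tables whose destination is the service's AWS-managed prefix list (for S3, the service name is com.amazonaws.region.s3) and whose target is the endpoint. Traffic to S3 from the VPC stays inside the AWS network and never traverses the IGW or NAT. Gateway endpoints carry no hourly or data-processing charge.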
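A gateway endpoint might be declared roughly as follows. The region (us-east-1) and the route table names are assumptions for illustration; `aws_vpc.main` is assumed to exist.

```hcl
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.us-east-1.s3"
  vpc_endpoint_type = "Gateway"

  # Each listed route table receives a route to the S3 prefix list.
  route_table_ids = [
    aws_route_table.private_az_a.id,
    aws_route_table.isolated.id,
  ]
}
```

Only subnets associated with the listed route tables use the endpoint; any route table left out keeps sending S3 traffic along its default route, if it has one.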
Interface endpoints (PrivateLink)
Used for everything else — KMS, STS, Secrets Manager, ECR, SSM, partner APIs, your own services. An ENI is created in your VPC, and Route 53 private DNS overrides the public service hostname to resolve to that ENI. Your code keeps using kms.us-east-1.amazonaws.com; the resolution is private. Pay per endpoint per AZ per hour, plus data.
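An interface endpoint for KMS could look like the sketch below. The subnet and security group names are hypothetical, the region is assumed to be us-east-1, and `aws_security_group.app` stands for whatever SG the calling workloads carry.

```hcl
# The endpoint ENIs get their own SG; only the app tier may reach them on 443.
resource "aws_security_group" "endpoints" {
  name   = "vpc-endpoints"
  vpc_id = aws_vpc.main.id
}

resource "aws_vpc_security_group_ingress_rule" "endpoints_from_app" {
  security_group_id            = aws_security_group.endpoints.id
  referenced_security_group_id = aws_security_group.app.id
  ip_protocol                  = "tcp"
  from_port                    = 443
  to_port                      = 443
}

resource "aws_vpc_endpoint" "kms" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.kms"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.app_az_a.id, aws_subnet.app_az_b.id] # one ENI per AZ
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true # kms.us-east-1.amazonaws.com resolves to the ENIs
}
```

Private DNS only takes effect when DNS support and DNS hostnames are both enabled on the VPC.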
aws:SourceVpce in resource policies to pin S3 buckets to your private network only.
Endpoint policies — the often-missed control
Each VPC endpoint can carry its own policy that filters which API calls can pass through it. A common hardening is an S3 endpoint policy that allows S3 actions only when the target bucket's aws:ResourceOrgID matches your organization; requests to any other bucket are implicitly denied. Even if a compromised principal holds credentials with broad S3 permissions, an attempt to copy data to an attacker-owned bucket over your private network fails. This is one of the most powerful exfiltration controls in AWS.
```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": ["s3:GetObject", "s3:ListBucket", "s3:PutObject"],
    "Resource": "*",
    "Condition": {
      "StringEquals": { "aws:ResourceOrgID": "o-1234567890" }
    }
  }]
}
```
Cross-account PrivateLink for internal services
You can also publish your own service via PrivateLink: an NLB fronts the service in account A and is exposed under a service name. Account B creates an interface endpoint to it, and you can require that each connection request be accepted before traffic flows. SaaS providers such as Snowflake and Databricks, and many internal platforms, use this pattern. Note that the receiving service sees connections arriving from the NLB's private IP addresses rather than from the consumer's own address, so source-IP rules say little about who is calling; use the endpoint service's allowed-principals list, endpoint policies, or TLS certificate pinning for identity.
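Both sides of that arrangement can be sketched as follows. The account ID, resource names and the `internal_api_service_name` variable are hypothetical; the two halves would live in separate Terraform configurations, one per account.

```hcl
# --- Account A (provider): expose the NLB as an endpoint service ---
resource "aws_vpc_endpoint_service" "internal_api" {
  acceptance_required        = true # each connection request must be approved
  network_load_balancer_arns = [aws_lb.internal_api.arn]
  allowed_principals         = ["arn:aws:iam::111122223333:root"] # account B (placeholder ID)
}

# --- Account B (consumer): connect to the published service ---
variable "internal_api_service_name" {
  type        = string
  description = "Service name exported by account A's endpoint service"
}

resource "aws_vpc_endpoint" "internal_api" {
  vpc_id             = aws_vpc.main.id
  service_name       = var.internal_api_service_name
  vpc_endpoint_type  = "Interface"
  subnet_ids         = [aws_subnet.app_az_a.id]
  security_group_ids = [aws_security_group.endpoints.id]
}
```

With `acceptance_required = true` the consumer's endpoint stays in a pending state until the provider approves it, which gives account A a manual gate on top of the principal allow-list.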
Egress Control — Where Real Attackers Get Caught
Inbound is the obvious surface; in practice, post-compromise data movement happens outbound. The goal of cloud egress controls is to make a successful compromise visible and useful exfiltration paths few. Three controls in increasing strength.
NAT plus FQDN allow-list
For workloads that legitimately call a small set of external APIs, a forward proxy with FQDN allow-listing (Squid, AWS Network Firewall with Suricata, GCP Secure Web Proxy) is the canonical answer. Egress goes through the proxy; the proxy enforces "only api.stripe.com, api.github.com". Random calls to attacker.com are rejected and logged. Pair with TLS inspection only if your data classification justifies it; otherwise SNI-only filtering catches most exfil while preserving certificate trust.
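With AWS Network Firewall, the allow-list itself can be expressed as a stateful domain-list rule group. This is a sketch only: the name and capacity are arbitrary, and the firewall policy, the firewall, and the routing that steers traffic through it are omitted.

```hcl
resource "aws_networkfirewall_rule_group" "egress_allowlist" {
  name     = "egress-fqdn-allowlist"
  type     = "STATEFUL"
  capacity = 100

  rule_group {
    rules_source {
      rules_source_list {
        generated_rules_type = "ALLOWLIST"
        # Match on TLS SNI and the HTTP Host header; no decryption involved.
        target_types = ["TLS_SNI", "HTTP_HOST"]
        targets      = ["api.stripe.com", "api.github.com"]
      }
    }
  }
}
```

Because matching is done on SNI and Host headers, this corresponds to the SNI-only filtering mentioned above rather than full TLS inspection.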
DNS firewalls — Route 53 Resolver DNS Firewall
Block resolution of malicious or unknown domains at the resolver. Route 53 Resolver DNS Firewall accepts AWS-managed domain lists as well as custom lists, which you can build from third-party feeds such as abuse.ch. Cheap, broad, and catches a lot of commodity attacker tooling that uses DGAs (domain generation algorithms). Does not catch IP-direct callouts; combine with egress firewalls.
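A custom block list attached to a VPC might be wired up like this. The names and priorities are illustrative, `aws_vpc.main` is assumed to exist, and the domain entries reuse the placeholder domain from the text.

```hcl
resource "aws_route53_resolver_firewall_domain_list" "blocked" {
  name    = "custom-blocked-domains"
  domains = ["attacker.com", "*.attacker.com"]
}

resource "aws_route53_resolver_firewall_rule_group" "egress" {
  name = "egress-dns"
}

# Answer NXDOMAIN for anything on the list.
resource "aws_route53_resolver_firewall_rule" "block_custom" {
  name                    = "block-custom-list"
  action                  = "BLOCK"
  block_response          = "NXDOMAIN"
  firewall_domain_list_id = aws_route53_resolver_firewall_domain_list.blocked.id
  firewall_rule_group_id  = aws_route53_resolver_firewall_rule_group.egress.id
  priority                = 100
}

# The rule group does nothing until it is associated with a VPC.
resource "aws_route53_resolver_firewall_rule_group_association" "main" {
  name                   = "egress-dns-main-vpc"
  firewall_rule_group_id = aws_route53_resolver_firewall_rule_group.egress.id
  vpc_id                 = aws_vpc.main.id
  priority               = 101
}
```

Setting `action = "ALERT"` instead of `"BLOCK"` is a low-risk way to see what a list would catch before enforcing it.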
Block direct internet egress
The strongest move: only managed services through PrivateLink, never raw internet. Workloads route nowhere outbound except VPC endpoints; package mirrors live in private artifact registries. This is increasingly common in regulated environments and feasible in others if you commit to a private artifact pipeline.
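At the security group level, one way to approximate this is to give the app tier no general egress rule at all and allow only the endpoint ENIs and the S3 prefix list. The SG names are hypothetical (`aws_security_group.endpoints` stands for the SG on your interface endpoints), and the region is assumed to be us-east-1.

```hcl
# HTTPS to interface endpoints only, referenced by SG rather than by IP.
resource "aws_vpc_security_group_egress_rule" "app_to_endpoints" {
  security_group_id            = aws_security_group.app.id
  referenced_security_group_id = aws_security_group.endpoints.id
  ip_protocol                  = "tcp"
  from_port                    = 443
  to_port                      = 443
}

# AWS-managed prefix list that the S3 gateway endpoint routes on.
data "aws_prefix_list" "s3" {
  name = "com.amazonaws.us-east-1.s3"
}

resource "aws_vpc_security_group_egress_rule" "app_to_s3" {
  security_group_id = aws_security_group.app.id
  prefix_list_id    = data.aws_prefix_list.s3.id
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
}
```

Terraform removes the default allow-all egress rule when it creates a security group, so with only these two rules in place every other outbound destination is implicitly denied.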
metadata.google.internal, Azure IMDS — sits at a well-known link-local IP. An SSRF in your app or a misconfigured proxy turns into credential theft: the attacker reads the role's STS credentials right out of the metadata response. Mitigations: require IMDSv2 (token + hop-limit 1) on AWS; on GCP, add --metadata=disable-legacy-endpoints=true; on Azure, scope IMDS reads with NSGs and managed-identity hardening. The next chapter (Service Mesh / Zero Trust) treats this as the canonical SSRF case.
Multi-VPC: Peering, Transit Gateway, and Cloud WAN
Real estates have more than one VPC. The connectivity options progress in scale and cost.
- VPC Peering. 1:1 link, no transitive routing. Fine for two VPCs; quadratic mess at ten.
- Transit Gateway (AWS) / Network Connectivity Center (GCP) / Virtual WAN (Azure). A hub-and-spoke router. Each VPC attaches once; routing tables on the gateway segment which spokes can reach which. This is the default for any non-trivial estate.
- Cloud WAN. A higher-level abstraction layered over Transit Gateways across regions, with policy-driven attachments. Useful past about ten regions.
- VPC sharing (RAM in AWS, Shared VPC in GCP, virtual networks in Azure). One networking account owns the VPC; workload accounts attach. Centralises networking ops.
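For the Transit Gateway option, segmentation comes from giving each group of spokes its own gateway route table and propagating routes only where intended. The sketch below assumes hypothetical prod and shared-services VPCs, and assumes the shared-services attachment is defined the same way as the prod one.

```hcl
# Disable the default route table so no spoke is connected implicitly.
resource "aws_ec2_transit_gateway" "hub" {
  description                     = "org hub"
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
}

resource "aws_ec2_transit_gateway_vpc_attachment" "prod_app" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  vpc_id             = aws_vpc.prod_app.id
  subnet_ids         = [aws_subnet.prod_app_tgw_az_a.id]

  transit_gateway_default_route_table_association = false
  transit_gateway_default_route_table_propagation = false
}

# One route table per segment.
resource "aws_ec2_transit_gateway_route_table" "prod" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
}

# The prod spoke looks up routes in the prod table...
resource "aws_ec2_transit_gateway_route_table_association" "prod_app" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.prod_app.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}

# ...which only learns routes to shared-services, never to nonprod.
resource "aws_ec2_transit_gateway_route_table_propagation" "shared_into_prod" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.shared_services.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}
```

For return traffic, the shared-services route table needs the mirror-image propagation from the prod attachment.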
prod, nonprod, and shared-services, with explicit propagations only where needed. A misconfigured spoke cannot route into another segment. Combined with account-level separation it is the cloud equivalent of network zones, and CloudWatch flow logs at the TGW make cross-VPC egress observable in one place.
VPC Flow Logs — Your Network Audit Track
Every accept/reject decision in the VPC can be logged. The fields cover source/destination IP, ports, packet/byte counts, action, and (in v5) the AWS service the traffic is heading to. Enable on every VPC, ship to S3 with object-lock and a 90+ day retention, and parse with Athena or a SIEM. Patterns to alert on:
- REJECTs from inside a private subnet — workload trying to reach the internet, often early-stage exfil reconnaissance.
- Spike in destination IPs from one ENI — possible scanning or staged exfil.
- Egress to a low-reputation ASN — usually paired with a DNS firewall.
- IMDS access from a process other than the metadata library — pair with aws:CalledVia patterns in CloudTrail.
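Enabling flow logs with the newer fields could look roughly like this. The bucket ARN variable is a hypothetical input (for example, a bucket in a logging account), and the field list is one possible selection rather than a recommended standard.

```hcl
variable "flow_log_bucket_arn" {
  type        = string
  description = "ARN of the S3 bucket that receives flow logs (hypothetical input)"
}

resource "aws_flow_log" "main" {
  vpc_id                   = aws_vpc.main.id
  traffic_type             = "ALL" # both ACCEPT and REJECT records
  log_destination_type     = "s3"
  log_destination          = var.flow_log_bucket_arn
  max_aggregation_interval = 60

  # $${...} escapes Terraform interpolation so AWS receives the literal ${field} names.
  log_format = "$${version} $${account-id} $${interface-id} $${srcaddr} $${dstaddr} $${srcport} $${dstport} $${protocol} $${packets} $${bytes} $${start} $${end} $${action} $${log-status} $${flow-direction} $${pkt-dst-aws-service}"
}
```

Retention and object lock are properties of the destination bucket, so they are configured on the bucket rather than on the flow log resource.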
The Cross-Provider View
The vocabulary varies; the architecture survives.
| Concept | AWS | GCP | Azure |
|---|---|---|---|
| L4 firewall around workload | Security Group | Firewall rule (network tag / SA) | NSG (subnet/NIC) |
| L4 firewall around subnet | NACL | (via firewall rule order) | NSG on subnet |
| Private path to managed service | VPC Endpoint (Gateway / Interface) | Private Service Connect / Private Google Access | Private Endpoint / Service Endpoint |
| Hub for many networks | Transit Gateway / Cloud WAN | Network Connectivity Center | Virtual WAN / vNet peering |
| FQDN egress firewall | Network Firewall | Secure Web Proxy / Cloud NGFW | Azure Firewall |
| Service-private exposure | PrivateLink (NLB-fronted) | Private Service Connect | Private Link Service |
| Network observation logs | VPC Flow Logs | VPC Flow Logs | NSG Flow Logs |
- Three subnet tiers per AZ.
- Egress is the exfil surface — proxy or block it.
- Endpoints over internet for managed services.
- Flow logs into a SIEM, retained outside the account.
A Cloud-Native Reference Pattern
Pulling it all together for a typical web service:
- Public ALB in public subnets, terminating TLS with an ACM cert.
- App tier in private subnets, running Fargate tasks or EC2 ASGs, security-group-as-source from the ALB SG.
- RDS Aurora in isolated subnets, with KMS at-rest encryption and TLS in-transit.
- Interface endpoints for KMS, Secrets Manager, STS, ECR, CloudWatch Logs, plus gateway endpoints for S3 and DynamoDB.
- NAT gateway present but its egress filtered by AWS Network Firewall — only api.stripe.com, api.sendgrid.com, and your own SaaS allow-list.
- Route 53 Resolver DNS Firewall blocking the AWS-managed domain block list.
- VPC flow logs to S3 in a separate logging account, retained 13 months.
- SCP at the org pinning region and forbidding internet-routable services in non-public OUs (ec2:AssociatePublicIpAddress denied).
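The SCP in the last item might be sketched as below. This is an assumption-heavy illustration: the approved region, the exempted global services, and the `nonpublic_ou_id` variable are all hypothetical and would need adapting to a real organization before use.

```hcl
variable "nonpublic_ou_id" {
  type        = string
  description = "ID of the OU that must not launch public workloads (hypothetical input)"
}

resource "aws_organizations_policy" "network_guardrails" {
  name = "network-guardrails"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Deny instance launches that request a public IP on the interface.
        Sid      = "DenyPublicIpOnLaunch"
        Effect   = "Deny"
        Action   = "ec2:RunInstances"
        Resource = "arn:aws:ec2:*:*:network-interface/*"
        Condition = {
          Bool = { "ec2:AssociatePublicIpAddress" = "true" }
        }
      },
      {
        # Region pinning; global services are exempted via NotAction.
        Sid       = "DenyOutsideApprovedRegions"
        Effect    = "Deny"
        NotAction = ["iam:*", "organizations:*", "sts:*", "support:*"]
        Resource  = "*"
        Condition = {
          StringNotEquals = { "aws:RequestedRegion" = ["us-east-1"] }
        }
      }
    ]
  })
}

resource "aws_organizations_policy_attachment" "nonpublic_ou" {
  policy_id = aws_organizations_policy.network_guardrails.id
  target_id = var.nonpublic_ou_id
}
```

SCPs apply to every principal in the attached accounts, including administrators, so a policy like this is usually trialled on a sandbox OU first.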
0.0.0.0/0. Why is this still bad even with a strong password?
aws:SourceVpce / org pinning.
4) Egress, DNS and IMDS are where post-compromise activity lives — proxy, block or filter all three.
5) Flow logs in a separate account are non-negotiable; you cannot investigate without them.