
Data Protection, KMS & Secrets Management
Encryption is one of the most-claimed and least-understood controls. Learn the envelope-encryption pattern, KMS and HSM internals, key hierarchies, BYOK/HYOK, secrets stores and rotation patterns — and why your real risk is access to the key, not the cipher.
What you will learn
If an attacker reaches your data tier, what stops them is not encryption — it is whether they can also use the key. Modern cloud encryption is operationally cheap and on by default. The interesting design questions live in the key plane: who can decrypt, under what conditions, with what audit trail, and what happens when a key is compromised. This chapter is about the engineering of that key plane.
Envelope Encryption — The Pattern Behind All Of It
Every cloud KMS implements one core pattern. Don't encrypt the data with the master key directly — that key would have to handle huge volumes and any compromise loses everything. Instead, generate a fresh data encryption key (DEK) per object, encrypt the data locally with the DEK, then ask KMS to encrypt the DEK under a long-lived key encryption key (KEK / CMK). Store the encrypted DEK alongside the ciphertext. To read: ask KMS to decrypt the DEK, then decrypt the data locally.
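The pattern can be sketched in Python, assuming the `cryptography` package. `LocalKms` is a hypothetical in-process stand-in for a real KMS. It holds the KEK and only ever wraps or unwraps 32-byte DEKs, while the bulk data is encrypted locally with AES-256-GCM. The layout, with a 12-byte nonce prefixed to each wrapped DEK, is an illustrative choice and not any provider's wire format.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class LocalKms:
    """Hypothetical stand-in for a cloud KMS: the KEK never leaves this object."""

    def __init__(self) -> None:
        self._kek = AESGCM.generate_key(bit_length=256)  # long-lived KEK

    def encrypt_dek(self, dek: bytes) -> bytes:
        nonce = os.urandom(12)
        return nonce + AESGCM(self._kek).encrypt(nonce, dek, None)

    def decrypt_dek(self, wrapped: bytes) -> bytes:
        return AESGCM(self._kek).decrypt(wrapped[:12], wrapped[12:], None)


def envelope_encrypt(kms: LocalKms, plaintext: bytes) -> dict:
    dek = AESGCM.generate_key(bit_length=256)  # fresh 32-byte DEK for this object
    nonce = os.urandom(12)
    ciphertext = AESGCM(dek).encrypt(nonce, plaintext, None)  # bulk data: local AES
    # Only the wrapped DEK is stored next to the ciphertext; the plaintext DEK is not.
    return {"wrapped_dek": kms.encrypt_dek(dek), "nonce": nonce, "ciphertext": ciphertext}


def envelope_decrypt(kms: LocalKms, blob: dict) -> bytes:
    dek = kms.decrypt_dek(blob["wrapped_dek"])  # the only step that needs the KMS
    return AESGCM(dek).decrypt(blob["nonce"], blob["ciphertext"], None)


kms = LocalKms()
record = envelope_encrypt(kms, b"card-on-file record")
```

With AWS KMS, `GenerateDataKey` returns the plaintext DEK and its wrapped copy in a single call, and `Decrypt` plays the role of `decrypt_dek`. The local encrypt and decrypt steps stay the same.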
Why this matters operationally
- Throughput. KMS only encrypts a 32-byte DEK — bulk data goes through your local AES-NI at line rate.
- Audit granularity. Each Decrypt call to KMS is logged in CloudTrail with the caller, so you can attribute every key use back to a principal. This is far cheaper than logging every read of every record.
- Rotation. Rotating the KEK does not require re-encrypting all the data. Old DEKs are still decryptable under the old KEK version, and new writes use the new version. Re-encrypt-on-read or a background re-wrap closes the gap when needed.
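The rotation property can be sketched the same way, again assuming the `cryptography` package. `VersionedKek` is a hypothetical keyring. Rotating adds a new KEK version for new writes, and old versions are kept for unwrapping. `rewrap` re-encrypts only the 32-byte DEK under the current version and leaves the bulk ciphertext untouched.

```python
from __future__ import annotations

import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class VersionedKek:
    """Hypothetical KEK keyring: rotation adds a version, old versions stay for unwrap."""

    def __init__(self) -> None:
        self.versions = {1: AESGCM.generate_key(bit_length=256)}
        self.current = 1

    def rotate(self) -> None:
        self.current += 1
        self.versions[self.current] = AESGCM.generate_key(bit_length=256)

    def wrap(self, dek: bytes) -> tuple[int, bytes]:
        # New writes always use the current version; the version is stored with the blob.
        nonce = os.urandom(12)
        kek = self.versions[self.current]
        return self.current, nonce + AESGCM(kek).encrypt(nonce, dek, None)

    def unwrap(self, version: int, wrapped: bytes) -> bytes:
        # Old versions are retained, so DEKs wrapped before rotation still decrypt.
        return AESGCM(self.versions[version]).decrypt(wrapped[:12], wrapped[12:], None)

    def rewrap(self, version: int, wrapped: bytes) -> tuple[int, bytes]:
        # Background re-wrap: only the 32-byte DEK moves to the new KEK, never the data.
        return self.wrap(self.unwrap(version, wrapped))
```

AWS KMS exposes the re-wrap step as the `ReEncrypt` API. It moves a ciphertext to a new key, or to new key material, without returning the plaintext to the caller.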
Key Hierarchies and HSMs
Every well-designed system has a small number of root keys protecting a larger number of operational keys. Cloud KMS services are typically backed by HSMs (Hardware Security Modules). These are FIPS 140-2 / 140-3 Level 3 devices in which key material is generated and never exits the boundary in plaintext.
HSM modes — managed vs dedicated
- Multi-tenant managed KMS — AWS KMS, GCP Cloud KMS, Azure Key Vault. Backed by HSMs you don't see, except on software-protected tiers such as GCP's SOFTWARE protection level and Azure Key Vault Standard. Lowest cost; shared across customers within strict isolation.
- Single-tenant managed — AWS KMS Custom Key Stores backed by CloudHSM; Azure Key Vault Managed HSM; GCP Cloud HSM. Dedicated HSM partition, FIPS 140-2 L3, accessed via the same KMS API.
- Self-managed CloudHSM — bare HSM cluster, you manage users and authn. Maximum control; you also operate the partition yourself. Used when key material must never touch a multi-tenant tier.
Multi-region keys
AWS multi-region keys, GCP multi-region keys, and Azure cross-region replication all let the same logical key exist in multiple regions, sharing the same key material via secure replication. Useful for active-active databases and DR — without it, every encrypted blob is locked to one region.
Access Control on Keys
Encryption is theatre if anyone with API access can decrypt. Three controls compose:
Key policy
Each KMS key carries a resource policy (the key policy) specifying who may use it for what. The default key policy for a customer-managed key gives the account's root principal full access, which lets IAM policies in that account grant use of the key. For sensitive keys, tighten it by naming the specific principals that may administer, encrypt, and decrypt. Use separate principals for admin and use: admins set policy and never decrypt, and apps decrypt and never administer.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "KeyAdminsSetPolicyNeverUse",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/security-admin" },
      "Action": ["kms:Describe*", "kms:List*", "kms:Get*", "kms:PutKeyPolicy",
                 "kms:UpdateAlias", "kms:EnableKeyRotation", "kms:ScheduleKeyDeletion"],
      "Resource": "*"
    },
    {
      "Sid": "KeyUsersTenantScopedViaVpceWithTls",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/payments-api" },
      "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*", "kms:DescribeKey"],
      "Resource": "*",
      "Condition": {
        "StringEquals": { "aws:SourceVpce": "vpce-0abcd…" },
        "Bool": { "aws:SecureTransport": "true" },
        "StringEqualsIfExists": { "kms:EncryptionContext:tenant": "${aws:PrincipalTag/tenant}" }
      }
    }
  ]
}
```

This policy applies three serious controls. The first is source VPC endpoint pinning, so the call must originate inside our private network. The second is TLS-only access. The third is an encryption context binding, so a principal tagged for tenant A cannot decrypt material encrypted with a tenant-B context. Because the binding uses StringEqualsIfExists, calls that carry no tenant context at all still pass. That covers DescribeKey, but it also covers an Encrypt call that omits the context, so application code must always supply one. Encryption contexts are the most underused KMS feature; they turn key access into a tenant-scoped operation almost for free.
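KMS binds the encryption context into the ciphertext as additional authenticated data (AAD), so decryption succeeds only when the caller supplies the identical context. The sketch below reproduces that property locally with AES-GCM's associated-data parameter, assuming the `cryptography` package. The helper names and the canonical-JSON serialization are assumptions and do not describe how KMS encodes contexts internally.

```python
import json
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def context_aad(context: dict) -> bytes:
    # Canonical JSON so key order in the dict does not change the bound bytes.
    return json.dumps(context, sort_keys=True, separators=(",", ":")).encode()


def wrap_with_context(kek: bytes, dek: bytes, context: dict) -> bytes:
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, dek, context_aad(context))


def unwrap_with_context(kek: bytes, wrapped: bytes, context: dict) -> bytes:
    # AES-GCM raises InvalidTag unless exactly the same context is supplied.
    return AESGCM(kek).decrypt(wrapped[:12], wrapped[12:], context_aad(context))


kek = AESGCM.generate_key(bit_length=256)
dek = os.urandom(32)
wrapped = wrap_with_context(kek, dek, {"tenant": "acme"})

try:
    unwrap_with_context(kek, wrapped, {"tenant": "globex"})  # another tenant's context
    cross_tenant_ok = True
except InvalidTag:
    cross_tenant_ok = False
```

This is why context binding works even against a principal whose policy permits Decrypt. The wrong context fails the cryptographic integrity check before any policy question arises.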
Grants — short-lived delegation
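A grant is an alternative to inline policy. It delegates a narrow set of operations on a key to a specific principal, optionally constrained to a given encryption context. Grants do not expire on their own; they last until they are retired or revoked, so the usual pattern is create, use, retire. Grants are useful when a service needs to give a worker narrow, temporary use of a key without editing the key policy or handing out long-term key permissions. CloudTrail logs every grant and every use.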
Dual-control / quorum operations
For the most sensitive keys (master signing keys, code-signing roots), require multiple humans to approve any administrative change. AWS CloudHSM supports quorum authentication (M-of-N approval) for sensitive HSM operations such as user management. Azure Key Vault Managed HSM protects its security domain with an M-of-N quorum of security-officer RSA keys. Many on-prem PKI deployments use Shamir's Secret Sharing for the offline root.
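Shamir's Secret Sharing splits a secret into N shares so that any M of them reconstruct it, while fewer reveal nothing about it. Below is a stdlib-only Python teaching sketch. It assumes the secret is an integer below the Mersenne prime 2**521 - 1, so a 256-bit root key fits easily. It does not authenticate shares, so a malicious share-holder could corrupt recovery undetected. Production systems should use a vetted implementation.

```python
from __future__ import annotations

import secrets

# Mersenne prime 2**521 - 1; all arithmetic happens in the field of integers mod PRIME.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME or not 1 < threshold <= shares:
        raise ValueError("invalid parameters")
    # Random polynomial f of degree threshold-1 with f(0) = secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation of f at x = 0, modulo PRIME."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total
```

For a 3-of-5 scheme, `split_secret(root_key, 3, 5)` yields five shares for five custodians. Any three of them, in any order, reconstruct `root_key`.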
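Guard key deletion: alert on every kms:ScheduleKeyDeletion call, and use an SCP that denies it for keys tagged protected: true unless the request comes from a break-glass role whose use requires quorum approval. KMS enforces a 7 to 30 day waiting period during which deletion can be cancelled. Once that window passes, a mistaken deletion is one of the few cloud actions that causes silent, unrecoverable data loss.
BYOK, HYOK, EKM: Customer-Controlled Keys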
Three names for a similar idea, with different operational implications:
- BYOK (Bring Your Own Key). You generate key material on your HSM and import it into the cloud KMS. Decryption still happens in the cloud's HSM; you control the source. Used for compliance regimes that require key generation outside the cloud.
- HYOK (Hold Your Own Key). The key never enters the cloud. Cloud services call your on-prem HSM via an external interface. Operationally heavy and limits service support — only some services accept HYOK.
- EKM (External Key Manager). The provider's pattern (AWS XKS / GCP EKM) — your HSM is queried for every key operation. The cloud cannot decrypt without your live participation; revoking access is real and instant.
Honest assessment: BYOK is mostly a compliance check-box. HYOK and EKM provide real residual control — at the cost of latency, availability dependence on your own HSM, and reduced service support. Choose based on the threat model your auditor is actually scared of.
Encryption Patterns: Where They Live
At rest
Provider services (S3 SSE-KMS, RDS, EBS, GCS CMEK, Azure Blob CMK) handle encryption at the storage layer transparently. Always use customer-managed keys instead of provider-managed defaults. The upgrade in access control and audit is meaningful even when the cipher is identical.
In transit
Use TLS everywhere: public, internal, and even "trusted" networks. Require TLS 1.2 with strong cipher suites, or TLS 1.3 only. Service meshes (Day 2 PM) enforce mTLS uniformly. For database connections, prefer PrivateLink plus the database's native TLS over IPsec or VPN tunnels, because there are fewer moving parts.
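On the client side, the TLS 1.2 minimum is a one-line setting in most stacks. The sketch below uses Python's standard `ssl` module, and the function name is hypothetical. The cipher string limits TLS 1.2 to ECDHE key exchange with AEAD ciphers. `set_ciphers` does not affect TLS 1.3 suites, all of which are AEAD already.

```python
import ssl


def strict_client_context(minimum: ssl.TLSVersion = ssl.TLSVersion.TLSv1_2) -> ssl.SSLContext:
    """Client TLS context: verified certificates, no protocol older than `minimum`."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED plus hostname checking
    ctx.minimum_version = minimum
    # TLS 1.2 suites: ECDHE key exchange with AES-GCM or ChaCha20-Poly1305 only.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")
    return ctx
```

For a TLS 1.3-only service, call `strict_client_context(ssl.TLSVersion.TLSv1_3)`. The resulting context can then be passed to `ssl.SSLContext.wrap_socket` or to any HTTP client that accepts an `SSLContext`.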
In use — confidential computing
Memory-resident plaintext is the new at-rest. Confidential VMs (AMD SEV-SNP, Intel TDX, GCP Confidential VMs, Azure Confidential Computing) encrypt VM memory, and enclave technologies such as AWS Nitro Enclaves isolate it. Both provide an attestation channel, so the workload can prove what code is running before it accepts decryption keys. Real use cases today include multi-party data analytics where the operator must not see plaintext. Another is key-management services that run inside enclaves, so even the cloud operator cannot read the key.
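The key-release step can be sketched as a broker that hands out a DEK only after checking an attestation report. This stdlib-only Python sketch is a simplification built on loud assumptions. An HMAC with a shared `VERIFIER_KEY` stands in for the hardware vendor's certificate chain, whereas SEV-SNP, TDX, and Nitro reports are signed by vendor-rooted keys, not a shared secret. `APPROVED_MEASUREMENTS`, `make_report`, and `release_key` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# ASSUMPTION: a shared HMAC key stands in for the hardware vendor's signing chain.
VERIFIER_KEY = secrets.token_bytes(32)
# Hypothetical allow-list: SHA-384 measurements of approved enclave images.
APPROVED_MEASUREMENTS = {hashlib.sha384(b"payments-enclave-v42").hexdigest()}


def make_report(measurement: str, nonce: bytes) -> dict:
    """Simulates the hardware signing (code measurement, verifier nonce)."""
    body = measurement.encode() + nonce
    return {"measurement": measurement, "nonce": nonce,
            "sig": hmac.new(VERIFIER_KEY, body, hashlib.sha256).digest()}


def release_key(report: dict, expected_nonce: bytes, dek: bytes) -> bytes:
    """Key broker: hand out the DEK only to attested, approved, fresh workloads."""
    body = report["measurement"].encode() + report["nonce"]
    expected_sig = hmac.new(VERIFIER_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(report["sig"], expected_sig):
        raise PermissionError("report not signed by trusted hardware")
    if not hmac.compare_digest(report["nonce"], expected_nonce):
        raise PermissionError("stale or replayed report")
    if report["measurement"] not in APPROVED_MEASUREMENTS:
        raise PermissionError("code measurement not on the allow-list")
    return dek
```

Real brokers typically go one step further. They encrypt the released key to a public key that was generated inside the enclave and carried in the report, so the key never crosses the host in plaintext.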
Application-layer encryption
For especially sensitive fields — payment instruments, health identifiers — encrypt at the application before the data hits the database. AWS Encryption SDK, Google Tink, libsodium, age. The DB sees ciphertext only; admins, DBAs, and BI tools cannot read the field without the app's role.
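The sketch below encrypts a field before the row reaches the database, assuming the `cryptography` package and an in-memory SQLite table. `FIELD_KEY` stands in for a DEK that the application would normally unwrap through KMS under its own role. Binding the row ID and column name as AAD is an illustrative choice; it stops a DBA from copying one patient's ciphertext into another row.

```python
import os
import sqlite3

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# ASSUMPTION: in production this would be a DEK unwrapped via KMS by the app's role.
FIELD_KEY = AESGCM.generate_key(bit_length=256)


def seal_field(value: str, row_id: str, column: str) -> bytes:
    nonce = os.urandom(12)
    aad = f"{column}:{row_id}".encode()  # ciphertext only opens in this row and column
    return nonce + AESGCM(FIELD_KEY).encrypt(nonce, value.encode(), aad)


def open_field(blob: bytes, row_id: str, column: str) -> str:
    aad = f"{column}:{row_id}".encode()
    return AESGCM(FIELD_KEY).decrypt(blob[:12], blob[12:], aad).decode()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE patients (id TEXT PRIMARY KEY, name TEXT, health_id BLOB)")
db.execute("INSERT INTO patients VALUES (?, ?, ?)",
           ("p-1", "Ada", seal_field("HID-0042-7781", "p-1", "health_id")))

# What a DBA or BI tool sees: ciphertext bytes only.
stored = db.execute("SELECT health_id FROM patients WHERE id = 'p-1'").fetchone()[0]
```

Only code that holds `FIELD_KEY` can call `open_field`. That code is the application's role, not the database's.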
Secrets Management
A secret is anything an attacker would prize: API tokens, database passwords, OAuth refresh tokens, signing keys, webhook secrets. Secrets committed to Git repositories or left in environment files are among the most common causes of leaked-credential incidents. The fix has three parts.
1. A central store
- HashiCorp Vault — best feature set, runs everywhere, supports dynamic secrets (mints DB credentials per session) and PKI. Operationally heavy.
- AWS Secrets Manager / SSM Parameter Store — simpler, KMS-backed, native rotation Lambdas. Most AWS estates use this.
- GCP Secret Manager / Azure Key Vault — equivalents.
- External Secrets Operator in Kubernetes — bridges any of the above into K8s as native Secrets, kept in sync.
2. Never let a static secret reach the workload
For database passwords, prefer cloud IAM authentication (RDS IAM auth, Cloud SQL IAM auth, Postgres-OIDC) — short-lived tokens minted from the workload identity. For external APIs, mount via the Secrets Store CSI driver so the secret is read into a tmpfs at start and never persisted on disk.
3. Rotate, automatically and visibly
The right rotation cadence depends on the secret. Database passwords: 30-90 days, automated via a Lambda or Vault rotation backend. API tokens to external SaaS: as often as the SaaS allows; many don't allow zero-downtime rotation, so design dual-key acceptance windows. Signing keys: every 12 months with overlap. OIDC client secrets: annually, validated by certificate transparency-style monitoring.
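Below is a stdlib-only sketch of a dual-key acceptance window for an HMAC-signed webhook. The scheme, a hex HMAC-SHA256 over the raw body, is generic and hypothetical; it is not any particular SaaS provider's signature format. The receiver always accepts the new secret, and accepts the old one only while `previous` is set.

```python
import hashlib
import hmac
from typing import Optional


def sign(secret: bytes, payload: bytes) -> str:
    """Hypothetical scheme: hex HMAC-SHA256 over the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_during_rotation(payload: bytes, signature: str,
                           current: bytes, previous: Optional[bytes] = None) -> bool:
    """Accept the new secret and, while `previous` is set, the old one."""
    candidates = [current] if previous is None else [current, previous]
    # compare_digest avoids leaking how many leading characters matched.
    return any(hmac.compare_digest(sign(k, payload), signature) for k in candidates)
```

Rotation then runs in three steps. First, deploy the receiver with both secrets. Second, switch the sender to the new secret. Third, redeploy with `previous=None` once traffic signed with the old secret stops.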
Tokenization vs Encryption
For payment cards, government IDs, and medical record numbers, the typical right answer is not encryption but tokenization: the value is replaced with an opaque, format-preserving token; the original lives only in a hardened vault accessed by a tiny scope of code (the payment service, the identity service). Most application code only ever sees the token, so the data plane simply does not contain regulated data.
Tokenization shrinks PCI/HIPAA scope dramatically — auditors care about systems that handle the actual data, not those that handle tokens. Stripe, Adyen, and most cloud-payment providers tokenize at the edge. For internal cases, VGS, Skyflow, Privacera, and home-rolled vaults are common.
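Here is a toy token vault, stdlib only, to show the shape of the idea. `TokenVault` is hypothetical and keeps its mapping in memory. A real vault persists the mapping in a hardened, audited store and restricts `detokenize` to the few services that need the original value. Tokens here are format-preserving: same length, digits only, last four digits kept for receipts. They are also deterministic per card number, which keeps joins working but lets anyone holding the token table see that two records share a card.

```python
from __future__ import annotations

import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping format-preserving tokens to card numbers."""

    def __init__(self) -> None:
        self._by_token: dict[str, str] = {}
        self._by_value: dict[str, str] = {}

    def tokenize(self, pan: str) -> str:
        if pan in self._by_value:  # deterministic: same card number, same token
            return self._by_value[pan]
        while True:
            random_part = "".join(secrets.choice("0123456789") for _ in range(len(pan) - 4))
            token = random_part + pan[-4:]  # keep the last four digits for display
            if token not in self._by_token and token != pan:
                break
        self._by_token[token] = pan
        self._by_value[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        # In a real deployment only the payment service's role may reach this call.
        return self._by_token[token]
```

Everything outside the vault stores and passes around only the token. That is what takes those systems out of audit scope.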
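Scenario: a database is encrypted at rest under a customer-managed key, and the DBA role is granted kms:Decrypt. Six months later a credential leak gives the attacker the DBA role's keys. They can SELECT * from the database, see plaintext, and exfiltrate. What was the encryption actually protecting against, and what control was missing?
Answer: transparent at-rest encryption only protected against theft of the underlying disks, snapshots, or backups. Anyone who can query the database sees plaintext, whatever the key policy says. Two controls were missing. The first is separation of duties: the DBA role had no need for kms:Decrypt. The second is application-layer encryption or tokenization of the sensitive fields, so that a database login alone yields only ciphertext or tokens.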
Cryptographic Agility & Post-Quantum
NIST has now standardized post-quantum algorithms — ML-KEM (FIPS 203, formerly Kyber) for key establishment and ML-DSA (FIPS 204, formerly Dilithium) for signatures. The threat model is "harvest now, decrypt later": attackers store today's TLS captures and decrypt them when CRQCs (cryptographically relevant quantum computers) arrive. For long-lived data, this matters today. Hybrid TLS 1.3 (X25519+ML-KEM) is rolling out in browsers and major CDN/ALB stacks; AWS KMS, GCP and Azure are publishing PQ migration paths. Crypto agility — the ability to swap algorithms without rewriting applications — is suddenly a real engineering requirement. SDKs like AWS Encryption SDK and Tink wrap algorithm choice in algorithm suites with versioned identifiers; aim to use these abstractions rather than hard-coding cipher names.
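The sketch below shows the versioned-suite idea, assuming the `cryptography` package. The one-byte suite identifier, the `SUITES` registry, and the per-suite key map are hypothetical; they are not the AWS Encryption SDK or Tink wire formats. Callers never name an algorithm. Old ciphertexts stay readable under their recorded suite, and the suite byte is authenticated as AAD, so it cannot be swapped without detection.

```python
from __future__ import annotations

import os
from typing import Optional

from cryptography.hazmat.primitives.ciphers.aead import AESGCM, ChaCha20Poly1305

# Hypothetical registry: one-byte suite ID -> AEAD class. Callers never name a cipher.
SUITES = {0x01: AESGCM, 0x02: ChaCha20Poly1305}
CURRENT_SUITE = 0x02  # what new writes use; old blobs keep their recorded suite


def seal(keys: dict[int, bytes], plaintext: bytes, suite: Optional[int] = None) -> bytes:
    suite = CURRENT_SUITE if suite is None else suite
    header = bytes([suite])
    nonce = os.urandom(12)
    # The suite byte is passed as AAD, so tampering with it breaks authentication.
    return header + nonce + SUITES[suite](keys[suite]).encrypt(nonce, plaintext, header)


def open_sealed(keys: dict[int, bytes], blob: bytes) -> bytes:
    suite, nonce, body = blob[0], blob[1:13], blob[13:]
    if suite not in SUITES:
        raise ValueError(f"unknown algorithm suite 0x{suite:02x}")
    return SUITES[suite](keys[suite]).decrypt(nonce, body, bytes([suite]))
```

Each suite gets its own key rather than sharing one across algorithms. Moving to a new algorithm then means registering a suite and changing `CURRENT_SUITE`, followed by a background re-encryption of old data, instead of editing every call site.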
Key takeaways
- Envelope-encrypt, never bulk-encrypt with the master.
- Hierarchy: HSM root → KEK (CMK) → DEK.
- Context: bind every encrypt to tenant / purpose.
- Audit: every Decrypt logged with caller.
- Rotate: keys, secrets, and dual-control approvers.
Question: an attacker holds a principal with kms:Decrypt on a CMK and tries to decrypt an arbitrary ciphertext. The decryption fails. The CMK supports the algorithm, and the policy allows the action. What is the most likely cause?
Answer: the ciphertext was encrypted with an encryption context (for example {tenant: "acme"}). KMS requires exactly the same context on decrypt, because the context is bound into the ciphertext as additional authenticated data (AAD). The attacker is supplying a different (or empty) context, so KMS rejects the request. This is one of the few "low-effort, high-impact" controls in KMS. It is also a reason to always set encryption contexts in application code rather than relying on key policy alone.
References
- AWS KMS: Concepts & envelope encryption (aws.amazon.com)
- AWS KMS: Encryption context (aws.amazon.com)
- NIST SP 800-57: Recommendation for Key Management (nist.gov)
- FIPS 203: ML-KEM (nist.gov)
- FIPS 204: ML-DSA (nist.gov)
- HashiCorp Vault: Concepts (hashicorp.com)
- SOPS, Mozilla (github.com)
- PCI SSC: Tokenization Product Security Guidelines (pcisecuritystandards.org)