
Governance, Compliance & Architecture Review
Tie it together. Map the controls from days 1-6 onto the compliance regimes auditors actually run (SOC 2, ISO 27001, PCI, HIPAA, FedRAMP), learn how to threat-model cloud-native systems, and run the kind of architecture review that finds real issues without becoming theatre.
What you will learn
Six days in, you have the technical building blocks. Day 7 is the system that operates them — how compliance, threat modeling and review actually fit together in a real engineering organization. Mostly, they are simpler than the consultant decks suggest, and harder than the engineering teams imagine. The trick is that none of them is a one-shot deliverable; all are continuous practices.
The Compliance Map
The six rows below cover most of what cloud teams encounter. The regimes overlap heavily; one well-run control program satisfies most of them.
| Regime | Who needs it | What it really tests | Cloud delta |
|---|---|---|---|
| SOC 2 | Most US B2B SaaS | Trust Services Criteria: Security (mandatory), Availability, Confidentiality, Processing Integrity, Privacy. Type II = operating effectiveness over 6+ months | Auditors expect cloud-native evidence — IAM exports, CloudTrail samples, alarm history |
| ISO 27001 | Global B2B / EU customers | An ISMS — formally documented, risk-driven, reviewed. Annex A 2022 has 93 controls | Pairs with ISO 27017 (cloud) and ISO 27018 (PII processors) |
| PCI DSS 4.0 | Anyone touching card data | 12 requirements, hard prescriptive controls (segmentation, encryption, MFA, scanning) | 4.0 added customized approach for cloud equivalents; explicit mention of containers and serverless |
| HIPAA | US healthcare data | Administrative, physical, technical safeguards. Contractual via BAAs | Cloud BAAs from AWS/GCP/Azure cover infra; your use is still your responsibility |
| FedRAMP | US gov SaaS | NIST SP 800-53 Rev 5 catalog, with baselines by impact level (Low/Moderate/High). Heavily prescriptive | Related programs: StateRAMP and DoD Impact Levels (IL2/4/5/6); annual assessment by an accredited 3PAO |
| GDPR / DPDP / CCPA | Personal data of EU/India/CA residents | Lawful basis, data minimization, subject rights, 72-hr breach notification, DPIAs | Cloud transfers to non-adequate countries need SCCs / TIA + technical controls |
The 80/20 control bundle
If you want to be in good shape for any of the above, six control families cover the bulk of what every regime cares about:
- Identity & access — SSO + MFA for humans, workload identity for services, periodic access reviews, least privilege documented.
- Encryption — at rest with customer-managed keys, in transit with TLS 1.2+, key rotation policy.
- Logging & monitoring — centralized logs, SIEM with detections, retention meeting the longest applicable regime.
- Change management — code review, IaC, approved deploy paths, audit trail.
- Vulnerability management — scanning at build, patching SLAs, exception process.
- Incident response — runbooks, on-call, post-incident reviews, breach notification process.
Document each as a one-page policy, automate as much as possible (the procedure), and have evidence ready (the artifact: CloudTrail samples, IAM access advisor, scan results, review tickets, post-mortems). That triple — policy, procedure, artifact — is what an auditor wants to see for any control.
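The policy, procedure, artifact triple can be tracked as data so that gaps show up before the auditor does. The following is a minimal sketch, not a prescribed tool; the `Control` class, its field names, and the example paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Control:
    """One control family and the three things an auditor asks for."""
    name: str
    policy: str = ""      # where the one-page policy lives (hypothetical path)
    procedure: str = ""   # the automation that carries the policy out
    artifacts: list[str] = field(default_factory=list)  # evidence samples

    def gaps(self) -> list[str]:
        """Return which parts of the triple are still missing."""
        missing = []
        if not self.policy:
            missing.append("policy")
        if not self.procedure:
            missing.append("procedure")
        if not self.artifacts:
            missing.append("artifact")
        return missing


def audit_readiness(controls: list[Control]) -> dict[str, list[str]]:
    """Map each incomplete control to the parts it lacks."""
    return {c.name: c.gaps() for c in controls if c.gaps()}


# Illustrative data only.
example_controls = [
    Control("Identity & access", "policies/iam.md", "iac/iam-module",
            ["access-review-ticket", "iam-export"]),
    Control("Encryption", policy="policies/encryption.md"),
]
print(audit_readiness(example_controls))
```

A control that appears in the result is one where the auditor's question ("show me") does not yet have an answer.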
Compliance as Code
Manual evidence collection is expensive and error-prone. Modern teams treat compliance evidence as a build target, not a quarterly scramble.
- Continuous control monitoring — Vanta, Drata, Secureframe, Tugboat, Wiz, Prowler, OpenSCAP. They subscribe to your cloud APIs and assert controls ("all S3 buckets are private," "MFA on all human users") continuously.
- Policy as code — your SCPs, OPA policies, Kyverno rules, Cloud Custodian policies are the controls. Drift away from them is the audit finding; pre-merge tests prevent it.
- OSCAL — NIST's structured language for control catalogs and assessments. FedRAMP packages are now distributed in OSCAL; expect this to spread.
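To make the idea of continuously asserted controls concrete, here is a small sketch that evaluates two of the assertions quoted above against an inventory snapshot. It assumes the snapshot has already been collected from the cloud APIs; the record shape (`type`, `public_access_blocked`, `mfa_enabled`) is invented for illustration and is not the schema of any of the tools listed.

```python
from typing import Callable

Resource = dict  # hypothetical inventory record, e.g. built from cloud API responses


def bucket_is_private(r: Resource) -> bool:
    # Non-bucket resources pass trivially; buckets must block public access.
    return r["type"] != "bucket" or r.get("public_access_blocked") is True


def human_has_mfa(r: Resource) -> bool:
    # Missing or false MFA flags both count as failures.
    return r["type"] != "human_user" or r.get("mfa_enabled") is True


CONTROLS: dict[str, Callable[[Resource], bool]] = {
    "all buckets are private": bucket_is_private,
    "MFA on all human users": human_has_mfa,
}


def evaluate(inventory: list[Resource]) -> dict[str, list[str]]:
    """Return, per control, the ids of resources that violate it."""
    return {
        name: [r["id"] for r in inventory if not check(r)]
        for name, check in CONTROLS.items()
    }


# Illustrative snapshot only.
snapshot = [
    {"id": "logs", "type": "bucket", "public_access_blocked": True},
    {"id": "assets", "type": "bucket", "public_access_blocked": False},
    {"id": "alice", "type": "human_user", "mfa_enabled": True},
    {"id": "bob", "type": "human_user"},
]
print(evaluate(snapshot))
```

Run on a schedule, the same function produces the evidence (an empty list per control) and the finding (a non-empty one) from a single code path.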
Threat Modeling Cloud-Native Systems
STRIDE (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege) was written for desktop apps; the categories still apply, but cloud expands the surface.
The right level of abstraction
A useful threat model is system-level, not feature-level. Draw the data flow: components, trust boundaries, data classifications. The trust boundaries are where threats live — every flow that crosses one is a risk to evaluate. In cloud, common boundaries:
- Internet → public ALB / API Gateway.
- Public subnet → private subnet.
- Account A → Account B (assume-role, EventBridge, S3 cross-account).
- Region A → Region B.
- Provider A → Provider B (multi-cloud).
- Production → analytics / data warehouse.
- Synchronous service ↔ third-party SaaS.
STRIDE-Cloud question set
For each flow crossing a boundary, ask:
- Spoofing — who is the caller and how is their identity proven? (mTLS? OIDC? signed request?)
- Tampering — is the request integrity-protected? (Replay possible? Additional authenticated data (AAD) bound to encrypted DEKs?)
- Repudiation — is this call logged with the actual principal? (CloudTrail? K8s audit? IdP?)
- Information disclosure — what data crosses, encrypted with what key, accessible to whom?
- Denial of service — what is the request quota, the cost ceiling, the autoscale limit?
- Elevation of privilege — does completing this flow give the caller more than they should keep?
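One way to apply the question set systematically is to generate a worksheet row for every STRIDE category on every flow that crosses a trust boundary. The sketch below assumes each component has been assigned to a named trust zone; the `Flow` type and the zone names are hypothetical.

```python
from dataclasses import dataclass

# Questions condensed from the list above.
STRIDE_QUESTIONS = {
    "Spoofing": "Who is the caller and how is their identity proven?",
    "Tampering": "Is the request integrity-protected?",
    "Repudiation": "Is this call logged with the actual principal?",
    "Information disclosure": "What data crosses, under what key, visible to whom?",
    "Denial of service": "What are the quota, cost ceiling and autoscale limit?",
    "Elevation of privilege": "Does the caller end up with more than they should keep?",
}


@dataclass(frozen=True)
class Flow:
    source: str
    dest: str
    source_zone: str  # trust zone of the caller
    dest_zone: str    # trust zone of the callee


def crosses_boundary(flow: Flow) -> bool:
    return flow.source_zone != flow.dest_zone


def worksheet(flows: list[Flow]) -> list[tuple[str, str, str]]:
    """One (flow, category, question) row per boundary-crossing flow and category."""
    return [
        (f"{f.source} -> {f.dest}", category, question)
        for f in flows
        if crosses_boundary(f)
        for category, question in STRIDE_QUESTIONS.items()
    ]


# Illustrative flows only.
flows = [
    Flow("internet client", "public ALB", "internet", "public-subnet"),
    Flow("api", "cache", "private-subnet", "private-subnet"),
]
for row in worksheet(flows):
    print(row)
```

Flows that stay inside a single zone are skipped, which keeps the worksheet at the system level rather than the feature level.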
The cloud-flavoured additions
- Configuration drift — what could turn this benign flow into a public exposure with one bad checkbox?
- Tag integrity — if access depends on tags (ABAC), what stops the wrong tag value?
- Cross-account confused-deputy — does any service we trust act on requests from less-trusted callers? (S3 bucket policy with aws:SourceAccount, EventBridge cross-account, KMS grant abuse.)
- Build & deploy path — could an attacker shift the binary that ends up in this component? (Day 5.)
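The confused-deputy question lends itself to a simple pre-merge check: flag any resource-policy statement that allows a service principal without scoping it by source. The sketch below is a heuristic under stated assumptions, not a complete policy analyzer. It only looks at `Allow` statements whose `Principal` has a `Service` entry, and it treats the presence of `aws:SourceAccount`, `aws:SourceArn` or `aws:SourceOrgID` in any condition operator as sufficient. The bucket name and account id are placeholders.

```python
# Global condition keys commonly used to scope service principals.
# Compared in lower case because condition key names are case-insensitive.
DEPUTY_KEYS = {"aws:sourceaccount", "aws:sourcearn", "aws:sourceorgid"}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def unguarded_service_statements(policy: dict) -> list[str]:
    """Return Sids (or positions) of Allow statements that trust a service
    principal without any source-scoping condition key."""
    flagged = []
    for i, stmt in enumerate(_as_list(policy.get("Statement", []))):
        principal = stmt.get("Principal", {})
        if stmt.get("Effect") != "Allow":
            continue
        if not isinstance(principal, dict) or "Service" not in principal:
            continue
        condition_keys = {
            key.lower()
            for operator_block in stmt.get("Condition", {}).values()
            for key in operator_block
        }
        if not condition_keys & DEPUTY_KEYS:
            flagged.append(stmt.get("Sid", f"statement[{i}]"))
    return flagged


# Illustrative policy only.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowLogDelivery",
            "Effect": "Allow",
            "Principal": {"Service": "logging.s3.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
            "Condition": {"StringEquals": {"aws:SourceAccount": "111122223333"}},
        },
        {
            "Sid": "AllowEvents",
            "Effect": "Allow",
            "Principal": {"Service": "events.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
}
print(unguarded_service_statements(example_policy))
```

A flagged statement is a prompt for review rather than proof of a vulnerability; some service integrations are scoped by other means.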
The Architecture Review
The cheapest place to find a security issue is the design phase. The most expensive place is post-incident. Architecture reviews bridge them.
What good looks like
- Triggered by code, not by calendar. A new service, new data class, new external dependency, new region — these trigger a review. Quarterly all-hands reviews catch nothing.
- Lightweight when possible. A 1-page RFC + 30-min sync covers most. Reserve full threat-model exercises for genuinely new threat surfaces (new auth method, new untrusted input class, new data-out flow).
- Specific, written follow-ups. "Add an SCP for region pinning" beats "consider further hardening." Tracked in the same issue tracker as feature work, with owners and dates.
- Reviewers who own outcomes. A review where security raises concerns and engineering politely accepts and ignores them is theatre. Either security has a concrete must-fix list or the concerns convert to documented accepted risks signed by a named risk owner.
The architecture review checklist
One page; the same questions for every review. Most designs fail in predictable ways.
- Trust boundaries. Drawn? Each crossing authenticated, authorized, logged?
- Identity model. Every principal short-lived? Workload identity for services? OIDC for CI? Long-lived access keys called out as exceptions?
- Data classification. What is in motion and at rest? Most-sensitive class drives controls.
- Encryption. Customer-managed keys with rotation? Encryption context bound to tenant/purpose?
- Network. Default-deny? Egress filtered or proxied? Endpoint policies pin to org?
- Admission & supply chain. Image signing required? Provenance verified? Dependency hygiene?
- Detection. Three things that, if observed, would tell us this system is being abused. Where they land. Who is paged.
- Blast radius. If this component is fully compromised, what can the attacker reach? Document it.
- Disaster recovery. RTO / RPO. Backups verified by test restore. Cross-region copies for highest-class data.
- Compliance scope. Which regimes apply? Which controls does this design implicate?
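The blast-radius item can be answered mechanically once access relationships are written down as a graph. The sketch below assumes a hand-maintained adjacency map in which an edge means "a principal running in this component can reach or assume that one"; the component names are hypothetical. It uses a plain breadth-first search.

```python
from collections import deque


def blast_radius(access: dict[str, list[str]], start: str) -> set[str]:
    """Everything transitively reachable from a fully compromised component."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in access.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)  # report what the attacker gains beyond the foothold
    return seen


# Illustrative access graph only.
access = {
    "web": ["api"],
    "api": ["orders-db", "queue"],
    "queue": ["worker"],
    "worker": ["orders-db", "warehouse"],
    "batch": ["warehouse"],
}
print(sorted(blast_radius(access, "api")))
```

The result is only as good as the edges: an undocumented cross-account role or shared credential is exactly the edge a review should be trying to discover.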
The Risk Register
Not every issue is fixable today; some are accepted with mitigations. The risk register is the document of record. Each entry has: description, blast radius, likelihood, current mitigations, the remediation plan and date, and the named risk owner. Reviewed monthly; expired items reopened or formally re-accepted.
Why this matters operationally: when an incident happens, the first question regulators and customers ask is "did you know about this risk?" If yes and it was on the register with a plan, that is hard work being recognized. If yes and it was buried in a Slack thread, that is negligence. If no — well, you find out together.
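A risk register does not need special tooling; the fields listed above fit in a small record, and the monthly review reduces to a filter on the remediation date. The sketch below is one possible shape, with hypothetical field names and example entries.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RiskEntry:
    description: str
    blast_radius: str
    likelihood: str            # e.g. "low" / "medium" / "high"
    mitigations: list[str]
    remediation_plan: str
    remediation_due: date
    owner: str                 # a named person, not a team alias


def needs_decision(register: list[RiskEntry], today: date) -> list[RiskEntry]:
    """Entries past their remediation date: reopen or formally re-accept."""
    return [r for r in register if r.remediation_due < today]


# Illustrative entries only.
register = [
    RiskEntry(
        description="Legacy service still uses a long-lived access key",
        blast_radius="read access to one reporting bucket",
        likelihood="medium",
        mitigations=["key scoped to a single bucket", "usage alerting"],
        remediation_plan="migrate to workload identity",
        remediation_due=date(2025, 2, 1),
        owner="J. Doe",
    ),
    RiskEntry(
        description="Egress from analytics subnet is unfiltered",
        blast_radius="data exfiltration from the warehouse",
        likelihood="low",
        mitigations=["flow logs reviewed weekly"],
        remediation_plan="route egress through a proxy",
        remediation_due=date(2025, 6, 30),
        owner="A. Smith",
    ),
]
for entry in needs_decision(register, today=date(2025, 3, 1)):
    print(entry.owner, "-", entry.description)
```

Keeping the owner and date as required fields is the point: an entry without either cannot be constructed, which is the property a Slack thread lacks.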
The Engineering–Security Operating Model
Three structures have emerged for how engineering and security work together. None is universally right; all beat "throw it over the wall."
- Embedded security engineers — one security eng per platform area, in the team's standups. Highest empathy; doesn't scale past tens of teams.
- Security champions — engineers in product teams trained as the first line; central security supports. Scales further; relies on champions actually having time.
- Paved roads + shift-left — central security ships golden modules, scaffolders, libraries. Product teams build on top. Highest leverage; works only when the paved road actually solves the team's problem.
The honest truth is that the highest-functioning organisations do all three: paved roads as default, champions in teams, and embedded engineers for the highest-risk components.
Putting It All Together — The Course Loop
A final lap through the course as a single coherent practice:
- Architect — model the system, identify trust boundaries, run STRIDE-Cloud, choose controls. (Days 1, 2, 7.)
- Build — paved roads with IAM modules, signed images, scoped credentials, IaC with policy-as-code. (Days 1, 5.)
- Run — deploy with admission policy, mesh-enforced mTLS, default-deny networks. (Days 2, 3, 4.)
- Detect — telemetry + Sigma rules + GuardDuty + on-call. (Day 6.)
- Respond — runbooks, snapshots, communications, post-mortems. (Day 6.)
- Loop — incident retro becomes threat-model input becomes paved-road improvement.
- Trust — boundaries explicit?
- Identity — short-lived, scoped?
- Data — classified, encrypted with managed keys?
- Network — default-deny + filtered egress?
- Supply — signed + attested + admission-verified?
- Detect — three signals + paged owner?
- Blast — radius mapped, accepted or shrunk?
s3:*; (c) IAM Access Analyzer monitors for policy drift; (d) CloudTrail GuardDuty rule fires on cross-account assume-role. With those four, the risk is genuinely mitigated and provable. "Least-privilege IAM" alone is the kind of statement that closes a ticket and reopens an incident two years later.
- AICPA — SOC 2 Trust Services Criteria (aicpa-cima.com)
- ISO/IEC 27001:2022 (iso.org)
- PCI Security Standards — DSS 4.0 (pcisecuritystandards.org)
- FedRAMP — Documents & templates (fedramp.gov)
- NIST SP 800-53 Rev 5 — Security & Privacy Controls (nist.gov)
- OSCAL — Open Security Controls Assessment Language (nist.gov)
- OWASP — Threat Modeling (owasp.org)
- Adam Shostack — Threat Modeling resources (shostack.org)