
IAM Deep Dive — Principals, Policies & STS
If misconfiguration dominates cloud breaches, IAM is the field where it lives. Master principals, policy evaluation, AssumeRole and STS, conditions and ABAC, federation, and the patterns that make least privilege survivable in production.
What you will learn
Identity and Access Management is where most cloud security wins — and most cloud breaches — happen. The provider gives you a powerful, declarative engine for saying who can do what to which resources, under what conditions. Most teams use perhaps five percent of it and pay for the other ninety-five percent in incident reports. This chapter is a deep dive into how the engine actually evaluates a request, the credential lifecycle behind every API call, and the patterns that hold up at scale.
Principals, Resources, Actions, Conditions
Strip the JSON away and every cloud's IAM language is the same triple-plus-context: a principal (the caller: user, role, workload identity) requests an action (s3:GetObject, kms:Decrypt) on a resource (the ARN of the bucket or key, the URI of the storage object) under some conditions (source IP, tag match, TLS version, MFA presence). Policies are documents that match against this tuple and produce an Allow, a Deny, or no opinion.
The provider noun map
| Concept | AWS | GCP | Azure |
|---|---|---|---|
| Principal type | User, Role, Service-linked role | User, Service Account, Group | User, Service Principal, Managed Identity |
| Workload identity | EC2 instance role / IRSA / Pod identity | Workload Identity Federation | Managed Identity |
| Permission unit | Action (e.g. s3:GetObject) | Permission (storage.objects.get) | Action (Microsoft.Storage/.../read) |
| Bundle | Policy | Role (predefined / custom) | Role definition |
| Attachment | Identity / resource policy | Role binding (on resource hierarchy) | Role assignment (on scope) |
| Org-wide guardrail | SCPs (Organizations) | Org Policies / VPC-SC | Azure Policy / Management Groups |
| Per-identity ceiling | Permission boundary | (via deny policies) | (via deny assignments) |
The vocabulary differs but the algebra is the same. We will use AWS in the examples because its policy language is the most explicit; the patterns translate directly.
Policy Evaluation — Deny Wins
Two principles cover ninety percent of "why is this not working" debugging. An explicit Deny anywhere wins. The default is implicit deny. Everything else is sequencing: AWS evaluates organizational SCPs first (do the org guardrails even allow this action?), then identity policies and resource policies, then any permission boundary, then session policies on STS-assumed roles. If a Deny matches at any point, evaluation stops. Otherwise the request succeeds only if an identity or resource policy explicitly allows it and every guardrail that applies (SCP, permission boundary, session policy) also allows it.
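The two rules and the layering can be sketched as a toy evaluator. This is a simplified model under stated assumptions, not AWS's actual implementation: each policy layer is reduced to plain sets of action names, resource policies and cross-account rules are left out, and `Layer` and `evaluate` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Layer:
    """One policy layer, reduced to sets of action names (a simplification)."""
    allow: set = field(default_factory=set)
    deny: set = field(default_factory=set)


def evaluate(action: str, scp: Layer, identity: Layer,
             boundary: Optional[Layer] = None,
             session: Optional[Layer] = None) -> str:
    """Return "Allow", "ExplicitDeny", or "ImplicitDeny" for one action."""
    optional = [layer for layer in (boundary, session) if layer is not None]
    layers = [scp, identity] + optional

    # Rule 1: an explicit Deny in any layer ends the evaluation.
    if any(action in layer.deny for layer in layers):
        return "ExplicitDeny"

    # Rule 2: every layer that is present must allow the action;
    # a layer with no matching Allow produces an implicit deny.
    if all(action in layer.allow for layer in layers):
        return "Allow"
    return "ImplicitDeny"


scp = Layer(allow={"s3:GetObject", "s3:PutObject", "iam:CreateUser"})
identity = Layer(allow={"s3:GetObject", "s3:PutObject"})
boundary = Layer(allow={"s3:GetObject"})

print(evaluate("s3:GetObject", scp, identity, boundary))   # Allow
print(evaluate("s3:PutObject", scp, identity, boundary))   # ImplicitDeny
print(evaluate("iam:CreateUser", scp, identity))           # ImplicitDeny
```

The second call fails even though the identity policy grants `s3:PutObject`, because the boundary layer has no matching Allow.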
The intersection rule for boundaries
A permission boundary is the maximum a principal can ever be granted, regardless of its identity policy. Boundaries are commonly used in delegated account models: platform engineers want app teams to create their own roles, but only within a ceiling. The effective permissions are the intersection of identity policy and boundary. SCPs work the same way at the org level — they cap an account's possible API surface.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutsideHomeRegion",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": { "aws:RequestedRegion": ["us-east-1", "us-east-2"] },
        "BoolIfExists": { "aws:ViaAWSService": "false" }
      }
    },
    {
      "Sid": "DenyRootUser",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": { "StringLike": { "aws:PrincipalArn": "arn:aws:iam::*:root" } }
    }
  ]
}
This is a real-world starter SCP: pin your accounts to allowed regions (huge blast-radius win) and forbid the root user from doing anything. It runs above identity policies, so even an unbounded admin role cannot create a resource in eu-central-1.
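To see why the region statement fires for a direct call to eu-central-1 but not for a call an AWS service makes on your behalf, here is a minimal sketch of how its two condition operators combine. The `context` dict and the function name are hypothetical; real request contexts carry many more keys.

```python
def region_deny_matches(context: dict, allowed_regions: list) -> bool:
    """Return True when the DenyOutsideHomeRegion statement applies."""
    # StringNotEquals with several values matches only if the request value
    # equals none of them. A missing key also satisfies a negated operator.
    region = context.get("aws:RequestedRegion")
    outside_home = region not in allowed_regions

    # BoolIfExists: an absent key passes the test; a present key must
    # equal the policy value, which is "false" here.
    via_service = context.get("aws:ViaAWSService")
    direct_call = via_service is None or via_service == "false"

    # Operators inside one Condition block are ANDed together.
    return outside_home and direct_call


HOME = ["us-east-1", "us-east-2"]

print(region_deny_matches(
    {"aws:RequestedRegion": "eu-central-1", "aws:ViaAWSService": "false"}, HOME))  # True
print(region_deny_matches(
    {"aws:RequestedRegion": "us-east-1", "aws:ViaAWSService": "false"}, HOME))     # False
print(region_deny_matches(
    {"aws:RequestedRegion": "eu-central-1", "aws:ViaAWSService": "true"}, HOME))   # False
```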
An SCP only caps permissions; it never grants them. Attaching FullAWSAccess as the SCP removes the org-level block, but you must still attach an identity policy that grants s3:GetObject for the principal to actually use S3.
STS, AssumeRole and the Short-Lived-Credential Loop
Long-lived AKIA… access keys are the single biggest source of leaked-credential incidents in cloud breaches. Public repos, CI logs, browser dev-tools, screen shares — they leak. The cure is to treat IAM users as starting points and roles as the actual operating identities. Every workload, CI job, federated user, and most humans should operate from short-lived STS credentials.
The two policies on every role
A role has two policy documents, and they are easy to confuse. The trust policy defines who can assume the role — its Principal can be a service (ec2.amazonaws.com), another account, an OIDC issuer, or an identity. The permission policies attached to the role define what the resulting session can do once assumed. The trust policy is the door; the permission policy is the floor of the room.
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
      },
      "StringLike": {
        "token.actions.githubusercontent.com:sub": "repo:my-org/payments-service:ref:refs/heads/main"
      }
    }
  }]
}
Three things are doing the work in that Condition block. The repo binds to one repository; an attacker with creds in another repo cannot use this role. The ref binds to main; pull-request workflows from forks cannot assume it. And the audience claim binds to sts.amazonaws.com; tokens minted for any other audience (e.g. accidentally configured to GitHub itself) cannot be replayed. Always include all three. The Microsoft / Sysdig research on misconfigured GitHub-OIDC trust policies (2023) found tens of thousands of public examples leaving sub as *.
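The difference between a pinned sub and a wildcard sub is easy to demonstrate with a small matcher. This is a sketch, assuming the claim values below (modelled on GitHub's documented subject format) and illustrative function names; it mimics the IAM StringLike wildcards `*` and `?` and does not call STS.

```python
import re


def string_like(pattern: str, value: str) -> bool:
    """IAM-style StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def trust_allows(claims: dict, aud: str, sub_pattern: str) -> bool:
    """Both conditions must hold: exact audience and matching subject."""
    return claims.get("aud") == aud and string_like(sub_pattern, claims.get("sub", ""))


AUD = "sts.amazonaws.com"
STRICT = "repo:my-org/payments-service:ref:refs/heads/main"

main_push = {"aud": AUD, "sub": "repo:my-org/payments-service:ref:refs/heads/main"}
pull_request = {"aud": AUD, "sub": "repo:my-org/payments-service:pull_request"}
other_repo = {"aud": AUD, "sub": "repo:someone-else/tool:ref:refs/heads/main"}

print(trust_allows(main_push, AUD, STRICT))     # True
print(trust_allows(pull_request, AUD, STRICT))  # False
print(trust_allows(other_repo, AUD, STRICT))    # False
print(trust_allows(other_repo, AUD, "*"))       # True: any repository can assume the role
```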
Session policies — narrowing on assume
When you call AssumeRole you can pass an optional session policy. The resulting credentials' permissions are the intersection of the role's permission policy and your session policy. This is the standard way to mint a credential that is narrower than the role itself — useful for handing tenant-scoped tokens to per-customer worker invocations without creating one role per customer.
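A minimal sketch of the tenant-scoped case follows. The bucket layout (`tenants/<id>/`), the function names, and the 15-minute duration are assumptions for illustration; `sts_client` is expected to behave like a boto3 STS client, whose `assume_role` call accepts the session policy as a JSON string in `Policy`.

```python
import json


def tenant_session_policy(bucket: str, tenant_id: str) -> dict:
    """Read-only access to one tenant's prefix (the layout is an assumption)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/tenants/{tenant_id}/*",
        }],
    }


def assume_for_tenant(sts_client, role_arn: str, bucket: str, tenant_id: str) -> dict:
    """Call AssumeRole with a session policy and return the temporary credentials."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=f"tenant-{tenant_id}",
        Policy=json.dumps(tenant_session_policy(bucket, tenant_id)),
        DurationSeconds=900,
    )
    return response["Credentials"]


print(json.dumps(tenant_session_policy("example-bucket", "acme"), indent=2))
```

In real use you would pass `boto3.client("sts")` as `sts_client`. The session policy can only narrow the role; anything it lists that the role itself lacks stays denied.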
Consider a pod running as RoleA, which has s3:* on bucket X. The pod calls AssumeRole with a session policy of s3:GetObject on X/reports/*. Can the resulting credentials write to X/reports/audit.csv?
No. The effective permissions are the intersection of the role's policy (s3:*) and the session policy (s3:GetObject), and that intersection is just s3:GetObject. PutObject is missing, so the write fails. This is the canonical pattern for safely handing scoped credentials to less-trusted code.
Conditions and ABAC: Tags as Policy
Pure RBAC creates one role per resource scope and explodes by Cartesian product. The dominant scaling pattern is ABAC — Attribute-Based Access Control — where the same role's policy uses tags or attributes to constrain which resources it touches.
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": "*",
    "Condition": {
      "StringEquals": {
        "aws:ResourceTag/Env": "${aws:PrincipalTag/Env}",
        "aws:ResourceTag/Team": "${aws:PrincipalTag/Team}"
      }
    }
  }]
}
One policy. Every principal tagged Team=payments,Env=prod can read every bucket tagged the same way; nothing else. Add a new bucket and tag it correctly and access works without a policy change. The trade-off: tags become the security boundary, so a stale or forgotten tag is now a vulnerability. Support for aws:ResourceTag also varies by service and resource type, so confirm it for each service you rely on. Pair ABAC with mandatory-tag SCPs and a periodic tag-drift audit.
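The tag-matching rule in that policy can be sketched in a few lines. This is an illustrative model, not AWS's evaluator: it handles only a StringEquals block whose values are either literals or a single `${aws:PrincipalTag/...}` variable, and the function name is hypothetical.

```python
import re

VARIABLE = re.compile(r"\$\{aws:PrincipalTag/([^}]+)\}")


def abac_allows(condition: dict, principal_tags: dict, resource_tags: dict) -> bool:
    """Evaluate a StringEquals block of aws:ResourceTag/* keys against tags."""
    for key, template in condition.items():
        tag_name = key.split("/", 1)[1]  # "aws:ResourceTag/Env" -> "Env"
        match = VARIABLE.fullmatch(template)
        expected = principal_tags.get(match.group(1)) if match else template
        # A missing principal tag or resource tag can never satisfy the match.
        if expected is None or resource_tags.get(tag_name) != expected:
            return False
    return True


CONDITION = {
    "aws:ResourceTag/Env": "${aws:PrincipalTag/Env}",
    "aws:ResourceTag/Team": "${aws:PrincipalTag/Team}",
}
principal = {"Team": "payments", "Env": "prod"}

print(abac_allows(CONDITION, principal, {"Team": "payments", "Env": "prod"}))     # True
print(abac_allows(CONDITION, principal, {"Team": "payments", "Env": "staging"}))  # False
print(abac_allows(CONDITION, principal, {"Env": "prod"}))                         # False
```

The last call shows the tag-drift risk from the other side: a bucket that lost its Team tag silently drops out of reach.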
The condition keys you will use most
- aws:SourceIp: restrict to office / VPN / CI runner IPs (avoid as your only control; IPs are spoofable inside the account).
- aws:SourceVpce / aws:SourceVpc: only via specific VPC endpoints. Crucial for restricting S3 to private connectivity.
- aws:MultiFactorAuthPresent / aws:MultiFactorAuthAge: gate destructive actions on fresh MFA.
- aws:SecureTransport: only over TLS. Pair with a bucket policy that denies otherwise.
- aws:CalledVia: distinguish direct calls from service-to-service ones.
- aws:ResourceTag/*, aws:RequestTag/*, aws:PrincipalTag/*: the ABAC backbone.
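As one concrete instance of these keys, the aws:SecureTransport pairing can be generated per bucket. A minimal sketch, assuming a hypothetical bucket name and helper function; the statement follows the common deny-unless-TLS pattern.

```python
import json


def deny_insecure_transport(bucket: str) -> dict:
    """Bucket policy that denies every S3 action on requests not made over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            # Cover both the bucket itself and every object in it.
            "Resource": [bucket_arn, f"{bucket_arn}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


print(json.dumps(deny_insecure_transport("example-reports"), indent=2))
```

Because the statement is a Deny, it holds regardless of what identity policies allow.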
aws:PrincipalOrgID: in resource and trust policies, pin aws:PrincipalOrgID to your AWS Organizations org ID, not specific account IDs. New accounts join the org without a policy update; rogue identities outside the org are blocked even if they have the role ARN. This is a quiet superpower for SaaS-style internal platforms.
Federation: Where Identity Comes From
Three federation flavours cover almost every real principal. Get them right and you may never create another long-lived IAM user.
Humans — SAML or OIDC SSO
Humans authenticate to your IdP (Okta, Entra, Google Workspace, Auth0) and the IdP brokers a SAML or OIDC assertion to AWS IAM Identity Center / GCP Cloud Identity / Microsoft Entra ID (formerly Azure AD). The principal in CloudTrail is the federated user, with attributes (groups, department) that drive permission set selection. No human should have a long-lived IAM user.
CI — OIDC, never static keys
GitHub Actions, GitLab CI, CircleCI, Buildkite all expose OIDC tokens. Configure the cloud provider's OIDC trust to those issuers and use the sub / workflow / environment claims as the trust gate. The result: zero deploy-time secrets, full audit trail of which workflow assumed which role.
Workloads — IRSA, Workload Identity, Managed Identity
Pods on EKS use IAM Roles for Service Accounts (IRSA): each Kubernetes ServiceAccount maps to an IAM role through an OIDC trust to the cluster's issuer. GKE has Workload Identity Federation for GKE; AKS has Microsoft Entra Workload ID, which replaces the deprecated pod-managed identities. The pattern is the same in every direction: the workload presents a signed identity token (often a Kubernetes-issued JWT) and the cloud's STS exchanges it for short-lived API credentials. We will see this end-to-end on Day 3.
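When a token exchange fails, the first debugging step is usually to look at the claims the workload is actually presenting. The sketch below decodes a JWT payload without verifying its signature, so it is for inspection only and never for authorization. The issuer URL and ServiceAccount names are made-up examples, and the sample token is assembled locally rather than issued by a cluster.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def unverified_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical claims shaped like a Kubernetes ServiceAccount token.
claims = {
    "iss": "https://oidc.example.com/cluster",
    "sub": "system:serviceaccount:payments:api",
    "aud": "sts.amazonaws.com",
}
sample_token = ".".join([
    b64url(json.dumps({"alg": "RS256", "typ": "JWT"}).encode()),
    b64url(json.dumps(claims).encode()),
    "signature-placeholder",
])

decoded = unverified_claims(sample_token)
print(decoded["iss"], decoded["sub"], decoded["aud"])
```

These are the same iss, sub, and aud values that the role's trust policy conditions are matched against.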
- Org guardrails before identity grants.
- Boundaries cap; Identity grants; Session narrows.
- Conditions are the real least-privilege lever.
Privilege Patterns That Hold Up at Scale
Permission boundaries for delegated admin
Platform engineers pre-create a permission boundary that defines what any team-created role can ever do ("no IAM mutations on the org's protected paths, no iam:* wildcards"). Developers self-serve roles within this ceiling. Result: app teams move fast without waiting on platform; platform retains an enforceable cap.
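One way to express that ceiling is a policy that lets developers create and populate roles only when the boundary is attached. This is a sketch under assumptions: the account ID, boundary name, and `/app/` role path are placeholders, and the helper name is hypothetical. It relies on the `iam:PermissionsBoundary` condition key.

```python
import json


def delegated_role_policy(account_id: str, boundary_name: str, path: str = "/app/") -> dict:
    """Identity policy for developers who self-serve roles under a boundary."""
    boundary_arn = f"arn:aws:iam::{account_id}:policy/{boundary_name}"
    role_arns = f"arn:aws:iam::{account_id}:role{path}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageRolesOnlyWithBoundary",
                "Effect": "Allow",
                "Action": ["iam:CreateRole", "iam:AttachRolePolicy", "iam:PutRolePolicy"],
                "Resource": role_arns,
                # The request is allowed only if the role carries this boundary.
                "Condition": {"StringEquals": {"iam:PermissionsBoundary": boundary_arn}},
            },
            {
                "Sid": "NeverRemoveTheBoundary",
                "Effect": "Deny",
                "Action": "iam:DeleteRolePermissionsBoundary",
                "Resource": role_arns,
            },
        ],
    }


print(json.dumps(delegated_role_policy("123456789012", "team-boundary"), indent=2))
```

A production version would also need to stop developers from editing the boundary policy document itself.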
Break-glass roles
One or two named break-glass roles per account, with full admin and trust restricted to a small, audited set of humans, MFA-required, and assumption alerted to PagerDuty. Day-to-day work uses scoped roles. Break-glass is for true emergencies, and every assumption triggers a review next business day.
Just-in-time elevation
Tools like AWS IAM Identity Center session policies, GCP Privileged Access Manager, ConductorOne, Sym, or in-house tooling let an engineer request elevated access for a specific time-bounded reason. The role exists; the human cannot use it without a request and approval. Removes the entire "I'm an admin because I might need to be" class of standing risk.
Tag-on-create and resource control policies
Use aws:RequestTag and aws:TagKeys to require that a principal tags the resource appropriately at creation time, then key further policies off those tags. Untagged creates fail. AWS's newer Resource Control Policies (RCPs) give you SCP-style guardrails on resource policies, which is handy when you need to block public S3 buckets organisation-wide.
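The tag-on-create rule combines two checks, sketched here as plain Python. The required tags, permitted values, and function name are assumptions for illustration; in a real policy the first check maps to aws:RequestTag conditions and the second to aws:TagKeys with ForAllValues.

```python
def create_allowed(request_tags: dict, required: dict, allowed_keys: set) -> bool:
    """Decide whether a create request carries acceptable tags."""
    # aws:RequestTag/<key>: each required key must be present with a permitted value.
    for key, permitted_values in required.items():
        if request_tags.get(key) not in permitted_values:
            return False
    # aws:TagKeys with ForAllValues: every supplied key must be in the allowed set.
    return set(request_tags) <= allowed_keys


REQUIRED = {"Env": {"dev", "staging", "prod"}, "Team": {"payments", "search"}}
ALLOWED_KEYS = {"Env", "Team", "CostCenter"}

print(create_allowed({"Env": "prod", "Team": "payments"}, REQUIRED, ALLOWED_KEYS))  # True
print(create_allowed({}, REQUIRED, ALLOWED_KEYS))                                   # False
print(create_allowed({"Env": "prod", "Team": "payments", "Owner": "me"},
                     REQUIRED, ALLOWED_KEYS))                                       # False
```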
Common escalation primitives: iam:CreateAccessKey on another user; iam:PassRole + lambda:CreateFunction (run code as the passed role); iam:UpdateAssumeRolePolicy (rewrite a role's trust policy); iam:AttachUserPolicy on yourself; kms:Decrypt on a key holding other principals' material. Tools like Cloudsplaining, PMapper, and IAM Vulnerable map these. Permission boundaries should explicitly deny these on developer roles.
A team grants its developer role iam:PassRole on a powerful CI role and codebuild:*. They claim they cannot escalate because the developer role itself only has s3:Get*. Are they right?
No. A developer can create a CodeBuild project that runs as the CI role (allowed by the iam:PassRole permission), then have the build script call any API the CI role can. Their effective permissions are now the union of their own role and any role they can pass to a service that runs code. iam:PassRole is a transitive grant: always scope it to specific role ARNs, never *.
Auditing IAM in Practice
Two reports run weekly are worth their weight in incidents avoided. Unused access: AWS IAM Access Analyzer (last-accessed) flags principals with permissions they never use; trim or revoke. Public & cross-account exposure: Access Analyzer (external access) reports every resource policy reachable from outside the account or org — your starting list of "why" questions. GCP's Recommender and Azure's Defender for Cloud have equivalents.
For static review, Cloudsplaining (Salesforce) and Parliament (Duo) lint policy documents for over-privilege; PMapper renders the actual privilege graph including transitive paths. Run them in CI on policy changes — the pattern of "the policy looks scoped but transitively grants admin" is too easy to miss in review.
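For a sense of what such a CI check looks for, here is a deliberately tiny lint. It is a sketch, not a substitute for the tools named above: it only flags Allow statements that grant a few escalation-prone IAM actions on Resource "*", and the action list and function names are assumptions.

```python
import fnmatch

RISKY = ("iam:createaccesskey", "iam:passrole", "iam:updateassumerolepolicy")


def as_list(value):
    """IAM allows a single string where a list is expected; normalise to a list."""
    return value if isinstance(value, list) else [value]


def risky_wildcard_grants(policy: dict) -> list:
    """Return (sid, action) pairs where a risky action is allowed on Resource '*'."""
    findings = []
    for stmt in as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        if "*" not in as_list(stmt.get("Resource", [])):
            continue
        for action in as_list(stmt.get("Action", [])):
            # Action names are case-insensitive and may contain wildcards.
            pattern = action.lower()
            if any(fnmatch.fnmatchcase(risky, pattern) for risky in RISKY):
                findings.append((stmt.get("Sid", "<no sid>"), action))
    return findings


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "TooBroad", "Effect": "Allow",
         "Action": ["iam:PassRole", "s3:GetObject"], "Resource": "*"},
        {"Sid": "Scoped", "Effect": "Allow", "Action": "iam:PassRole",
         "Resource": "arn:aws:iam::123456789012:role/ci-deploy"},
    ],
}

print(risky_wildcard_grants(policy))  # [('TooBroad', 'iam:PassRole')]
```

A check like this catches only the direct form; transitive paths still need a graph tool.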
Review every grant of iam:PassRole, iam:UpdateAssumeRolePolicy, and kms:Decrypt: they are escalation primitives whether you intend them to be or not.
- AWS IAM: Policy evaluation logic (aws.amazon.com)
- GitHub: OIDC for cloud deployments (github.com)
- Cloudsplaining: IAM policy linter (github.com)
- PMapper: Principal mapping & privilege graph (github.com)
- AWS: Resource Control Policies (RCPs) (aws.amazon.com)
- NIST SP 800-162: Attribute Based Access Control (nist.gov)
- Bishop Fox: IAM Vulnerable (escalation lab) (github.com)