It is 3:14 AM and an alert fires from our SIEM. A developer at a fintech client just exported a customer master key reference from a CI pipeline log. The key itself never leaves the HSM, but the ARN, the alias, and the IAM role attached to it are now sitting in a build artifact that anyone with read access to the bucket can pull. By 3:42 AM we have the role revoked, the key disabled, and a fresh CMK provisioned. That incident is why I treat encryption key management across AWS, Azure, and GCP as a detection problem first and a cryptography problem second. If you cannot see who touched a key, the math does not save you.
This article walks through how AWS KMS, Azure Key Vault, and Google Cloud KMS actually behave under operational pressure, where Terraform earns its keep, and where each provider quietly diverges from the others in ways that will bite you during an audit.
The Four Operations That Define Key Management
Every conversation about KMS eventually collapses into four verbs: create, rotate, use, and delete. Vendors dress these up differently, but the lifecycle is identical. Creation generates the cryptographic material. Rotation replaces it on a schedule so that a single compromise does not own your entire historical dataset. Usage is the access control plane — which principal can call Encrypt or Decrypt against which key. Deletion is the part most teams get wrong, because soft-delete windows differ by provider and a hasty terraform destroy can put you in a seven-to-thirty day recovery scramble.
The MITRE ATT&CK techniques I map this to are T1552.001 (Unsecured Credentials: Credentials In Files) and T1078.004 (Valid Accounts: Cloud Accounts). Most key compromise events I have triaged in the last three years started not with a broken cipher but with a leaked IAM credential that had kms:Decrypt attached to a wildcard resource. Your detection engineering effort belongs there.
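One way to hunt for that wildcard pattern before an attacker does is to scan IAM policy documents for it. The sketch below is a minimal illustration, not a full policy evaluator: the function name find_wildcard_decrypt is hypothetical, and it deliberately ignores Condition blocks, NotAction, and NotResource, so treat a hit as a lead to review by hand.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM accepts either a single string or a list for Action and Resource."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def find_wildcard_decrypt(policy):
    """Return Allow statements that grant kms:Decrypt on Resource "*".

    Simplification (assumption): Condition, NotAction and NotResource
    are not evaluated, so this over-reports rather than under-reports.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement is also valid JSON
        statements = [statements]

    findings = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        # IAM action names are case-insensitive and may contain wildcards.
        actions = [a.lower() for a in _as_list(stmt.get("Action"))]
        grants_decrypt = any(fnmatchcase("kms:decrypt", a) for a in actions)
        all_resources = "*" in _as_list(stmt.get("Resource"))
        if grants_decrypt and all_resources:
            findings.append(stmt)
    return findings
```

Run it over the policies attached to CI roles first, since that is where leaked credentials tend to originate.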
AWS KMS: The Default That Most Teams Underuse
AWS KMS is the workhorse. It integrates with S3, EBS, RDS, Secrets Manager, and roughly every other service in the catalog. KMS keys (AWS's current name for what used to be called Customer Master Keys, or CMKs) are generated and used inside FIPS-validated HSMs, and AWS handles the operational burden. Check the current validation certificate if your audit names a specific FIPS 140 version or level. Automatic key rotation, when enabled, generates new cryptographic material once a year by default while preserving the same key ID: old ciphertext stays decryptable, and new ciphertext uses the fresh material.
Here is the Terraform pattern we ship to almost every AWS client engagement:
resource "aws_kms_key" "secrets" {
  description             = "CMK for application secrets"
  deletion_window_in_days = 30
  enable_key_rotation     = true
  policy                  = data.aws_iam_policy_document.kms_policy.json
}

resource "aws_secretsmanager_secret" "db_password" {
  name                    = "prod/db/password"
  kms_key_id              = aws_kms_key.secrets.arn
  recovery_window_in_days = 7
}
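Two settings matter more than the rest. The 30-day deletion window is intentional. Thirty days is both the maximum and the value you get if you omit the argument, but the 7-day minimum is too aggressive for production, and pinning 30 explicitly stops a later edit from quietly lowering it. And enable_key_rotation = true costs you very little (check current KMS pricing for how rotated key material is billed) and silently improves your posture. If you are not setting it, you are leaving nearly free security on the table.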
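A baseline like this is easy to enforce in CI by inspecting the JSON that terraform show -json produces for a saved plan. The following is a rough sketch under that assumption; check_kms_baseline is a hypothetical helper name, and it only looks at the two aws_kms_key arguments discussed above.

```python
def check_kms_baseline(plan, min_window_days=30):
    """Flag aws_kms_key resources in a Terraform plan that miss the baseline.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    """
    problems = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_kms_key":
            continue
        after = (rc.get("change") or {}).get("after")
        if after is None:
            # The resource is being destroyed; nothing left to validate.
            continue
        window = after.get("deletion_window_in_days")
        # An unset window falls back to the provider default of 30 days.
        if window is not None and window < min_window_days:
            problems.append(
                f"{rc['address']}: deletion window {window} < {min_window_days} days"
            )
        if not after.get("enable_key_rotation"):
            problems.append(f"{rc['address']}: automatic rotation is not enabled")
    return problems
```

Failing the pipeline when the returned list is non-empty keeps a weak key from ever reaching apply.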
The Detection Layer Most People Skip
CloudTrail logs every KMS API call. The query that has caught the most real incidents for our SOC is this Athena pattern, which I keep tuned to drop noise from legitimate service principals:
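SELECT eventTime, userIdentity.arn, eventName, requestParameters
FROM cloudtrail_logs
WHERE eventSource = 'kms.amazonaws.com'
  AND eventName IN ('Decrypt', 'GenerateDataKey', 'ScheduleKeyDeletion')
  AND userIdentity.type != 'AWSService'
  -- eventTime is a string in the standard CloudTrail table definition,
  -- so parse it before comparing against a timestamp.
  AND from_iso8601_timestamp(eventTime) > current_timestamp - interval '1' hour;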
That single query, wired into a scheduled detection, flagged the fintech incident I opened with. NIST CSF 1.1 lists this as DE.AE-3 (event data are collected and correlated from multiple sources and sensors); CSF 2.0 renumbers it DE.AE-03. In practice it is the difference between catching key abuse in minutes versus discovering it during a quarterly audit.
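Before scheduling a rule like this, it helps to run the same filter against a handful of sample CloudTrail records so you can confirm what it does and does not catch. This is a small sketch of that idea, assuming records in the standard CloudTrail JSON shape; the function and constant names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

WATCHED_EVENTS = {"Decrypt", "GenerateDataKey", "ScheduleKeyDeletion"}


def parse_event_time(value):
    """CloudTrail emits UTC timestamps such as 2024-05-01T03:14:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def suspicious_kms_events(records, now, window=timedelta(hours=1)):
    """Mirror the Athena filter: recent KMS calls not made by an AWS service."""
    hits = []
    for record in records:
        identity = record.get("userIdentity", {})
        if record.get("eventSource") != "kms.amazonaws.com":
            continue
        if record.get("eventName") not in WATCHED_EVENTS:
            continue
        if identity.get("type") == "AWSService":
            continue
        if parse_event_time(record["eventTime"]) <= now - window:
            continue
        hits.append((record["eventTime"], identity.get("arn"), record["eventName"]))
    return hits
```

Passing now in explicitly, rather than reading the clock inside the function, is what makes the rule repeatable in a test.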
Azure Key Vault: Powerful, but the Tier Choice Matters
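Azure Key Vault is the closest analogue to AWS KMS, but the architecture diverges in ways that affect cost and compliance. Standard tier uses software-protected keys. Premium tier adds HSM-protected keys. The FIPS validation level Microsoft cites for those HSMs has changed over time (older documentation lists FIPS 140-2 Level 2 for Premium, with Level 3 on the newer HSM platform and on Managed HSM), so confirm the current level against your compliance requirement rather than assuming it. If your client is in regulated financial services or healthcare, Premium is not optional. Pick it from day one because migrating later is painful.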
The Terraform looks similar but the provider abstractions are different:
resource "azurerm_key_vault" "prod" {
  name                       = "kv-prod-eastus-01"
  location                   = "East US"
  resource_group_name        = azurerm_resource_group.sec.name
  sku_name                   = "premium"
  tenant_id                  = data.azurerm_client_config.current.tenant_id
  purge_protection_enabled   = true
  soft_delete_retention_days = 90
}

resource "azurerm_key_vault_key" "app" {
  name         = "app-encryption-key"
  key_vault_id = azurerm_key_vault.prod.id
  key_type     = "RSA-HSM"
  key_size     = 4096
  key_opts     = ["decrypt", "encrypt", "sign", "verify", "wrapKey", "unwrapKey"]

  rotation_policy {
    automatic {
      time_before_expiry = "P30D"
    }
    expire_after         = "P90D"
    notify_before_expiry = "P29D"
  }
}
Two settings deserve attention. purge_protection_enabled = true is irreversible once set, and that is the point — it prevents an attacker or a panicked admin from purging the vault during the soft-delete window. The rotation policy block is relatively new and underused; teams I audit are still rotating Azure keys manually because they never updated their modules.
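The three ISO 8601 durations in that rotation policy interact, and it is worth doing the arithmetic before you ship them. The sketch below converts them into day offsets from the creation of a key version. It is a simplification: iso_days only understands day-only durations such as P30D, and both helper names are hypothetical.

```python
import re


def iso_days(duration):
    """Parse a day-only ISO 8601 duration such as 'P30D' into an integer."""
    match = re.fullmatch(r"P(\d+)D", duration)
    if match is None:
        raise ValueError(f"only day-based durations are supported: {duration!r}")
    return int(match.group(1))


def rotation_timeline(expire_after, time_before_expiry, notify_before_expiry):
    """Return the day, counted from key version creation, of each policy event."""
    lifetime = iso_days(expire_after)
    return {
        "rotate_on_day": lifetime - iso_days(time_before_expiry),
        "notify_on_day": lifetime - iso_days(notify_before_expiry),
        "expire_on_day": lifetime,
    }
```

With the values used above (P90D, P30D, P29D) this gives rotation on day 60, notification on day 61, and expiry on day 90. Whether you want the notification to land a day after the automatic rotation is a judgment call for your own runbook.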
The Caveat That Catches Multi-Cloud Teams
Here is the limitation that surprises every architect I work with: Azure Key Vault does not support native cross-region key replication the way AWS KMS multi-region keys do. Microsoft does replicate a vault to its paired region for its own failover, but you do not choose the target and cannot address the replica as an independent key. You can copy vault contents via backup-and-restore workflows, with the restriction that a backup restores only into a vault in the same subscription and Azure geography, and the keys themselves remain region-bound. If your disaster recovery design assumes symmetric KMS behavior across the three clouds, fix that assumption now. We learned this on a logistics client engagement when we standardized their backup strategy and discovered the DR runbook had a one-line assumption that did not hold. The fix involved a paired-vault pattern with explicit key versioning, which was workable but not free.
Google Cloud KMS: The Cleanest API, the Strictest Hierarchy
GCP KMS organizes everything into key rings, which scope to a location, and crypto keys, which live inside the ring. The hierarchy is rigid and that is a feature. You cannot accidentally cross regions the way you can with a misconfigured AWS multi-region key.
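That hierarchy is visible in every Cloud KMS resource name, which follows the form projects/PROJECT/locations/LOCATION/keyRings/RING/cryptoKeys/KEY. A small parser, sketched here with hypothetical names, makes the location explicit so that tooling can reject a key from the wrong region before using it.

```python
import re
from typing import NamedTuple


class KeyName(NamedTuple):
    project: str
    location: str
    key_ring: str
    crypto_key: str


_KEY_NAME = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)"
    r"/cryptoKeys/(?P<crypto_key>[^/]+)"
)


def parse_key_name(name):
    """Split a Cloud KMS crypto key resource name into its components."""
    match = _KEY_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a crypto key resource name: {name!r}")
    return KeyName(**match.groupdict())
```

For example, parse_key_name("projects/example-project/locations/us-central1/keyRings/prod-keyring/cryptoKeys/app-data-key").location yields "us-central1"; the project ID in that string is a placeholder.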
resource "google_kms_key_ring" "prod" {
  name     = "prod-keyring"
  location = "us-central1"
}

resource "google_kms_crypto_key" "app" {
  name            = "app-data-key"
  key_ring        = google_kms_key_ring.prod.id
  purpose         = "ENCRYPT_DECRYPT"
  rotation_period = "7776000s"

  version_template {
    algorithm        = "GOOGLE_SYMMETRIC_ENCRYPTION"
    protection_level = "HSM"
  }
}
The 7776000 second rotation period is 90 days. That is the value I default to for production workloads. GCP writes Admin Activity audit logs for key management operations automatically, with no separate trail to enable, but the Data Access logs that record Encrypt and Decrypt calls are off by default and must be turned on for Cloud KMS before you can detect key usage. The integration with Workload Identity Federation also makes it the cleanest of the three for keyless service-to-service auth, which reduces your overall attack surface.
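Because rotation_period is expressed in seconds with an "s" suffix, a typo is easy to miss in review. A pair of tiny helpers, named here purely for illustration, converts in both directions and rejects values that are not whole days.

```python
SECONDS_PER_DAY = 86_400


def days_to_rotation_period(days):
    """Format a whole number of days the way rotation_period expects it."""
    return f"{days * SECONDS_PER_DAY}s"


def rotation_period_to_days(period):
    """Convert a value such as '7776000s' back into whole days."""
    if not period.endswith("s"):
        raise ValueError(f"expected a value ending in 's': {period!r}")
    days, remainder = divmod(int(period[:-1]), SECONDS_PER_DAY)
    if remainder:
        raise ValueError(f"{period!r} is not a whole number of days")
    return days
```

Here rotation_period_to_days("7776000s") returns 90, matching the value in the configuration above.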
The Multi-Cloud Reality
I will be direct about an opinion most cloud architects dislike hearing: a single unified KMS strategy across AWS, Azure, and GCP is a fantasy unless you accept significant operational compromise. The three APIs, the three IAM models, and the three audit log schemas do not converge. Terraform helps you provision consistently, but it does not erase the underlying differences.
What actually works on engagements where we support all three clouds is a provider-native KMS in each environment, with a thin abstraction layer in your application code and a unified detection layer in your SIEM. We pipe CloudTrail, Azure Key Vault diagnostic logs (the AuditEvent category, since the Activity Log alone covers only control-plane changes), and GCP Audit Logs into the same Sentinel or Splunk workspace and write detection rules per provider. The mean time to detect a key abuse event in that setup drops to under fifteen minutes. Gartner has been pushing this pattern as cloud-native posture management, which I think is correct.
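To make the unified detection layer concrete, here is a rough sketch of normalizing one record from each provider into a common shape before rules run against it. The field paths are assumptions based on the commonly documented log schemas (CloudTrail records, Key Vault AuditEvent logs, Cloud Audit Logs entries) and should be verified against your own log samples; KeyEvent and the helper names are hypothetical.

```python
from typing import NamedTuple, Optional

# Claim key used for user principal names in Key Vault logs (assumption).
UPN_CLAIM = "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/upn"


class KeyEvent(NamedTuple):
    cloud: str
    time: str
    principal: Optional[str]
    operation: str
    key: Optional[str]


def from_cloudtrail(record):
    resources = record.get("resources") or [{}]
    return KeyEvent(
        cloud="aws",
        time=record["eventTime"],
        principal=record.get("userIdentity", {}).get("arn"),
        operation=record["eventName"],
        key=resources[0].get("ARN"),
    )


def from_key_vault(record):
    claims = record.get("identity", {}).get("claim", {})
    return KeyEvent(
        cloud="azure",
        time=record["time"],
        principal=claims.get(UPN_CLAIM) or claims.get("appid"),
        operation=record["operationName"],
        key=record.get("properties", {}).get("id"),
    )


def from_gcp_audit(entry):
    payload = entry.get("protoPayload", {})
    return KeyEvent(
        cloud="gcp",
        time=entry["timestamp"],
        principal=payload.get("authenticationInfo", {}).get("principalEmail"),
        # Keep only the last segment so short and fully qualified names match.
        operation=payload.get("methodName", "").rsplit(".", 1)[-1],
        key=payload.get("resourceName"),
    )


# Per-provider operation names treated as "decrypt" (assumed, not exhaustive).
DECRYPT_OPERATIONS = {
    "aws": {"Decrypt"},
    "azure": {"KeyDecrypt", "KeyUnwrap"},
    "gcp": {"Decrypt"},
}


def is_decrypt(event):
    return event.operation in DECRYPT_OPERATIONS.get(event.cloud, set())
```

The rules stay per provider, as described above; the shared shape just means triage and dashboards can treat a decrypt in any cloud the same way.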
The Terraform Module Pattern We Ship
For clients who run workloads across two or three clouds, we ship a wrapper module that exposes a single interface — kms_key(name, rotation_days, hsm) — and resolves it to the provider-specific resource. The module enforces our baseline: rotation enabled, 30-day soft delete, HSM protection for production, and tagging that aligns with our SOC’s correlation queries. That last piece matters because consistent tagging is what lets one detection rule cover all three clouds. If you are running production workloads that need encrypted backups and consistent recovery posture, our team can help with both the bare-metal recovery design and the underlying VPS server hardening — reach out at https://clients.sse.to/contact.php if you want a working session.
Related Operational Reading
If you are integrating KMS calls into automation, the patterns in our guide to PowerShell REST API calls with Invoke-RestMethod apply directly to calling provider KMS APIs from runbooks. For teams thinking about regional design, the S3 bucket region selection strategy piece covers the same locality reasoning that drives KMS region choice. And if your environment includes hybrid PowerShell automation, our breakdown of PowerShell remoting auth covers the credential boundary that often touches KMS-stored secrets.
The Playbook Takeaway
Treat your KMS configuration as detection infrastructure, not just cryptographic infrastructure. Enable rotation everywhere it is free. Pick Premium or HSM protection tiers from day one in regulated environments. Set deletion windows long enough to survive a bad Friday. And wire every KMS API call into your SIEM with detection rules that distinguish service principals from human identities. The keys themselves are easy. The operational discipline around them is what separates a clean audit from a breach notification.


