FinTech IaC Is a Different Animal
If you've done IaC at a SaaS company, you might think you know the drill. Write some Terraform, set up a CI pipeline, call it a day. I thought the same thing before I joined my first payments company. Within a week, I learned that FinTech infrastructure lives under constraints that change everything about how you write, review, and deploy code.
The difference boils down to three things: compliance mandates like PCI DSS that dictate how infrastructure must be configured, audit trail requirements that mean every change needs to be traceable back to a ticket and an approver, and change management processes that can turn a five-minute deploy into a two-day review cycle if you're not careful.
In a typical startup, you might terraform apply from your laptop. In a PCI-scoped environment, that's a finding. Every infrastructure change needs to flow through an auditable pipeline with separation of duties — the person who writes the code can't be the person who approves the apply. This isn't optional; it's what your QSA will ask about during your next assessment.
Terraform vs Pulumi vs CloudFormation
I've used all three in production payment systems. Each has real trade-offs that matter more in FinTech than in general-purpose infrastructure work. Here's how they stack up:
| Criteria | Terraform | Pulumi | CloudFormation |
|---|---|---|---|
| Language | HCL (domain-specific) | Python, TypeScript, Go, etc. | JSON / YAML |
| Multi-Cloud | Strong (provider ecosystem) | Strong (native providers) | AWS only |
| State Management | Remote backends (S3, TFC) | Managed service or self-hosted | Managed by AWS |
| Policy-as-Code | Sentinel (paid) / OPA | CrossGuard (built-in) | CloudFormation Guard |
| Audit Trail | Via CI/CD + state history | Built-in with managed service | CloudTrail integration |
| Secrets Handling | Marked sensitive, not encrypted in state | Encrypted by default in state | Dynamic references to SSM/Secrets Manager |
| Drift Detection | Manual (terraform plan) | Manual (pulumi preview) | Built-in drift detection |
| Learning Curve | Moderate (HCL is simple but limited) | Low if you know the language | High (verbose, error messages are cryptic) |
My honest take: Terraform is still the pragmatic default for most FinTech teams because of the ecosystem and hiring pool. But Pulumi's native secrets encryption is a genuine advantage when you're dealing with PCI scope — Terraform's state file will contain sensitive values in plaintext unless you add encryption at the backend level. CloudFormation is fine if you're all-in on AWS and want drift detection without extra tooling, but the debugging experience is painful.
The IaC Pipeline for Regulated Environments
A standard terraform plan && terraform apply workflow won't pass muster with auditors. You need a pipeline that enforces separation of duties, runs policy checks before anything touches production, and produces an immutable audit log. The flow I've settled on after a few iterations centers on an explicit approval gate between plan and apply.
The "Approve" step is the key differentiator. In our setup, the plan output is posted as a PR comment, a second engineer reviews it, and only after explicit approval does the pipeline proceed to apply. The entire chain — commit SHA, plan output, approver identity, apply logs — gets shipped to an immutable audit store. When the QSA asks "who approved this network change and when?", you can answer in seconds.
State Management Pitfalls
State files are where FinTech IaC gets dangerous. I've seen three recurring problems that catch teams off guard:
Secrets in state. Terraform stores resource attributes in state, including things like RDS master passwords, API keys, and private keys. If your state backend is an S3 bucket without encryption, you've just created an unencrypted store of production secrets. At minimum, enable SSE-KMS on your state bucket and restrict access with a tight IAM policy. Better yet, avoid putting secrets into Terraform at all — use aws_secretsmanager_secret with a reference, not a literal value.
State locking failures. DynamoDB-based locking for Terraform state works well until it doesn't. I've had a CI runner crash mid-apply, leaving a stale lock that blocked the entire team for an hour during an incident. Have a documented runbook for force-unlocking state, and make sure more than one person has the IAM permissions to do it.
Drift. Someone logs into the console and tweaks a security group "just for testing." Now your state and reality are out of sync. In FinTech, drift isn't just annoying — it's a compliance gap. Run terraform plan on a schedule (I use a nightly CI job) and alert on any detected drift. CloudFormation's built-in drift detection is one of its genuine advantages here.
Tip: Never store your Terraform state in the same AWS account as your production workloads. Use a dedicated "infrastructure" account with cross-account access. This limits blast radius and makes it much easier to audit who touched state files.
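A minimal sketch of a backend block that follows the advice in this section: the state object is encrypted with a customer-managed KMS key, locking goes through DynamoDB, and the bucket is assumed to live in a dedicated infrastructure account. The bucket name, key path, account ID, KMS key ARN, and table name are placeholders invented for illustration.

```hcl
terraform {
  backend "s3" {
    # Hypothetical bucket owned by a dedicated infrastructure account
    bucket = "example-fintech-tfstate"
    key    = "payments/production/terraform.tfstate"
    region = "us-east-1"

    # Encrypt the state object at rest with a customer-managed key (placeholder ARN)
    encrypt    = true
    kms_key_id = "arn:aws:kms:us-east-1:111122223333:key/00000000-0000-0000-0000-000000000000"

    # Hypothetical DynamoDB table used for state locking
    dynamodb_table = "example-tfstate-locks"
  }
}
```

Encryption only helps if access is narrow: the bucket policy and the KMS key policy should both limit reads to the CI role and a small set of break-glass users.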
Secrets Management Integration
The worst pattern I see in FinTech IaC is secrets hardcoded in terraform.tfvars files, sometimes even committed to Git. The fix isn't complicated, but it requires discipline.
For AWS-native stacks, I use AWS Secrets Manager with Terraform's data source to read secrets at plan time, so they never live in code. Be aware that values read through a data source are still written to state, which is one more reason to encrypt the backend. For multi-cloud or hybrid setups, HashiCorp Vault is the standard: the Vault provider for Terraform lets you read secrets dynamically, and Vault's audit log gives you a record of every secret access.
```hcl
# Read database credentials from Vault at plan time
data "vault_generic_secret" "db_creds" {
  path = "secret/production/payment-db"
}

resource "aws_db_instance" "payments" {
  engine            = "postgres"
  instance_class    = "db.r6g.xlarge"
  # aws_db_instance takes username/password (master_* belongs to aws_rds_cluster)
  username          = data.vault_generic_secret.db_creds.data["username"]
  password          = data.vault_generic_secret.db_creds.data["password"]
  storage_encrypted = true
  # ...
}
```
The critical detail: make sure your CI runner authenticates to Vault using a short-lived token (AppRole with a TTL, not a long-lived root token). I've seen a payments company store a Vault root token in a GitHub Actions secret. That's trading one problem for another.
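For the AWS-native variant mentioned above, a sketch might look like the following. It assumes a secret with the hypothetical name production/payment-db already exists in Secrets Manager and stores a JSON object with username and password keys. As with any data source, the values it reads still end up in state.

```hcl
# Assumption: this secret already exists and holds JSON such as
# {"username": "...", "password": "..."}. The name is hypothetical.
data "aws_secretsmanager_secret_version" "db_creds" {
  secret_id = "production/payment-db"
}

locals {
  # Decode the JSON payload once and reuse the resulting map
  db_creds = jsondecode(data.aws_secretsmanager_secret_version.db_creds.secret_string)
}

resource "aws_db_instance" "payments" {
  engine            = "postgres"
  instance_class    = "db.r6g.xlarge"
  username          = local.db_creds["username"]
  password          = local.db_creds["password"]
  storage_encrypted = true
  # ...
}
```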
Policy-as-Code for Compliance Guardrails
This is where IaC goes from "nice to have" to "audit-ready." Policy-as-code tools let you define rules like "all S3 buckets must have encryption enabled" or "no security group may allow 0.0.0.0/0 on port 22" and enforce them automatically before any infrastructure is created.
I've used both OPA (Open Policy Agent) with conftest and HashiCorp Sentinel. OPA is open-source and works with any IaC tool — you write policies in Rego and run them against your Terraform plan JSON. Sentinel is tightly integrated with Terraform Cloud/Enterprise and has a friendlier syntax, but it's a paid feature.
A practical starting set of policies for a PCI-scoped environment:
- All storage resources must have encryption at rest enabled
- No public ingress rules on security groups in the cardholder data environment
- All RDS instances must have automated backups with a minimum 7-day retention
- VPC flow logs must be enabled on all VPCs
- IAM policies must not use wildcard (*) actions on sensitive services
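```rego
# OPA policy: deny unencrypted S3 buckets
package terraform.s3

# Opt in to Rego v1 keywords so the policy parses on current OPA releases
import rego.v1

# One message per S3 bucket in the root module that lacks encryption settings
deny contains msg if {
  resource := input.planned_values.root_module.resources[_]
  resource.type == "aws_s3_bucket"
  not has_encryption(resource)
  msg := sprintf("S3 bucket '%s' must have encryption enabled", [resource.name])
}

# True only for a non-empty block; an empty list would otherwise count as set
has_encryption(resource) if {
  count(resource.values.server_side_encryption_configuration) > 0
}
```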
Warning: Policy-as-code is only as good as your policy coverage. Start with the controls your QSA actually checks, not a theoretical list of everything that could go wrong. Overly aggressive policies will slow your team to a crawl and get bypassed with exceptions — which defeats the purpose.
Multi-Environment Strategy
Most FinTech companies need at least three environments: sandbox (for developers to experiment), staging (for integration testing and pre-production validation), and production (the PCI-scoped environment with full compliance controls). The mistake I see is treating them identically or, worse, managing them with copy-pasted Terraform directories.
What works: a single set of modules parameterized by environment. Use Terraform workspaces or, better, separate state files per environment with a shared module registry. Each environment gets its own variable file that controls compliance-relevant settings:
- Sandbox: Relaxed policies, self-service deploys, short-lived resources with auto-cleanup. No PCI scope.
- Staging: Production-like policies enforced, but with a faster approval cycle. Useful for testing policy changes before they hit production.
- Production: Full policy enforcement, mandatory peer review, immutable audit logs, restricted IAM access. This is your PCI CDE.
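One way the per-environment variable files could be laid out is sketched below. The variable names and values are hypothetical and only illustrate the idea of one shared set of declarations with a small file of overrides per environment.

```hcl
# variables.tf (shared by every environment)
variable "environment" {
  type        = string
  description = "Deployment environment name."

  validation {
    condition     = contains(["sandbox", "staging", "production"], var.environment)
    error_message = "The environment must be sandbox, staging, or production."
  }
}

variable "backup_retention_days" {
  type        = number
  description = "Automated backup retention for RDS instances, in days."
}

variable "enable_flow_logs" {
  type        = bool
  description = "Whether VPC flow logs are enabled."
  default     = true
}

# production.tfvars (hypothetical values)
#   environment           = "production"
#   backup_retention_days = 35
#   enable_flow_logs      = true

# sandbox.tfvars (hypothetical values)
#   environment           = "sandbox"
#   backup_retention_days = 1
#   enable_flow_logs      = false
```

Each environment's pipeline would then pass its own file, for example terraform plan -var-file=production.tfvars, against its own state.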
The key insight: your staging environment should run the same policy checks as production. If a policy only fires in production, you'll find out about violations at the worst possible time.
Real Gotchas from Production
I'll close with a few things that bit me in production that you won't find in the docs:
- Terraform provider updates can break PCI controls. A minor version bump in the AWS provider changed the default encryption behavior on an EBS volume type. Our nightly drift check caught it, but only because we had one. Pin your provider versions and review changelogs before upgrading.
- Import is not your friend. When you terraform import a manually-created resource, Terraform doesn't generate the config for you. You get a state entry with no corresponding HCL, and the next plan will try to delete or modify things you didn't expect. Always write the config first, then import.
- Module versioning matters more than you think. If your payment processing module is referenced as source = "../modules/payments", a change in that module affects every environment simultaneously. Use a versioned module registry so production can stay on v1.2.3 while staging tests v1.3.0.
- Terraform Cloud's run queue is a bottleneck during incidents. When three teams are trying to push hotfixes at 2 AM, a serialized run queue means someone is waiting. Have a break-glass procedure for emergency applies that bypasses the queue but still logs everything.
- Don't forget about Terraform state file size. A monolithic state file for an entire payment platform will eventually make plans take 10+ minutes. Split your state by domain boundary — networking, compute, databases, application — early, before it becomes painful to refactor.
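The provider-pinning and module-versioning points above can be sketched as follows. The version numbers are illustrative, and the registry address app.terraform.io/example-org/payments/aws is a hypothetical private registry path, not a real module.

```hcl
terraform {
  # Assumed minimum CLI version; adjust to your toolchain
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Exact pin (illustrative number) so every upgrade is a reviewed change
      version = "5.31.0"
    }
  }
}

# Versioned module from a registry instead of a relative path
module "payments" {
  source  = "app.terraform.io/example-org/payments/aws" # hypothetical address
  version = "1.2.3" # production stays here while staging tests 1.3.0
}
```

Committing the .terraform.lock.hcl file alongside this pins the exact provider build and its checksums, so CI and laptops resolve the same binary.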
Disclaimer
The opinions and recommendations in this article are based on my personal experience and do not constitute legal or compliance advice. PCI DSS requirements vary by merchant level and acquirer. Always consult your Qualified Security Assessor (QSA) and legal counsel before making compliance-related infrastructure decisions. Tool capabilities and pricing may have changed since the time of writing.