Secrets Management for Payment Infrastructure — Beyond Environment Variables
Your .env file is one misconfigured deploy away from handing attackers your Stripe keys. I've migrated three payment platforms off environment variables, and the patterns I landed on have saved us from at least two near-misses. Here's the playbook.
Environment Variables: A Ticking Time Bomb
Every payment team I've joined started the same way: Stripe keys in .env, database credentials in docker-compose.yml, and maybe an encryption key hardcoded somewhere in a config file "just for now." It's the path of least resistance, and it works — until it doesn't.
The problem isn't that environment variables are inherently evil. It's that they leak in ways you don't expect:
- docker inspect dumps every env var in plaintext. Anyone with Docker socket access sees your Stripe secret key.
- /proc/<pid>/environ on Linux exposes the environment of any process you can read. A container escape gives an attacker everything.
- Crash dumps and error reporting tools routinely capture environment state. I've seen Sentry reports with PAYMENT_API_SECRET sitting right there in the breadcrumbs.
- CI/CD logs. One printenv in a debug step, one docker-compose config in a pipeline log, and your keys are in build artifacts forever.
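One inexpensive stopgap while a migration is underway is to mask secret-looking variables before any environment snapshot reaches a log or error report. The sketch below is illustrative only: the name patterns and the redact_env helper are assumptions, not part of any error-reporting SDK, and a variable whose name does not match the patterns will still leak.

```python
import re

# Hypothetical name patterns; tune these to your own variable naming scheme.
SECRET_NAME_PATTERN = re.compile(
    r"(SECRET|TOKEN|PASSWORD|API_KEY|PRIVATE_KEY)", re.IGNORECASE
)


def redact_env(env):
    """Return a copy of env with the values of secret-looking names masked."""
    return {
        name: "[REDACTED]" if SECRET_NAME_PATTERN.search(name) else value
        for name, value in env.items()
    }
```

Calling redact_env(dict(os.environ)) before printing or attaching the environment to a report keeps the variable names visible for debugging while hiding the values. It reduces accidental exposure; it does not replace moving the secrets out of the environment.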
A colleague once ran env | sort during a live debugging session on a shared screen. Thirty engineers saw the production Adyen API key scroll past. We rotated it within the hour, but the panic was real. That was the week we started migrating to Vault.
PCI DSS Requirement 3.7: Cryptographic key management procedures must include secure key storage, defined cryptoperiods, and retirement processes. Environment variables satisfy none of these requirements.
The Secrets Hierarchy
Not all secrets are equal. In a payment system, you're typically managing four tiers of sensitive material (encryption keys, payment API keys, database credentials, and internal service tokens), each with a different rotation requirement and blast radius if compromised.
Each tier demands different handling. Encryption keys should live in an HSM or KMS and never be extractable. Payment API keys need automated rotation. Database credentials can use short-lived dynamic secrets. Internal tokens should be ephemeral, issued per-session.
Comparing Secrets Management Solutions
I've deployed all three major solutions in production payment environments. Here's an honest comparison based on what actually matters when you're processing transactions.
My recommendation: if you're all-in on AWS, Secrets Manager is the pragmatic choice — it integrates natively with ECS, Lambda, and RDS. If you're multi-cloud or on-prem, Vault is worth the operational overhead. I've run Vault clusters for two payment platforms, and the dynamic secrets engine alone justified the investment. GCP Secret Manager is solid if you're in that ecosystem, but its rotation story requires more custom glue.
Automatic Rotation for Payment API Keys
Static API keys are a liability. The longer a key lives, the more places it's been cached, logged, or copied. For payment gateways, I target 90-day rotation at most — and for high-risk keys like those with charge permissions, 30 days.
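As a rough illustration of those targets, a scheduled job could flag keys that have reached their maximum age. This is a minimal sketch: the rotation_due function and its can_charge flag are hypothetical names, and the only values taken from the text are the 90-day and 30-day limits.

```python
from datetime import datetime, timedelta, timezone

# Rotation targets from the text: 90 days by default, 30 for keys that can charge.
DEFAULT_MAX_AGE = timedelta(days=90)
HIGH_RISK_MAX_AGE = timedelta(days=30)


def rotation_due(created_at, can_charge, now=None):
    """Return True when a key has reached the maximum age for its risk level."""
    now = now or datetime.now(timezone.utc)
    max_age = HIGH_RISK_MAX_AGE if can_charge else DEFAULT_MAX_AGE
    return now - created_at >= max_age
```

Passing now explicitly keeps the check easy to test; created_at is assumed to be a timezone-aware UTC timestamp recorded when the key was issued.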
The trick is making rotation seamless. You can't have a window where the old key is dead and the new one isn't deployed yet. That's a payment outage. Here's the dual-key pattern I use:
Step 3 is critical. Most payment gateways support having two active API keys simultaneously. Stripe calls them "rolling keys"; Adyen supports multiple API credentials per merchant account. You generate the new key, deploy it, confirm it's working, then kill the old one. Zero downtime.
# Example: rotation Lambda for AWS Secrets Manager (simplified)
from datetime import datetime

import boto3
import stripe

def rotate_stripe_key(secret_arn):
    sm = boto3.client('secretsmanager')
    # Get current secret; its VersionId is needed when AWSCURRENT is moved
    current = sm.get_secret_value(SecretId=secret_arn)
    # Generate new restricted key via Stripe API.
    # NOTE: stripe.api_keys.create is a placeholder for your gateway's
    # key-issuance call; check what your SDK version actually provides.
    new_key = stripe.api_keys.create(
        name=f"production-{datetime.now().isoformat()}",
        permissions=["charges:write", "refunds:write"]
    )
    # Store new key as pending version and keep its version ID
    pending = sm.put_secret_value(
        SecretId=secret_arn,
        SecretString=new_key.secret,
        VersionStages=["AWSPENDING"]
    )
    new_version_id = pending["VersionId"]
    # After verification, promote pending to current
    sm.update_secret_version_stage(
        SecretId=secret_arn,
        VersionStage="AWSCURRENT",
        MoveToVersionId=new_version_id,
        RemoveFromVersionId=current["VersionId"]
    )
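The ordering of the dual-key pattern (create, deploy, verify, and only then revoke) can be sketched without any cloud dependency. Everything here is hypothetical scaffolding: FakeGateway is an in-memory stand-in for a gateway that allows two active keys, and deploy is whatever callable rolls the new key out to your services.

```python
class FakeGateway:
    """In-memory stand-in for a payment gateway that allows two active keys."""

    def __init__(self, initial_key):
        self.active_keys = {initial_key}
        self._counter = 0

    def create_key(self):
        self._counter += 1
        key = f"key-{self._counter}"
        self.active_keys.add(key)
        return key

    def revoke_key(self, key):
        self.active_keys.discard(key)

    def ping(self, key):
        # A real check would be a harmless authenticated read against the gateway.
        return key in self.active_keys


def rotate(gateway, old_key, deploy):
    """Create, deploy, verify, then revoke, so the old key dies last."""
    new_key = gateway.create_key()   # both keys are valid from here on
    deploy(new_key)                  # roll the new key out to services
    if not gateway.ping(new_key):    # verify before touching the old key
        raise RuntimeError("new key failed verification; old key left active")
    gateway.revoke_key(old_key)      # only now retire the old key
    return new_key
```

The point of the sketch is the failure path: if verification fails, the function raises before revoking anything, so the old key keeps serving traffic.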
Integration Patterns
Getting secrets out of a vault and into your application is where theory meets reality. I've used three patterns, each with different tradeoffs.
Sidecar injection (Kubernetes)
A Vault Agent sidecar runs alongside your payment service, fetches secrets, and writes them to a shared volume. Your app reads files — no SDK dependency, no Vault awareness in application code. This is my preferred pattern for Kubernetes deployments because it keeps the application completely decoupled from the secrets backend.
# Vault Agent sidecar annotation in K8s
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "payment-service"
    vault.hashicorp.com/agent-inject-secret-stripe: "secret/data/stripe"
    vault.hashicorp.com/agent-inject-template-stripe: |
      {{- with secret "secret/data/stripe" -}}
      STRIPE_SECRET_KEY={{ .Data.data.api_key }}
      {{- end -}}
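On the application side, reading the rendered file can be as small as the sketch below. It assumes the template produces KEY=VALUE lines as shown, and that the injector writes the file under /vault/secrets/ (its default location as far as I know; confirm the path in your own deployment). The load_secret_file helper is a hypothetical name.

```python
from pathlib import Path


def load_secret_file(path):
    """Parse KEY=VALUE lines from a rendered secret file into a dict."""
    secrets = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, since secret values may contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            # Deliberately omit the line content so a secret never lands in a log.
            raise ValueError(f"malformed line in {path}")
        secrets[key.strip()] = value.strip()
    return secrets
```

A service would call something like load_secret_file("/vault/secrets/stripe")["STRIPE_SECRET_KEY"] at startup, and again whenever the agent re-renders the file after a rotation.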
Init containers
Similar to sidecars, but the secret fetch happens once at startup. Simpler, but you lose automatic renewal. Fine for secrets that don't rotate mid-deployment — like database connection strings that change only during planned rotations.
SDK-based retrieval
Your application calls the secrets manager directly at runtime. More coupling, but you get fine-grained control: you can cache secrets in memory, handle rotation events, and implement circuit breakers if the secrets backend is unavailable.
// Go: SDK-based retrieval with caching
func (s *SecretStore) GetPaymentKey(ctx context.Context) (string, error) {
    // Fast path: serve the cached value while it is less than five minutes old.
    s.mu.RLock()
    cached, fresh := s.cached, time.Since(s.fetchedAt) < 5*time.Minute
    s.mu.RUnlock()
    if cached != "" && fresh {
        return cached, nil
    }

    result, err := s.sm.GetSecretValue(ctx, &secretsmanager.GetSecretValueInput{
        SecretId: aws.String("payment/stripe-key"),
    })
    if err != nil {
        return "", fmt.Errorf("failed to fetch secret: %w", err)
    }
    if result.SecretString == nil {
        return "", fmt.Errorf("secret has no string value")
    }

    // Copy into a local so the return does not read s.cached outside the lock.
    value := *result.SecretString
    s.mu.Lock()
    s.cached = value
    s.fetchedAt = time.Now()
    s.mu.Unlock()
    return value, nil
}
Operational tip: Whichever pattern you choose, always implement a fallback. If Vault is down, your payment service shouldn't crash on startup. Cache the last-known-good secret in encrypted memory and alert on staleness. A 6-hour-old API key is better than a hard outage.
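A last-known-good fallback along those lines might look like the following sketch. The FallbackSecret class and its parameters are hypothetical, fetch is any callable that reads from your secrets backend, and the six-hour default mirrors the figure above. It keeps the value in ordinary process memory; encrypting it at rest in memory is out of scope here.

```python
import logging
import time

logger = logging.getLogger("secrets")


class FallbackSecret:
    """Serve the last-known-good value when the backend fails, up to a limit."""

    def __init__(self, fetch, max_stale_seconds=6 * 3600, clock=time.monotonic):
        self._fetch = fetch            # callable returning the current secret
        self._max_stale = max_stale_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        try:
            self._value = self._fetch()
            self._fetched_at = self._clock()
        except Exception:
            if self._value is None:
                raise                  # nothing cached yet, so no fallback exists
            age = self._clock() - self._fetched_at
            if age > self._max_stale:
                raise                  # too stale to trust; fail loudly
            logger.warning(
                "secrets backend unavailable; serving value aged %.0fs", age
            )
        return self._value
```

This version calls the backend on every get and only falls back on failure, so in practice it would sit behind a short TTL cache. The warning log is the hook for a staleness alert.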
Audit Logging — PCI DSS Requirement 10
PCI DSS Requirement 10 mandates that you track and monitor all access to network resources and cardholder data. For secrets, that means logging every read, every write, every failed access attempt. "Who accessed the Stripe production key at 3 AM on Saturday?" is a question you need to answer in minutes, not days.
Vault's audit backend is excellent here — it logs every single API request with the accessor identity, timestamp, path, and operation. AWS Secrets Manager integrates with CloudTrail, which gives you the same visibility but through AWS's logging pipeline.
# Vault audit log entry (simplified JSON)
{
  "type": "response",
  "time": "2026-04-08T14:23:01Z",
  "auth": {
    "display_name": "kubernetes-payment-svc",
    "policies": ["payment-read"],
    "token_type": "service"
  },
  "request": {
    "path": "secret/data/stripe/production",
    "operation": "read",
    "remote_address": "10.0.42.17"
  },
  "response": {
    "data": { "keys": ["api_key"] }
  }
}
The things I watch for in audit logs:
- Access from unexpected IP ranges or service identities
- Reads of production secrets from non-production namespaces
- Spikes in secret access frequency (could indicate credential stuffing or a compromised service looping)
- Any write or delete operation on payment secrets outside of scheduled rotation windows
- Failed authentication attempts against the secrets backend itself
We pipe Vault audit logs into Datadog and have alerts on all of the above. During our last PCI audit, the QSA spent about ten minutes on secrets management. He pulled up the audit trail, confirmed rotation history, checked access policies, and moved on. That's the goal — make the auditor's job boring.
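Two of those checks are simple enough to sketch as code over a single log line shaped like the simplified entry shown earlier. The network range, path prefix, and audit_findings helper are all hypothetical, and the sketch assumes Vault reports writes as create or update operations; verify that against your own logs.

```python
import ipaddress
import json

# Hypothetical values for illustration; substitute your own network and paths.
ALLOWED_NETWORK = ipaddress.ip_network("10.0.0.0/16")
PAYMENT_PATH_PREFIX = "secret/data/stripe"
# Assumption: writes appear as "create" or "update" in the operation field.
MUTATING_OPERATIONS = {"create", "update", "delete"}


def audit_findings(line, in_rotation_window=False):
    """Return a list of reasons a single audit log line looks suspicious."""
    entry = json.loads(line)
    request = entry.get("request", {})
    findings = []

    address = request.get("remote_address")
    if address and ipaddress.ip_address(address) not in ALLOWED_NETWORK:
        findings.append(f"access from unexpected address {address}")

    path = request.get("path", "")
    operation = request.get("operation", "")
    if (
        path.startswith(PAYMENT_PATH_PREFIX)
        and operation in MUTATING_OPERATIONS
        and not in_rotation_window
    ):
        findings.append(f"{operation} on {path} outside a rotation window")

    return findings
```

Frequency spikes and cross-namespace reads need state across many entries, so they are better expressed as queries in the log platform than as a per-line function.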
When a Payment API Key Leaks
It happened to a team I was consulting for in 2024. A developer committed a .env file to a public GitHub repo. The file contained a live Stripe secret key with full charge permissions. GitHub's secret scanning caught it and notified Stripe, who emailed the team — but not before the key had been exposed for about 40 minutes.
Here's what unfolded:
- Within 18 minutes of the commit, automated scanners (not Stripe's — third-party bots that crawl GitHub) had already tested the key.
- Three fraudulent charges totaling $2,400 were attempted. Stripe's fraud detection blocked two of them. One went through.
- The team rotated the key manually — which took 25 minutes because nobody had documented the rotation procedure, and the key was hardcoded in two services that needed redeployment.
- Post-incident, they discovered the same key was also in a Docker image layer on their private registry. docker history showed it in a RUN command from six months earlier.
The lessons were clear: if that key had been in Vault with automated rotation, the blast radius would have been near zero. The key would have been short-lived, the rotation would have been a single API call, and there would have been nothing to commit to Git in the first place.
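Until the keys are out of the repository entirely, a local check that runs before a commit can catch the obvious cases. The sketch below looks for strings that start with Stripe's sk_live_ or rk_live_ prefixes. The find_live_keys helper is a hypothetical name, and the minimum length in the pattern is an assumption rather than a documented guarantee.

```python
import re

# Stripe live secret and restricted keys start with sk_live_ or rk_live_.
# The minimum length used here is an assumption, not a documented guarantee.
LIVE_KEY_PATTERN = re.compile(r"\b[sr]k_live_[A-Za-z0-9]{16,}")


def find_live_keys(text):
    """Return (line_number, masked_match) pairs for suspected live keys."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in LIVE_KEY_PATTERN.finditer(line):
            # Mask the match so the scanner's own output does not leak the key.
            hits.append((number, match.group()[:8] + "..."))
    return hits
```

Running this over staged file contents and failing the commit on any hit is a cheap backstop. It would not have found the key buried in the Docker image layer, which is why the secret should never reach disk in the first place.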
Getting Started — The Migration Path
You don't have to migrate everything at once. Here's the order I recommend, based on risk and effort:
- Payment gateway keys first. These have the highest blast radius. Move them to Secrets Manager or Vault, set up 90-day rotation, and remove them from all env files and CI variables.
- Database credentials next. If you're on AWS, RDS integration with Secrets Manager makes this almost turnkey. For self-managed databases, Vault's database secrets engine generates short-lived credentials on demand.
- Encryption keys into KMS. Stop managing raw encryption keys. Use AWS KMS, GCP Cloud KMS, or Vault Transit for envelope encryption. The key material never leaves the HSM.
- Internal service tokens last. Move to short-lived tokens issued by your identity provider or Vault's AppRole auth method. This is the most work but eliminates the entire class of "stolen service account token" attacks.
Each step is independently valuable. Even if you only get through step one, you've dramatically reduced your risk profile. Don't let perfect be the enemy of deployed.
Disclaimer: This article reflects the author's personal experience and opinions. Product names, logos, and brands are property of their respective owners. Security recommendations should be validated against your specific compliance requirements and threat model. The incident described is based on a real event with details altered to protect the parties involved.