April 8, 2026 10 min read

Secrets Management for Payment Infrastructure — Beyond Environment Variables

Your .env file is one misconfigured deploy away from handing attackers your Stripe keys. I've migrated three payment platforms off environment variables, and the patterns I landed on have saved us from at least two near-misses. Here's the playbook.

  • 78% of breaches involve compromised credentials
  • 90 days: max recommended rotation cycle for payment API keys
  • $4.5M: average cost of a data breach in financial services (2025)

Environment Variables: A Ticking Time Bomb

Every payment team I've joined started the same way: Stripe keys in .env, database credentials in docker-compose.yml, and maybe an encryption key hardcoded somewhere in a config file "just for now." It's the path of least resistance, and it works — until it doesn't.

The problem isn't that environment variables are inherently evil. It's that they leak in ways you don't expect:

  • docker inspect dumps every env var in plaintext. Anyone with Docker socket access sees your Stripe secret key.
  • /proc/<pid>/environ on Linux exposes the environment of any process you can read. A container escape gives an attacker everything.
  • Crash dumps and error reporting tools routinely capture environment state. I've seen Sentry reports with PAYMENT_API_SECRET sitting right there in the breadcrumbs.
  • CI/CD logs. One printenv in a debug step, one docker-compose config in a pipeline log, and your keys are in build artifacts forever.
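
As a stopgap while secrets still live in the environment, error reports and debug output can at least be scrubbed before they leave the process. The following is a minimal sketch, not a substitute for moving the secrets out; the name patterns and the `redact_env` helper are assumptions chosen for illustration.

```python
import re

# Hypothetical name patterns; extend them to match your own naming conventions.
SENSITIVE_NAME = re.compile(
    r"(SECRET|TOKEN|PASSWORD|API_KEY|PRIVATE_KEY)", re.IGNORECASE
)


def redact_env(env):
    """Return a copy of env with the values of sensitive-looking names masked."""
    return {
        name: "[REDACTED]" if SENSITIVE_NAME.search(name) else value
        for name, value in env.items()
    }


if __name__ == "__main__":
    sample = {"PAYMENT_API_SECRET": "example-value", "LOG_LEVEL": "info"}
    print(redact_env(sample))
```

Passing `redact_env(dict(os.environ))` to a crash reporter instead of the raw environment keeps a value like PAYMENT_API_SECRET out of the report, but only for names the pattern happens to match.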

A colleague once ran env | sort during a live debugging session on a shared screen. Thirty engineers saw the production Adyen API key scroll past. We rotated it within the hour, but the panic was real. That was the week we started migrating to Vault.

PCI DSS Requirement 3.7: Cryptographic key management procedures must include secure key storage, defined cryptoperiods, and retirement processes. Environment variables satisfy none of these requirements.

The Secrets Hierarchy

Not all secrets are equal. In a payment system, you're typically managing four tiers of sensitive material, each with different rotation requirements and blast radius if compromised:

Secrets Hierarchy — Payment Systems
  1. Encryption Keys & Signing Certificates: card data encryption, webhook signature verification, mTLS certs. Compromise = full data breach.
  2. Payment Gateway API Keys: Stripe, Adyen, Braintree secret keys. Compromise = unauthorized charges, refund fraud.
  3. Database Credentials: PostgreSQL, Redis, message queue auth. Compromise = data exfiltration, transaction manipulation.
  4. Internal Service Tokens: service-to-service auth, JWT signing keys, internal API tokens. Compromise = lateral movement.

Each tier demands different handling. Encryption keys should live in an HSM or KMS and never be extractable. Payment API keys need automated rotation. Database credentials can use short-lived dynamic secrets. Internal tokens should be ephemeral, issued per-session.

Comparing Secrets Management Solutions

I've deployed all three major solutions in production payment environments. Here's an honest comparison based on what actually matters when you're processing transactions.

Feature         | HashiCorp Vault            | AWS Secrets Manager            | GCP Secret Manager
Dynamic secrets | Yes (native)               | Lambda rotation                | Cloud Functions
Auto rotation   | Built-in TTLs              | Native (30/60/90d)             | Pub/Sub triggers
Audit logging   | Detailed                   | CloudTrail                     | Cloud Audit Logs
Multi-cloud     | Yes                        | AWS only                       | GCP only
Ops overhead    | High (self-managed)        | Low (managed)                  | Low (managed)
Cost at scale   | Free (OSS) / Enterprise $$ | $0.40/secret/month + API calls | $0.06/version/month + API calls

My recommendation: if you're all-in on AWS, Secrets Manager is the pragmatic choice — it integrates natively with ECS, Lambda, and RDS. If you're multi-cloud or on-prem, Vault is worth the operational overhead. I've run Vault clusters for two payment platforms, and the dynamic secrets engine alone justified the investment. GCP Secret Manager is solid if you're in that ecosystem, but its rotation story requires more custom glue.
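
To make the cost row concrete, here is a rough storage-only estimate using the list prices quoted in the comparison. It ignores API call charges and any Vault infrastructure cost, and the secret counts are made-up inputs, so treat it as back-of-the-envelope arithmetic rather than a pricing calculator.

```python
from decimal import Decimal

# List prices quoted in the comparison; API call charges are excluded.
AWS_PER_SECRET = Decimal("0.40")   # USD per secret per month
GCP_PER_VERSION = Decimal("0.06")  # USD per secret version per month


def aws_storage_cost(secrets):
    """Monthly storage cost in USD for AWS Secrets Manager."""
    return AWS_PER_SECRET * secrets


def gcp_storage_cost(secrets, versions_per_secret):
    """Monthly storage cost in USD for GCP Secret Manager."""
    return GCP_PER_VERSION * secrets * versions_per_secret


if __name__ == "__main__":
    print(aws_storage_cost(200))     # 200 secrets on AWS
    print(gcp_storage_cost(200, 2))  # 200 secrets, two versions each, on GCP
```

For a hypothetical 200 secrets this works out to $80 per month on AWS and $24 per month on GCP with two versions kept per secret, which is small next to the engineering time either option saves.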

Automatic Rotation for Payment API Keys

Static API keys are a liability. The longer a key lives, the more places it's been cached, logged, or copied. For payment gateways, I target 90-day rotation at most — and for high-risk keys like those with charge permissions, 30 days.
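
A rotation policy like this is easy to encode as a check that a scheduled job can run against key creation timestamps. The sketch below assumes you record when each key was issued; the `rotation_due` helper and its parameters are illustrative names, not part of any gateway SDK.

```python
from datetime import datetime, timedelta, timezone

STANDARD_MAX_AGE = timedelta(days=90)   # ceiling for payment gateway keys
HIGH_RISK_MAX_AGE = timedelta(days=30)  # keys with charge permissions


def rotation_due(created_at, high_risk, now=None):
    """Return True when a key has reached its maximum allowed age."""
    now = now or datetime.now(timezone.utc)
    max_age = HIGH_RISK_MAX_AGE if high_risk else STANDARD_MAX_AGE
    return now - created_at >= max_age


if __name__ == "__main__":
    issued = datetime(2026, 3, 1, tzinfo=timezone.utc)
    today = datetime(2026, 4, 8, tzinfo=timezone.utc)
    print(rotation_due(issued, high_risk=True, now=today))
    print(rotation_due(issued, high_risk=False, now=today))
```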

The trick is making rotation seamless. You can't have a window where the old key is dead and the new one isn't deployed yet. That's a payment outage. Here's the dual-key pattern I use:

API Key Rotation Flow — Dual-Key Pattern
1. Generate new key via gateway API
2. Store new key in Vault / SM
3. Both keys active (overlap window)
4. Roll services to pick up new key
5. Verify traffic on new key
6. Revoke old key

Step 3 is critical. Most payment gateways support having two active API keys simultaneously. Stripe calls this "rolling" a key and lets the old key remain valid for a grace period; Adyen supports multiple API credentials per merchant account. You generate the new key, deploy it, confirm it's working, then kill the old one. Zero downtime.
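
The ordering constraint in the dual-key pattern can be modelled in a few lines. This is an in-memory sketch with a hypothetical `KeyRing` class; it talks to no gateway and only illustrates that the old key is never revoked before a verified replacement exists.

```python
class KeyRing:
    """Tracks which gateway keys are currently accepted."""

    def __init__(self, current_key):
        self.active = [current_key]

    def add(self, new_key):
        # Steps 1-3: the new key becomes active alongside the old one.
        self.active.append(new_key)

    def revoke_oldest(self, verified):
        # Step 6: refuse to revoke unless traffic on the new key was verified
        # and a second key is in place to take over.
        if not verified or len(self.active) < 2:
            raise RuntimeError("refusing to revoke: no verified replacement")
        return self.active.pop(0)


if __name__ == "__main__":
    ring = KeyRing("key-2026-01")
    ring.add("key-2026-04")           # overlap window starts
    ring.revoke_oldest(verified=True)  # overlap window ends
    print(ring.active)
```

Making revocation fail loudly when verification has not happened is the design choice that matters here: a rotation job that crashes is an alert, while one that revokes too early is a payment outage.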

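# Example: rotation Lambda for AWS Secrets Manager (simplified)
from datetime import datetime, timezone

import boto3
import stripe

def rotate_stripe_key(secret_arn):
    sm = boto3.client('secretsmanager')

    # Look up the version currently labelled AWSCURRENT so it can be demoted later
    current = sm.get_secret_value(SecretId=secret_arn)
    old_version_id = current['VersionId']

    # Generate new restricted key via the gateway API.
    # NOTE: stripe.api_keys.create is illustrative; confirm the key-creation
    # call that your gateway and SDK version actually expose.
    new_key = stripe.api_keys.create(
        name=f"production-{datetime.now(timezone.utc).isoformat()}",
        permissions=["charges:write", "refunds:write"]
    )

    # Store new key as pending version; the response carries its version ID
    pending = sm.put_secret_value(
        SecretId=secret_arn,
        SecretString=new_key.secret,
        VersionStages=["AWSPENDING"]
    )
    new_version_id = pending['VersionId']

    # After verification, promote pending to current. Moving AWSCURRENT
    # requires naming the version it is being removed from.
    sm.update_secret_version_stage(
        SecretId=secret_arn,
        VersionStage="AWSCURRENT",
        MoveToVersionId=new_version_id,
        RemoveFromVersionId=old_version_id
    )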

Integration Patterns

Getting secrets out of a vault and into your application is where theory meets reality. I've used three patterns, each with different tradeoffs.

Sidecar injection (Kubernetes)

A Vault Agent sidecar runs alongside your payment service, fetches secrets, and writes them to a shared volume. Your app reads files — no SDK dependency, no Vault awareness in application code. This is my preferred pattern for Kubernetes deployments because it keeps the application completely decoupled from the secrets backend.

# Vault Agent sidecar annotation in K8s
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "payment-service"
    vault.hashicorp.com/agent-inject-secret-stripe: "secret/data/stripe"
    vault.hashicorp.com/agent-inject-template-stripe: |
      {{- with secret "secret/data/stripe" -}}
      STRIPE_SECRET_KEY={{ .Data.data.api_key }}
      {{- end -}}
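
On the application side, consuming the rendered file needs nothing beyond a small parser. A minimal sketch, assuming the template renders KEY=VALUE lines as shown and that the injector writes to its default location of /vault/secrets/stripe; `load_secret_file` is a hypothetical helper name.

```python
from pathlib import Path


def load_secret_file(path):
    """Parse KEY=VALUE lines from a rendered secret file into a dict."""
    secrets = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, since secret values may contain "=".
        name, sep, value = line.partition("=")
        if sep:
            secrets[name.strip()] = value.strip()
    return secrets


if __name__ == "__main__":
    # Assumed default render path for the "stripe" annotation suffix.
    stripe_key = load_secret_file("/vault/secrets/stripe")["STRIPE_SECRET_KEY"]
```

Re-reading the file on a timer or on a file change event lets the service pick up rotated values without a restart.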

Init containers

Similar to sidecars, but the secret fetch happens once at startup. Simpler, but you lose automatic renewal. Fine for secrets that don't rotate mid-deployment — like database connection strings that change only during planned rotations.

SDK-based retrieval

Your application calls the secrets manager directly at runtime. More coupling, but you get fine-grained control: you can cache secrets in memory, handle rotation events, and implement circuit breakers if the secrets backend is unavailable.

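# Go: SDK-based retrieval with caching
package secrets

import (
    "context"
    "errors"
    "fmt"
    "sync"
    "time"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/service/secretsmanager"
)

// SecretStore caches the payment key in memory between fetches.
type SecretStore struct {
    mu        sync.RWMutex
    sm        *secretsmanager.Client
    cached    string
    fetchedAt time.Time
}

func (s *SecretStore) GetPaymentKey(ctx context.Context) (string, error) {
    // Serve from cache while the value is less than five minutes old.
    s.mu.RLock()
    cached, fetchedAt := s.cached, s.fetchedAt
    s.mu.RUnlock()
    if cached != "" && time.Since(fetchedAt) < 5*time.Minute {
        return cached, nil
    }
    result, err := s.sm.GetSecretValue(ctx, &secretsmanager.GetSecretValueInput{
        SecretId: aws.String("payment/stripe-key"),
    })
    if err != nil {
        return "", fmt.Errorf("failed to fetch secret: %w", err)
    }
    if result.SecretString == nil {
        return "", errors.New("secret has no string value")
    }
    // Copy into a local so the return value is not read outside the lock.
    key := *result.SecretString
    s.mu.Lock()
    s.cached = key
    s.fetchedAt = time.Now()
    s.mu.Unlock()
    return key, nil
}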

Operational tip: Whichever pattern you choose, always implement a fallback. If Vault is down, your payment service shouldn't crash on startup. Cache the last-known-good secret in encrypted memory and alert on staleness. A 6-hour-old API key is better than a hard outage.
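
One way to sketch that fallback: wrap the fetch in a small cache that keeps serving the last-known-good value when the backend errors, up to a staleness limit. The `ResilientSecret` class, its parameter names, and the injected `fetch` callable are assumptions for illustration; the sketch keeps the value in plain process memory and leaves encryption and alerting to the surrounding service.

```python
import time


class ResilientSecret:
    """Serves a cached secret and falls back to it when the backend fails."""

    def __init__(self, fetch, ttl=300, max_stale=6 * 3600, clock=time.monotonic):
        self.fetch = fetch          # callable returning the secret string
        self.ttl = ttl              # seconds before a refresh is attempted
        self.max_stale = max_stale  # seconds a stale value may still be served
        self.clock = clock
        self.value = None
        self.fetched_at = None

    def get(self):
        now = self.clock()
        if self.value is not None and now - self.fetched_at < self.ttl:
            return self.value
        try:
            self.value = self.fetch()
            self.fetched_at = now
        except Exception:
            # Backend unavailable: serve last-known-good unless it is too old
            # or nothing was ever fetched, in which case the error propagates.
            if self.value is None or now - self.fetched_at > self.max_stale:
                raise
            # A real service would emit a staleness alert here.
        return self.value
```

Note that this only helps a process that has already fetched the secret once; surviving a cold start during a backend outage needs a persisted copy, which brings its own storage concerns.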

Audit Logging — PCI DSS Requirement 10

PCI DSS Requirement 10 mandates that you track and monitor all access to network resources and cardholder data. For secrets, that means logging every read, every write, every failed access attempt. "Who accessed the Stripe production key at 3 AM on Saturday?" is a question you need to answer in minutes, not days.

Vault's audit backend is excellent here — it logs every single API request with the accessor identity, timestamp, path, and operation. AWS Secrets Manager integrates with CloudTrail, which gives you the same visibility but through AWS's logging pipeline.

# Vault audit log entry (simplified JSON)
{
  "type": "response",
  "time": "2026-04-08T14:23:01Z",
  "auth": {
    "display_name": "kubernetes-payment-svc",
    "policies": ["payment-read"],
    "token_type": "service"
  },
  "request": {
    "path": "secret/data/stripe/production",
    "operation": "read",
    "remote_address": "10.0.42.17"
  },
  "response": {
    "data": { "keys": ["api_key"] }
  }
}

The things I watch for in audit logs:

  • Access from unexpected IP ranges or service identities
  • Reads of production secrets from non-production namespaces
  • Spikes in secret access frequency (could indicate a compromised identity enumerating secrets or a service stuck in a retry loop)
  • Any write or delete operation on payment secrets outside of scheduled rotation windows
  • Failed authentication attempts against the secrets backend itself
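
Two of these checks are simple enough to sketch against the audit log format shown earlier. The network range and path prefix below are hypothetical policy values, and the sketch flags every write or delete on a payment path rather than comparing against a rotation schedule, so a real rule would need that extra condition.

```python
import ipaddress
import json

# Hypothetical policy values; replace with your own network and path layout.
EXPECTED_NETWORK = ipaddress.ip_network("10.0.0.0/16")
PAYMENT_PREFIX = "secret/data/stripe/"


def flag_entry(line):
    """Return the reasons a single JSON audit log line looks suspicious."""
    entry = json.loads(line)
    request = entry.get("request", {})
    reasons = []
    if not request.get("path", "").startswith(PAYMENT_PREFIX):
        return reasons  # not a payment secret, nothing to check
    address = request.get("remote_address")
    if address and ipaddress.ip_address(address) not in EXPECTED_NETWORK:
        reasons.append("unexpected source address")
    if request.get("operation") in ("create", "update", "delete"):
        reasons.append("write or delete on payment secret")
    return reasons
```

Running `flag_entry` over each line of the audit file and forwarding non-empty results to the alerting pipeline is enough for a first pass; frequency spikes and cross-namespace reads need state across entries and are better handled in the log platform itself.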

We pipe Vault audit logs into Datadog and have alerts on all of the above. During our last PCI audit, the QSA spent about ten minutes on secrets management. He pulled up the audit trail, confirmed rotation history, checked access policies, and moved on. That's the goal — make the auditor's job boring.

When a Payment API Key Leaks

It happened to a team I was consulting for in 2024. A developer committed a .env file to a public GitHub repo. The file contained a live Stripe secret key with full charge permissions. GitHub's secret scanning caught it and notified Stripe, who emailed the team 22 minutes after the commit, but by then third-party bots had already tested the key, and it stayed live for 47 minutes in total.

Here's what unfolded:

  1. Within 18 minutes of the commit, automated scanners (not Stripe's — third-party bots that crawl GitHub) had already tested the key.
  2. Three fraudulent charges totaling $2,400 were attempted. Stripe's fraud detection blocked two of them. One went through.
  3. The team rotated the key manually — which took 25 minutes because nobody had documented the rotation procedure, and the key was hardcoded in two services that needed redeployment.
  4. Post-incident, they discovered the same key was also in a Docker image layer on their private registry. docker history showed it in a RUN command from six months earlier.

Incident Timeline: API Key Exposure

  • T+0 min: .env committed to public repo
  • T+8 min: GitHub secret scanning detects key
  • T+18 min: Third-party bots test the key, fraudulent charges attempted
  • T+22 min: Stripe notifies team via email
  • T+47 min: Key rotated, services redeployed

The lessons were clear: if that key had been in Vault with automated rotation, the blast radius would have been near zero. The key would have been short-lived, the rotation would have been a single API call, and there would have been nothing to commit to Git in the first place.
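
Until a migration is complete, a local check before commit is a cheap safety net against this exact failure. The sketch below looks for strings shaped like live Stripe keys, which start with "sk_live_" (secret) or "rk_live_" (restricted); the minimum length in the pattern and the `find_live_keys` helper are assumptions, and a dedicated secret scanner will cover far more formats.

```python
import re

# Stripe live secret keys start with "sk_live_" and restricted keys with
# "rk_live_"; the minimum length used here is an assumption.
STRIPE_LIVE_KEY = re.compile(r"\b[sr]k_live_[0-9A-Za-z]{10,}")


def find_live_keys(text):
    """Return (line_number, match) pairs for anything resembling a live key."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in STRIPE_LIVE_KEY.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

Wired into a pre-commit hook that exits non-zero when `find_live_keys` returns anything, this would have stopped the .env commit at T+0 instead of relying on detection after the push.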

Getting Started — The Migration Path

You don't have to migrate everything at once. Here's the order I recommend, based on risk and effort:

  1. Payment gateway keys first. These have the highest blast radius. Move them to Secrets Manager or Vault, set up 90-day rotation, and remove them from all env files and CI variables.
  2. Database credentials next. If you're on AWS, RDS integration with Secrets Manager makes this almost turnkey. For self-managed databases, Vault's database secrets engine generates short-lived credentials on demand.
  3. Encryption keys into KMS. Stop managing raw encryption keys. Use AWS KMS, GCP Cloud KMS, or Vault Transit for envelope encryption. The key material never leaves the HSM.
  4. Internal service tokens last. Move to short-lived tokens issued by your identity provider or Vault's AppRole auth method. This is the most work but eliminates the entire class of "stolen service account token" attacks.

Each step is independently valuable. Even if you only get through step one, you've dramatically reduced your risk profile. Don't let perfect be the enemy of deployed.

Disclaimer: This article reflects the author's personal experience and opinions. Product names, logos, and brands are property of their respective owners. Security recommendations should be validated against your specific compliance requirements and threat model. The incident described is based on a real event with details altered to protect the parties involved.