A couple of years ago, I was staring at a Grafana dashboard watching our payment gateway's p99 latency climb past 800ms during a flash sale. The database was drowning — every single authorization request was hitting Postgres for merchant config, BIN lookups, FX rates, and fraud rule sets. We were doing six to eight DB queries per transaction, and at 1,500 TPS, that math gets ugly fast.
Adding Redis changed everything. But here's the thing nobody tells you: caching in payment systems is not the same as caching product pages. Get it wrong and you're not showing a stale price — you're approving transactions with yesterday's exchange rate or routing payments to a deactivated processor. The stakes are different, and the strategy has to be too.
Why Caching Matters Here
Payment processing is latency-sensitive in a way most applications aren't. Card networks like Visa give you roughly 2-3 seconds to respond to an authorization request. That sounds generous until you factor in network hops to the acquirer, fraud checks, 3DS challenges, and the actual issuer response. Your internal processing budget is maybe 200-400ms. Every millisecond you spend on a database round-trip is a millisecond you can't spend on business logic.
Beyond latency, there's cost. Every DB query at scale means bigger RDS instances, more read replicas, higher IOPS bills. When I profiled our system, roughly 70% of our database reads were for data that changed less than once per hour — merchant configurations, BIN ranges, currency pairs. That's textbook cacheable.
The Cache-Aside Pattern: Your Bread and Butter
For most payment data, cache-aside (also called lazy loading) is the right starting point. The application checks Redis first, falls back to the database on a miss, and populates the cache before returning. It's simple, and simple is what you want when money is moving.
Here's what the Go implementation looks like with go-redis:
func (s *MerchantService) GetConfig(ctx context.Context, merchantID string) (*MerchantConfig, error) {
key := fmt.Sprintf("merchant:config:%s", merchantID)
// Try cache first
data, err := s.redis.Get(ctx, key).Bytes()
if err == nil {
var cfg MerchantConfig
if err := json.Unmarshal(data, &cfg); err == nil {
return &cfg, nil
}
}
// Cache miss — hit the database
cfg, err := s.db.QueryMerchantConfig(ctx, merchantID)
if err != nil {
return nil, fmt.Errorf("query merchant config: %w", err)
}
// Store in cache with TTL
encoded, _ := json.Marshal(cfg)
s.redis.Set(ctx, key, encoded, 15*time.Minute)
return cfg, nil
}
Tip: Always set a TTL, even on data you plan to invalidate explicitly. TTLs are your safety net — if your invalidation logic has a bug, stale data will still expire. I use 15 minutes for merchant configs and 5 minutes for FX rates.
What to Cache vs. What NOT to Cache
This is where payment systems diverge from typical web apps. The rule is straightforward: cache reference data, never cache transaction state. If a piece of data is used to make a decision, it's probably cacheable. If it is the decision (or the result of one), keep it in the database.
Warning: Never cache card PANs or sensitive authentication data in Redis unless your Redis deployment is PCI DSS compliant with encryption at rest and in transit. Even then, think twice. Use tokenization instead.
Redis Data Structures for Payment Use Cases
One of the things I love about Redis is that it's not just a key-value store. Picking the right data structure can save you a lot of application-level complexity.
Strings: Session Tokens and Simple Lookups
For API session tokens and single-value lookups like "is this merchant active?", plain strings with TTL are perfect. The SET key value EX seconds NX pattern is also great for distributed locks during settlement runs.
Hashes: Merchant Configurations
Merchant configs have multiple fields — processor ID, MCC code, settlement currency, fee structure. Instead of serializing the whole thing into a JSON string, I use Redis Hashes. This lets you read or update individual fields without fetching the entire object.
// Store merchant config as a hash, with a TTL as the safety net (see the tip above)
func (s *MerchantService) CacheMerchantConfig(ctx context.Context, m *MerchantConfig) error {
key := fmt.Sprintf("merchant:%s", m.ID)
pipe := s.redis.TxPipeline()
pipe.HSet(ctx, key,
"processor_id", m.ProcessorID,
"mcc", m.MCC,
"currency", m.SettlementCurrency,
"max_amount", m.MaxTransactionAmount,
"active", m.Active,
)
pipe.Expire(ctx, key, 15*time.Minute)
_, err := pipe.Exec(ctx)
return err
}
// Read just the fields you need for routing
func (s *MerchantService) GetRoutingInfo(ctx context.Context, merchantID string) (string, string, error) {
key := fmt.Sprintf("merchant:%s", merchantID)
vals, err := s.redis.HMGet(ctx, key, "processor_id", "currency").Result()
if err != nil {
return "", "", err
}
// HMGet returns nil for missing fields: guard the type assertions
processorID, okP := vals[0].(string)
currency, okC := vals[1].(string)
if !okP || !okC {
return "", "", fmt.Errorf("routing info not cached for merchant %s", merchantID)
}
return processorID, currency, nil
}
Sorted Sets: Rate Limiting
Sorted Sets with timestamps as scores are my go-to for sliding window rate limiting. Each member is a unique request ID, scored by its Unix timestamp in milliseconds. To check the rate, you ZRANGEBYSCORE within the window and count. To clean up, ZREMRANGEBYSCORE anything older than the window.
Sets: Transaction Deduplication
Duplicate payment requests are a real problem — network retries, impatient users clicking "Pay" three times. The classic tool is a Redis Set: SADD returns 0 when the member already exists, so you can track idempotency keys atomically. In practice I reach for SET with NX and a per-key TTL instead, because individual Set members can't expire on their own; the check is just as atomic, and each idempotency key ages out independently.
// Dedup check using SET with NX (returns false if key already exists)
func (s *PaymentService) CheckAndSetDedup(ctx context.Context, idempotencyKey string) (bool, error) {
set, err := s.redis.SetNX(ctx,
fmt.Sprintf("dedup:%s", idempotencyKey),
"1",
24*time.Hour,
).Result()
return set, err // set=true means first time, set=false means duplicate
}
Cache Invalidation: The Hard Part
Phil Karlton's famous quote about cache invalidation being one of the two hard things in computer science hits different when stale data means financial discrepancies. Here's what I've settled on after a few painful lessons.
TTL-Based Expiry
The simplest approach and your baseline. Every cached value gets a TTL. For FX rates, I use 60-90 seconds. For merchant configs, 15 minutes. For BIN tables, 24 hours (they change maybe once a quarter). The key insight: your TTL should reflect how much staleness your business can tolerate, not how often the data actually changes.
Event-Driven Invalidation
When a merchant updates their configuration through the admin dashboard, we publish an event to a message queue. A consumer picks it up and deletes the relevant cache keys. This gives us near-instant invalidation without coupling the admin service to the cache layer.
// On merchant config update, invalidate the cache. In production this
// deletion runs in the event consumer described above; it's inlined here for brevity.
func (s *AdminService) UpdateMerchantConfig(ctx context.Context, cfg *MerchantConfig) error {
if err := s.db.SaveMerchantConfig(ctx, cfg); err != nil {
return err
}
// Invalidate cache: delete, don't update (Del is variadic)
if err := s.redis.Del(ctx,
fmt.Sprintf("merchant:%s", cfg.ID),
fmt.Sprintf("merchant:config:%s", cfg.ID),
).Err(); err != nil {
// Log and move on: the TTL is the safety net if this delete fails
log.Printf("cache invalidation failed for merchant %s: %v", cfg.ID, err)
}
return nil
}
Tip: Always delete cache keys on write rather than updating them. Delete-then-lazy-load avoids race conditions where a concurrent read might overwrite your fresh data with stale data it just fetched from the DB.
Write-Through vs. Write-Behind
Write-through (update cache and DB synchronously) works for low-write-volume data like merchant configs. Write-behind (update cache immediately, async write to DB) is tempting for performance but dangerous in payments — if Redis dies before the DB write, you've lost data. I only use write-behind for non-critical analytics counters, never for anything financial.
Redis Cluster vs. Sentinel for HA
For payment systems, downtime isn't an option. You need high availability, but the choice between Redis Sentinel and Redis Cluster depends on your scale.
Redis Sentinel gives you automatic failover with a primary-replica setup. It's simpler to operate and works well up to about 50GB of data and 100K ops/sec. For most payment gateways processing under 5,000 TPS, Sentinel is plenty. The failover takes 10-30 seconds, which means you need your application to handle Redis being temporarily unavailable — more on that below.
Redis Cluster shards data across multiple nodes, giving you horizontal scalability. If you're caching BIN tables (which can be large) alongside merchant configs and rate limiting data, and you're pushing past what a single node can handle, Cluster is the way to go. The trade-off is operational complexity — resharding, slot migration, and multi-key operations that span slots will bite you if you're not careful.
Warning: In Redis Cluster, multi-key operations like MGET only work if all keys hash to the same slot. Use hash tags (e.g., {merchant:123}:config and {merchant:123}:limits) to co-locate related keys.
The Gotchas That Will Ruin Your Week
Thundering Herd / Cache Stampede
When a popular cache key expires, every concurrent request sees a miss and hits the database simultaneously. At 1,500 TPS, if your merchant config key expires, you might get 200 concurrent DB queries for the same row. I've seen this take down a read replica.
The fix: use a distributed lock (or a simple SETNX) so only one request fetches from the DB while others wait or get a slightly stale value. In Go:
func (s *MerchantService) GetConfigWithStampedeLock(ctx context.Context, merchantID string) (*MerchantConfig, error) {
key := fmt.Sprintf("merchant:config:%s", merchantID)
lockKey := key + ":lock"
if data, err := s.redis.Get(ctx, key).Bytes(); err == nil {
var cfg MerchantConfig
if err := json.Unmarshal(data, &cfg); err == nil {
return &cfg, nil
}
// Corrupt cache entry: fall through and rebuild it
}
// Try to acquire lock (5s TTL to prevent deadlocks)
acquired, _ := s.redis.SetNX(ctx, lockKey, "1", 5*time.Second).Result()
if acquired {
defer s.redis.Del(ctx, lockKey)
// Fetch from DB and populate cache
cfg, err := s.db.QueryMerchantConfig(ctx, merchantID)
if err != nil {
return nil, err
}
if encoded, err := json.Marshal(cfg); err == nil {
s.redis.Set(ctx, key, encoded, 15*time.Minute)
}
return cfg, nil
}
// Another goroutine is fetching: wait briefly and retry the cache
time.Sleep(50 * time.Millisecond)
if data, err := s.redis.Get(ctx, key).Bytes(); err == nil {
var cfg MerchantConfig
if err := json.Unmarshal(data, &cfg); err == nil {
return &cfg, nil
}
}
// Fallback to DB if the lock holder failed
return s.db.QueryMerchantConfig(ctx, merchantID)
}
Stale FX Rates and Settlement Mismatches
This one cost us real money. We cached FX rates with a 5-minute TTL, which seemed reasonable. But during a volatile trading session, EUR/USD moved 0.3% within those 5 minutes. We authorized transactions at one rate and settled at another. The mismatch across thousands of transactions added up to a non-trivial loss.
The fix was two-fold: drop the TTL to 60 seconds, and add a version check at settlement time. If the rate used at authorization doesn't match the current rate within a configurable tolerance, the transaction gets flagged for manual review.
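The tolerance check itself is only a few lines. This sketch assumes the authorization-time rate is stored with the transaction; the function name and signature are illustrative:

```go
package main

import "math"

// RateWithinTolerance compares the rate captured at authorization with
// the rate at settlement time. A false return means the relative drift
// exceeded the tolerance and the transaction should be flagged for review.
func RateWithinTolerance(authRate, currentRate, tolerance float64) bool {
	if authRate <= 0 {
		return false // malformed input: always flag
	}
	drift := math.Abs(currentRate-authRate) / authRate
	return drift <= tolerance
}
```

With a 0.3% tolerance (0.003), a EUR/USD move from 1.0850 to 1.0860 passes, while a move from 1.0850 to 1.0883 gets flagged.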
Redis Down? Don't Crash.
Your payment system must work when Redis is unavailable. This means every cache read needs a fallback to the database. I wrap all Redis calls with a circuit breaker — after 5 consecutive failures, we skip Redis entirely for 30 seconds and go straight to the DB. Latency increases, but transactions keep flowing.
// Simplified circuit breaker pattern
func (s *CacheService) Get(ctx context.Context, key string) ([]byte, error) {
if s.circuitOpen.Load() {
return nil, ErrCircuitOpen // skip Redis, caller falls back to DB
}
data, err := s.redis.Get(ctx, key).Bytes()
if err != nil && err != redis.Nil {
if s.failures.Add(1) >= 5 { // trip on the 5th consecutive failure
s.circuitOpen.Store(true)
go s.resetCircuitAfter(30 * time.Second)
}
return nil, err
}
s.failures.Store(0)
return data, err
}
Wrapping Up
Redis caching in payment systems boils down to a few principles: cache reference data aggressively, never cache transaction state, always have a DB fallback, and treat your TTLs as a business decision, not a technical one. The patterns themselves aren't complicated — cache-aside, hash tags for Cluster, stampede locks — but the consequences of getting them wrong are amplified when money is involved.
Our system now handles 8,000 TPS with a p99 of 45ms for the caching layer. Redis serves about 40,000 reads per second with a 99.9% hit rate on merchant configs. The Postgres read replicas that used to run hot are now mostly idle. And I sleep better during flash sales.
References
- Redis Documentation — Patterns and Best Practices
- Redis Documentation — High Availability with Sentinel
- Redis Documentation — Scaling with Redis Cluster
- go-redis — Redis Client for Go
- Go Standard Library — context Package
- AWS ElastiCache for Redis — Managed Redis Service
Disclaimer: This article reflects the author's personal experience and opinions. Product names, logos, and brands are property of their respective owners. Code examples are simplified for clarity — always review and adapt for your specific use case and security requirements. Never store sensitive cardholder data in Redis without proper PCI DSS compliance.