Redis-Based Rate Limiting for Payment APIs

Why Fixed-Window Counters Will Betray You

The first rate limiter I ever shipped to production used a fixed-window counter. Simple Redis INCR with a TTL — increment a key like ratelimit:merchant_123:2026-04-15T10:05, expire it after 60 seconds, reject if the count exceeds the threshold. It worked great in staging.

Then a merchant ran a batch reconciliation job at 10:59:58. Two hundred requests hit in the last two seconds of one window, and another two hundred landed in the first two seconds of the next. Four hundred requests in four seconds against a limit of 200 per minute. The downstream payment processor throttled us, and we started dropping legitimate transactions.

This is the classic boundary problem with fixed windows. The effective rate can spike to 2x your configured limit right at the window edge. For a payment API, that kind of burst can cascade into processor-level throttling, failed settlements, and very unhappy merchants.
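The boundary burst is easy to reproduce without Redis. The sketch below is an in-memory stand-in for the INCR-plus-TTL counter, not the production code; `FixedWindow` and `BoundaryBurst` are names made up for illustration, and timestamps are plain Unix seconds.

```go
package main

// FixedWindow is an in-memory stand-in for the Redis INCR + TTL counter:
// one counter per window start, like one key per minute.
type FixedWindow struct {
	limit      int
	windowSecs int64
	counts     map[int64]int // window start (unix seconds) -> requests seen
}

func NewFixedWindow(limit int, windowSecs int64) *FixedWindow {
	return &FixedWindow{limit: limit, windowSecs: windowSecs, counts: map[int64]int{}}
}

// Allow counts a request at nowSecs and reports whether it fits in its window.
func (f *FixedWindow) Allow(nowSecs int64) bool {
	window := nowSecs - nowSecs%f.windowSecs // truncate to the window start
	if f.counts[window] >= f.limit {
		return false
	}
	f.counts[window]++
	return true
}

// BoundaryBurst sends perSide requests two seconds before a window edge and
// perSide more one second after it, and returns how many were allowed.
func BoundaryBurst(limit, perSide int) int {
	fw := NewFixedWindow(limit, 60)
	edge := int64(3600) // any multiple of the 60-second window
	allowed := 0
	for i := 0; i < perSide; i++ {
		if fw.Allow(edge - 2) {
			allowed++
		}
	}
	for i := 0; i < perSide; i++ {
		if fw.Allow(edge + 1) {
			allowed++
		}
	}
	return allowed
}
```

With a limit of 200 per minute, `BoundaryBurst(200, 200)` lets all 400 requests through, because each half lands in a different window and each window starts counting from zero.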

Algorithm	Burst Handling	Memory Cost	Accuracy	Complexity
Fixed Window	Poor — 2x burst at edges	Low — 1 key per window	Low	Trivial
Sliding Window Log	Excellent — true sliding	High — stores every request	Exact	Moderate
Sliding Window Counter	Good — weighted average	Low — 2 keys per window	Approximate	Low
Token Bucket	Good — configurable burst	Low — 2 fields per bucket	Exact	Moderate

Sliding Windows with Redis Sorted Sets

The first upgrade we made was switching to a sliding window log using Redis sorted sets. The idea is straightforward: each request gets added to a sorted set with the current timestamp as the score. Before allowing a request, you remove entries older than your window, count what's left, and decide.
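The three steps map onto sorted-set commands: ZREMRANGEBYSCORE to trim, ZCARD to count, ZADD to record. The sketch below mirrors them with an in-memory slice so it runs without Redis. `SlidingLog` is a hypothetical name, timestamps are microseconds and assumed to be non-decreasing, and recording only accepted requests is a choice made here rather than something the original setup is known to do.

```go
package main

// SlidingLog keeps one timestamp per accepted request, oldest first.
// It stands in for a Redis sorted set whose scores are request timestamps.
type SlidingLog struct {
	limit        int
	windowMicros int64
	stamps       []int64
}

func NewSlidingLog(limit int, windowMicros int64) *SlidingLog {
	return &SlidingLog{limit: limit, windowMicros: windowMicros}
}

// Allow reports whether a request at nowMicros fits in the trailing window.
func (s *SlidingLog) Allow(nowMicros int64) bool {
	// Step 1: drop entries that fell out of the window (ZREMRANGEBYSCORE).
	cutoff := nowMicros - s.windowMicros
	i := 0
	for i < len(s.stamps) && s.stamps[i] <= cutoff {
		i++
	}
	s.stamps = s.stamps[i:]

	// Step 2: count what is left (ZCARD) and decide.
	if len(s.stamps) >= s.limit {
		return false
	}

	// Step 3: record this request (ZADD).
	s.stamps = append(s.stamps, nowMicros)
	return true
}
```

Replaying the reconciliation burst against this limiter (200 requests at second 58, 200 more at second 61, limit 200 per 60 seconds) admits only the first 200, since the earlier entries are still inside the window when the second batch arrives.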

This works well for moderate traffic. But at scale, storing every single request timestamp gets expensive. With a one-minute window, a merchant doing 500 requests per minute keeps about 500 entries in the sorted set at any given time. Multiply that across a few thousand merchants and multiple endpoints, and your Redis memory starts climbing fast.

We used this approach for our lower-volume admin APIs where precision mattered more than memory. For the high-throughput transaction endpoints, we needed something leaner.

Token Bucket with Lua: The Production Winner

The token bucket algorithm ended up being our workhorse. The concept is simple — imagine a bucket that fills with tokens at a steady rate. Each request consumes a token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity, which controls burst size.

The critical part is making this atomic in a distributed environment. With multiple API gateway instances hitting the same Redis cluster, you can't do a read-then-write without risking race conditions. This is where Lua scripting in Redis saves you — EVALSHA executes your script atomically on the Redis server itself.

Here's the Lua script we run on every inbound request:

-- token_bucket.lua
-- KEYS[1] = bucket key (e.g., "rl:merchant_123:/v1/charges")
-- ARGV[1] = max tokens (bucket capacity)
-- ARGV[2] = refill rate (tokens per second)
-- ARGV[3] = current timestamp (microseconds)
-- ARGV[4] = tokens to consume (usually 1)
-- Returns {allowed (0 or 1), whole tokens remaining, seconds until retry}

local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1])
local last_refill = tonumber(bucket[2])

if tokens == nil then
  -- First request: initialize full bucket
  tokens = capacity
  last_refill = now
end

-- Never let time move backwards: a caller with a slightly slow clock
-- would otherwise produce a negative elapsed time and drain the bucket
if now < last_refill then
  now = last_refill
end

-- Calculate tokens to add since last refill
local elapsed = (now - last_refill) / 1000000  -- convert to seconds
local new_tokens = elapsed * refill_rate
tokens = math.min(capacity, tokens + new_tokens)

local allowed = 0
local retry_after = 0

if tokens >= requested then
  tokens = tokens - requested
  allowed = 1
else
  -- Seconds until enough tokens have accumulated for this request
  retry_after = math.ceil((requested - tokens) / refill_rate)
end

-- Update bucket state (HSET takes multiple fields since Redis 4.0; HMSET is deprecated)
redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) * 2)

return {allowed, math.floor(tokens), retry_after}

And the Go side that calls it:

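// Allow runs the token bucket script for key and reports the decision.
// rl.redis is assumed to be a go-redis client; imports needed: context, fmt, time.
func (rl *RateLimiter) Allow(ctx context.Context, key string, limit Rate) (*Result, error) {
    now := time.Now().UnixMicro()

    res, err := rl.redis.EvalSha(ctx, rl.scriptSHA, []string{key},
        limit.Capacity,
        limit.RefillRate,
        now,
        1,  // tokens to consume
    ).Int64Slice()

    if err != nil {
        // Fail open: allow the request if Redis is down.
        // Remaining: -1 signals that the count is unknown.
        return &Result{Allowed: true, Remaining: -1}, nil
    }
    if len(res) != 3 {
        // Guard against indexing past the end if the script's reply shape changes.
        return nil, fmt.Errorf("token bucket script returned %d values, want 3", len(res))
    }

    return &Result{
        Allowed:    res[0] == 1,
        Remaining:  int(res[1]),
        RetryAfter: time.Duration(res[2]) * time.Second,
    }, nil
}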
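The third value in the script's reply feeds `RetryAfter`, so it helps to be able to check that arithmetic in a unit test without Redis. A minimal sketch, assuming a positive refill rate and clamping to zero when the bucket already holds enough tokens; `RetryAfterSecs` is a hypothetical helper, not part of the limiter above.

```go
package main

import "math"

// RetryAfterSecs returns how many whole seconds a caller should wait until
// `requested` tokens are available, given the bucket's current level.
// refillRate is in tokens per second and must be greater than zero.
func RetryAfterSecs(tokens, requested, refillRate float64) int64 {
	deficit := requested - tokens
	if deficit <= 0 {
		return 0 // enough tokens already: no wait
	}
	return int64(math.Ceil(deficit / refillRate))
}
```

For a 100 req/min bucket (about 1.67 tokens per second) holding 0.25 tokens, the deficit of 0.75 tokens refills in under half a second, which rounds up to a 1-second wait.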

Why EVALSHA over EVAL? We load the Lua script once at startup with SCRIPT LOAD and then reference it by SHA digest. This avoids sending the full script text on every request — a small optimization that adds up when you're doing tens of thousands of evaluations per second.

Per-Merchant and Per-Endpoint Limits

A flat rate limit across all merchants is a non-starter. Your largest enterprise merchant processing millions a month shouldn't share the same ceiling as a startup in sandbox mode. We built a two-tier system:

Global endpoint limits — protect the infrastructure itself. For example, /v1/charges might have a global ceiling of 10,000 requests per second across all merchants.
Per-merchant limits — configurable per plan tier. Free-tier merchants get 100 req/min, enterprise gets 5,000 req/min, with custom overrides for specific merchants stored in a config table.

The Redis key structure encodes both dimensions: rl:{merchant_id}:{endpoint} for per-merchant limits and rl:global:{endpoint} for the global ceiling. Both checks run in a single Redis pipeline to keep latency low.
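A small sketch of how the keys and limits could be resolved before the two checks run. The key formats and tier numbers come from the text above; the `Rate` field names, the `PerMinute` helper, the tier names used as map keys, and the choice to set burst capacity equal to the per-minute limit are all assumptions for illustration.

```go
package main

// Rate is the limit handed to the limiter: burst capacity plus refill speed.
type Rate struct {
	Capacity   int
	RefillRate float64 // tokens per second
}

// PerMinute turns "n requests per minute" into a Rate whose bucket holds n tokens.
func PerMinute(n int) Rate {
	return Rate{Capacity: n, RefillRate: float64(n) / 60}
}

// MerchantKey builds the per-merchant bucket key: rl:{merchant_id}:{endpoint}.
func MerchantKey(merchantID, endpoint string) string {
	return "rl:" + merchantID + ":" + endpoint
}

// GlobalKey builds the shared ceiling key: rl:global:{endpoint}.
func GlobalKey(endpoint string) string {
	return "rl:global:" + endpoint
}

// tierDefaults holds the plan-level limits quoted in the text.
var tierDefaults = map[string]Rate{
	"free":       PerMinute(100),
	"enterprise": PerMinute(5000),
}

// LimitFor returns a merchant's limit: a custom override if one exists,
// otherwise the default for its tier. ok is false for an unknown tier.
func LimitFor(merchantID, tier string, overrides map[string]Rate) (Rate, bool) {
	if r, found := overrides[merchantID]; found {
		return r, true
	}
	r, ok := tierDefaults[tier]
	return r, ok
}
```

One thing this layout implies: a merchant ID of `global` would collide with the global key, so merchant IDs need a format that rules that out.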

Figure: Request flow through the rate limiting architecture. Components: Merchant Client; API Gateway (rate limit middleware); Redis Cluster (EVALSHA token bucket), with Allowed and 429 Rejected outcomes; Config Store (merchant tier limits); Payment Service → Processor.

Handling Redis Failures Gracefully

Here's the question that keeps payment engineers up at night: what happens when Redis goes down? You have two options, and neither is comfortable.

Option A: Fail open. If Redis is unreachable, allow all requests through. This is what we chose. The reasoning is straightforward — a payment API that rejects every request because the rate limiter is down is worse than one that temporarily runs without rate limits. Your downstream processors have their own limits anyway, and a few minutes of unthrottled traffic is survivable.

Option B: Fail closed. Reject everything when Redis is down. This protects downstream systems but means a Redis outage becomes a full API outage. For a payment system, this is usually the wrong trade-off.

We added a local in-memory fallback using a per-instance token bucket that kicks in when Redis is unreachable. It's not globally coordinated, but it provides a rough safety net. The key is setting the local limits conservatively: with 8 API gateway instances and a global limit of 8,000 req/s, an even split would be 1,000 req/s per instance, so each local fallback gets 800 req/s to leave headroom.
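A per-instance fallback can reuse the same token bucket arithmetic as the Lua script, guarded by a mutex instead of Redis. The sketch below is one way to write it, not the production code: `LocalBucket` and `PerInstanceLimit` are hypothetical names, timestamps are microseconds passed in by the caller, and the 80% headroom factor is inferred from the 8 instances, 8,000 req/s, 800 req/s figures above.

```go
package main

import "sync"

// LocalBucket is an in-process token bucket used only while Redis is unreachable.
type LocalBucket struct {
	mu         sync.Mutex
	capacity   float64
	refillRate float64 // tokens per second
	tokens     float64
	lastRefill int64 // microseconds
}

// NewLocalBucket starts with a full bucket at nowMicros.
func NewLocalBucket(capacity, refillRate float64, nowMicros int64) *LocalBucket {
	return &LocalBucket{capacity: capacity, refillRate: refillRate, tokens: capacity, lastRefill: nowMicros}
}

// Allow refills the bucket for the time elapsed, then tries to take one token.
func (b *LocalBucket) Allow(nowMicros int64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Ignore timestamps that go backwards so the bucket never loses tokens.
	if elapsed := float64(nowMicros-b.lastRefill) / 1e6; elapsed > 0 {
		b.tokens += elapsed * b.refillRate
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.lastRefill = nowMicros
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// PerInstanceLimit splits a global limit evenly across instances and keeps
// only headroomPercent of each share.
func PerInstanceLimit(globalLimit, instances, headroomPercent int) int {
	return globalLimit / instances * headroomPercent / 100
}
```

`PerInstanceLimit(8000, 8, 80)` gives 800, matching the numbers above. Because each instance enforces its share independently, uneven load balancing means the fleet-wide total can land well below the global limit, which is the safe direction for a fallback.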

Production tip: Always set Retry-After and X-RateLimit-Remaining headers on 429 responses. Good clients will back off automatically. We saw a 40% reduction in retry storms after adding proper rate limit headers with accurate reset timestamps.
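One way to attach those headers with the standard library is sketched below. The function names are hypothetical, `Retry-After` is sent as a delay in whole seconds, and the floor of one second is a choice made here so a throttled client is never told to retry immediately.

```go
package main

import (
	"net/http"
	"strconv"
)

// RateLimitHeaders builds the headers for a throttled response from the
// limiter's remaining-token count and retry delay.
func RateLimitHeaders(remaining, retryAfterSecs int) http.Header {
	if remaining < 0 {
		remaining = 0
	}
	if retryAfterSecs < 1 {
		retryAfterSecs = 1
	}
	h := http.Header{}
	h.Set("Retry-After", strconv.Itoa(retryAfterSecs))
	h.Set("X-RateLimit-Remaining", strconv.Itoa(remaining))
	return h
}

// WriteTooManyRequests copies those headers onto the response and sends a 429.
func WriteTooManyRequests(w http.ResponseWriter, remaining, retryAfterSecs int) {
	for name, values := range RateLimitHeaders(remaining, retryAfterSecs) {
		for _, v := range values {
			w.Header().Add(name, v)
		}
	}
	http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
}
```

In middleware this would be called whenever the limiter's result says the request is not allowed, passing the remaining count and the retry delay from that result.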

Circuit Breaker on the Redis Call

We wrapped the Redis rate limit check in a circuit breaker. After 5 consecutive failures within 10 seconds, the breaker opens and we fall back to local limiting for 30 seconds before probing again. This prevents a flapping Redis connection from adding latency to every single request while it's struggling.
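The breaker logic fits in a few dozen lines. This is a sketch under stated assumptions rather than the production breaker: `Breaker`, `UseRedis`, and `Record` are hypothetical names, time is passed in as Unix seconds to keep it testable, and after the cool-down every caller may probe Redis until one result is recorded.

```go
package main

import "sync"

// Breaker opens after `threshold` consecutive failures inside `failureWindow`
// seconds and stays open for `coolDown` seconds before allowing a probe.
type Breaker struct {
	mu            sync.Mutex
	threshold     int
	failureWindow int64
	coolDown      int64
	failures      int
	firstFailure  int64
	open          bool
	openedAt      int64
}

// NewBreaker uses the values from the text: 5 failures, 10 seconds, 30 seconds.
func NewBreaker() *Breaker {
	return &Breaker{threshold: 5, failureWindow: 10, coolDown: 30}
}

// UseRedis reports whether the caller should try Redis at time now.
// When it returns false, the caller goes straight to local limiting.
func (b *Breaker) UseRedis(now int64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return !b.open || now-b.openedAt >= b.coolDown
}

// Record feeds the outcome of a Redis call back into the breaker.
func (b *Breaker) Record(now int64, ok bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	switch {
	case ok:
		// Any success closes the breaker and clears the failure streak.
		b.failures, b.open = 0, false
	case b.open:
		// A failed probe restarts the cool-down.
		b.openedAt = now
	default:
		// Start a new streak if there is none or the old one is too old.
		if b.failures == 0 || now-b.firstFailure > b.failureWindow {
			b.failures, b.firstFailure = 0, now
		}
		b.failures++
		if b.failures >= b.threshold {
			b.open, b.openedAt = true, now
		}
	}
}
```

The middleware would call `UseRedis` before each check and `Record` after each Redis call, so a struggling connection costs at most five slow requests before traffic shifts to the local limiter.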

99.7% of requests served within rate limits
< 1ms Redis EVALSHA latency (p99)
Dropped transactions from rate limiter bugs: [value missing]

Lessons from Production

A few things I wish I'd known before building this:

Use EVALSHA, not EVAL. We saw a measurable latency improvement after switching. The script payload was small, but at 15k+ calls per second, the bandwidth savings were real.
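Set key expiration generously. We set the TTL to 2x the time an empty bucket takes to refill completely (capacity divided by refill rate). If a merchant goes quiet, the key expires naturally, and their next request simply starts from a full bucket. Without expiration, you'll slowly leak memory for inactive merchants.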
Monitor your 429 rate by merchant. A sudden spike in 429s for a single merchant usually means they changed their integration — not that your limits are wrong. Having per-merchant dashboards saved us from unnecessary limit bumps.
Test with clock skew. In a distributed setup, different API instances may have slightly different system clocks. We pass timestamps from the application layer rather than using Redis TIME to keep things consistent, but we also added a tolerance window of 100ms to handle minor drift.
Don't forget SCRIPT EXISTS. After a Redis restart or failover, your loaded scripts are gone. We check on startup and after any connection reset, reloading the Lua script if the SHA is missing.
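Besides checking at startup, the call site can recover on its own when Redis answers EVALSHA with a NOSCRIPT error. The sketch below shows that reload-and-retry pattern against a tiny fake server so it runs without Redis. `ScriptRunner`, `EvalWithReload`, and `FakeRedis` are hypothetical names, the interface is a simplified view of a client rather than any real library's API, and the fake always replies `{1, 0, 0}`.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"strings"
)

// ScriptRunner is a minimal, made-up view of the two client calls needed here.
type ScriptRunner interface {
	EvalSha(sha string, keys []string, args ...interface{}) ([]int64, error)
	ScriptLoad(script string) (string, error)
}

// EvalWithReload runs the script by digest. If the server has lost its script
// cache, it loads the script again, updates *sha, and retries once.
func EvalWithReload(r ScriptRunner, script string, sha *string, keys []string, args ...interface{}) ([]int64, error) {
	res, err := r.EvalSha(*sha, keys, args...)
	if err == nil || !strings.HasPrefix(err.Error(), "NOSCRIPT") {
		return res, err
	}
	newSHA, loadErr := r.ScriptLoad(script)
	if loadErr != nil {
		return nil, loadErr
	}
	*sha = newSHA
	return r.EvalSha(*sha, keys, args...)
}

// FakeRedis imitates the script cache: digests are SHA-1 hex strings, and
// Restart forgets every loaded script.
type FakeRedis struct {
	loaded map[string]bool
	Loads  int // number of ScriptLoad calls, for inspection
}

func NewFakeRedis() *FakeRedis { return &FakeRedis{loaded: map[string]bool{}} }

func (f *FakeRedis) ScriptLoad(script string) (string, error) {
	sum := sha1.Sum([]byte(script))
	sha := hex.EncodeToString(sum[:])
	f.loaded[sha] = true
	f.Loads++
	return sha, nil
}

func (f *FakeRedis) EvalSha(sha string, keys []string, args ...interface{}) ([]int64, error) {
	if !f.loaded[sha] {
		return nil, errors.New("NOSCRIPT No matching script")
	}
	return []int64{1, 0, 0}, nil // canned "allowed" reply
}

func (f *FakeRedis) Restart() { f.loaded = map[string]bool{} }
```

Loading a script, calling `Restart`, and then calling `EvalWithReload` succeeds on the retry and leaves `Loads` at 2. Whether to retry inline like this or fail open and reload in the background depends on how much latency a single slow request is allowed to add.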

