Why Fixed-Window Counters Will Betray You
The first rate limiter I ever shipped to production used a fixed-window counter. Simple Redis INCR with a TTL — increment a key like ratelimit:merchant_123:2026-04-15T10:05, expire it after 60 seconds, reject if the count exceeds the threshold. It worked great in staging.
Then a merchant ran a batch reconciliation job at 10:59:58. Two hundred requests hit in the last two seconds of one window, and another two hundred landed in the first two seconds of the next. Four hundred requests in four seconds against a limit of 200 per minute. The downstream payment processor throttled us, and we started dropping legitimate transactions.
This is the classic boundary problem with fixed windows. The effective rate can spike to 2x your configured limit right at the window edge. For a payment API, that kind of burst can cascade into processor-level throttling, failed settlements, and very unhappy merchants.
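The boundary effect is easy to reproduce without Redis. The following is a minimal sketch, assuming an in-memory map as a stand-in for the INCR-with-TTL keys; the `fixedWindow` type and `burstAcrossBoundary` helper are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// fixedWindow is a hypothetical in-memory stand-in for the Redis
// INCR-with-TTL counter: one counter per window-aligned bucket.
type fixedWindow struct {
	limit  int
	window time.Duration
	counts map[int64]int // window start (unix nanos) -> requests seen
}

func newFixedWindow(limit int, window time.Duration) *fixedWindow {
	return &fixedWindow{limit: limit, window: window, counts: make(map[int64]int)}
}

// allow counts the request against the window that contains t.
func (f *fixedWindow) allow(t time.Time) bool {
	bucket := t.Truncate(f.window).UnixNano()
	if f.counts[bucket] >= f.limit {
		return false
	}
	f.counts[bucket]++
	return true
}

// burstAcrossBoundary sends n requests two seconds before a window edge
// and n more two seconds after it, returning how many were accepted.
func burstAcrossBoundary(limit, n int) int {
	fw := newFixedWindow(limit, time.Minute)
	edge := time.Date(2026, 4, 15, 11, 0, 0, 0, time.UTC)
	accepted := 0
	for _, t := range []time.Time{edge.Add(-2 * time.Second), edge.Add(2 * time.Second)} {
		for i := 0; i < n; i++ {
			if fw.allow(t) {
				accepted++
			}
		}
	}
	return accepted
}

func main() {
	// Limit is 200 per minute, yet both batches land in different windows.
	fmt.Println(burstAcrossBoundary(200, 200)) // prints 400
}
```

Each batch is counted against its own window, so neither one trips the limit on its own.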
Sliding Windows with Redis Sorted Sets
The first upgrade we made was switching to a sliding window log using Redis sorted sets. The idea is straightforward: each request gets added to a sorted set with the current timestamp as the score. Before allowing a request, you remove entries older than your window, count what's left, and decide.
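The remove, count, decide sequence can be sketched in memory as follows. This is an illustration under stated assumptions rather than the production code: the `slidingLog` type and `replayBurst` helper are hypothetical, a slice stands in for the sorted set, and only accepted requests are recorded. In Redis, the three steps would map roughly to ZREMRANGEBYSCORE, ZCARD, and ZADD.

```go
package main

import (
	"fmt"
	"time"
)

// slidingLog is a hypothetical in-memory analogue of the sorted-set log.
type slidingLog struct {
	limit  int
	window time.Duration
	stamps []time.Time // timestamps of accepted requests, ascending
}

func (s *slidingLog) allow(now time.Time) bool {
	cutoff := now.Add(-s.window)
	// 1. Remove entries that have fallen out of the window.
	i := 0
	for i < len(s.stamps) && !s.stamps[i].After(cutoff) {
		i++
	}
	s.stamps = s.stamps[i:]
	// 2. Count what is left and decide.
	if len(s.stamps) >= s.limit {
		return false
	}
	// 3. Record the accepted request.
	s.stamps = append(s.stamps, now)
	return true
}

// replayBurst sends n requests at 10:59:58 and n more gapSeconds later,
// returning how many were accepted in total.
func replayBurst(limit, n, gapSeconds int) int {
	s := &slidingLog{limit: limit, window: time.Minute}
	start := time.Date(2026, 4, 15, 10, 59, 58, 0, time.UTC)
	second := start.Add(time.Duration(gapSeconds) * time.Second)
	accepted := 0
	for _, t := range []time.Time{start, second} {
		for i := 0; i < n; i++ {
			if s.allow(t) {
				accepted++
			}
		}
	}
	return accepted
}

func main() {
	fmt.Println(replayBurst(200, 200, 4))  // prints 200: second batch is rejected
	fmt.Println(replayBurst(200, 200, 61)) // prints 400: first batch has aged out
}
```

Because the window is anchored to each request rather than to the clock, the batch that arrives four seconds later still sees the first 200 entries and is rejected.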
This works well for moderate traffic. But at scale, storing every single request timestamp gets expensive. A merchant doing 500 requests per minute means 500 entries in the sorted set at any given time. Multiply that across a few thousand merchants and multiple endpoints, and your Redis memory starts climbing fast.
We used this approach for our lower-volume admin APIs where precision mattered more than memory. For the high-throughput transaction endpoints, we needed something leaner.
Token Bucket with Lua: The Production Winner
The token bucket algorithm ended up being our workhorse. The concept is simple — imagine a bucket that fills with tokens at a steady rate. Each request consumes a token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity, which controls burst size.
The critical part is making this atomic in a distributed environment. With multiple API gateway instances hitting the same Redis cluster, you can't do a read-then-write without risking race conditions. This is where Lua scripting in Redis saves you — EVALSHA executes your script atomically on the Redis server itself.
Here's the Lua script we run on every inbound request:
-- token_bucket.lua
-- KEYS[1] = bucket key (e.g., "rl:merchant_123:/v1/charges")
-- ARGV[1] = max tokens (bucket capacity)
-- ARGV[2] = refill rate (tokens per second)
-- ARGV[3] = current timestamp (microseconds)
-- ARGV[4] = tokens to consume (usually 1)
-- Returns {allowed (0 or 1), remaining tokens, seconds until retry}
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1])
local last_refill = tonumber(bucket[2])

if tokens == nil or last_refill == nil then
    -- First request: initialize full bucket
    tokens = capacity
    last_refill = now
end

-- A caller whose clock runs slightly behind must not rewind the bucket
-- or subtract tokens through a negative elapsed time.
if now < last_refill then
    now = last_refill
end

-- Calculate tokens to add since last refill
local elapsed = (now - last_refill) / 1000000 -- convert to seconds
tokens = math.min(capacity, tokens + elapsed * refill_rate)

local allowed = 0
local retry_after = 0
if tokens >= requested then
    tokens = tokens - requested
    allowed = 1
else
    -- Seconds until enough tokens have refilled for this request
    retry_after = math.ceil((requested - tokens) / refill_rate)
end

-- Update bucket state
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) * 2)

return {allowed, math.floor(tokens), retry_after}
And the Go side that calls it:
func (rl *RateLimiter) Allow(ctx context.Context, key string, limit Rate) (*Result, error) {
	now := time.Now().UnixMicro()
	res, err := rl.redis.EvalSha(ctx, rl.scriptSHA, []string{key},
		limit.Capacity,
		limit.RefillRate,
		now,
		1, // tokens to consume
	).Int64Slice()
	if err != nil {
		// Fail open: allow the request if Redis is unavailable
		return &Result{Allowed: true, Remaining: -1}, nil
	}
	// The script returns {allowed, remaining, retry_after}; guard the indexing below
	if len(res) != 3 {
		return nil, fmt.Errorf("token bucket script returned %d values, want 3", len(res))
	}
	return &Result{
		Allowed:    res[0] == 1,
		Remaining:  int(res[1]),
		RetryAfter: time.Duration(res[2]) * time.Second,
	}, nil
}
Why EVALSHA over EVAL? We load the Lua script once at startup with SCRIPT LOAD and then reference it by SHA digest. This avoids sending the full script text on every request — a small optimization that adds up when you're doing tens of thousands of evaluations per second.
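One way to structure the load-once pattern is sketched below. It assumes a hypothetical `scriptClient` interface with simplified signatures (real client libraries expose richer ones) and a `fakeRedis` type that exists only so the example runs without a server. The digest is the SHA-1 of the script text, and Redis answers with a NOSCRIPT error when the digest is not in its script cache, so the wrapper reloads and retries once in that case.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// scriptClient is a hypothetical, minimal view of a Redis client.
type scriptClient interface {
	ScriptLoad(script string) (sha string, err error)
	EvalSha(sha string, keys []string, args ...any) (any, error)
}

// scriptRunner loads the script once and calls it by digest afterwards.
type scriptRunner struct {
	client scriptClient
	script string
	sha    string
}

func newScriptRunner(c scriptClient, script string) (*scriptRunner, error) {
	sha, err := c.ScriptLoad(script)
	if err != nil {
		return nil, err
	}
	return &scriptRunner{client: c, script: script, sha: sha}, nil
}

// run calls EVALSHA and reloads the script once if the server lost it.
func (r *scriptRunner) run(keys []string, args ...any) (any, error) {
	res, err := r.client.EvalSha(r.sha, keys, args...)
	if err != nil && strings.HasPrefix(err.Error(), "NOSCRIPT") {
		if r.sha, err = r.client.ScriptLoad(r.script); err != nil {
			return nil, err
		}
		return r.client.EvalSha(r.sha, keys, args...)
	}
	return res, err
}

// fakeRedis imitates only the script cache, for demonstration purposes.
type fakeRedis struct {
	cache map[string]bool
	loads int
}

func (f *fakeRedis) ScriptLoad(script string) (string, error) {
	sum := sha1.Sum([]byte(script))
	sha := hex.EncodeToString(sum[:])
	f.cache[sha] = true
	f.loads++
	return sha, nil
}

func (f *fakeRedis) EvalSha(sha string, keys []string, args ...any) (any, error) {
	if !f.cache[sha] {
		return nil, errors.New("NOSCRIPT No matching script. Please use EVAL.")
	}
	return int64(1), nil
}

func main() {
	f := &fakeRedis{cache: map[string]bool{}}
	r, _ := newScriptRunner(f, "return 1")
	f.cache = map[string]bool{} // simulate a restart that clears the script cache
	_, err := r.run([]string{"rl:merchant_123:/v1/charges"})
	fmt.Println(err, f.loads) // the script was loaded twice: at startup and after NOSCRIPT
}
```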
Per-Merchant and Per-Endpoint Limits
A flat rate limit across all merchants is a non-starter. Your largest enterprise merchant processing millions a month shouldn't share the same ceiling as a startup in sandbox mode. We built a two-tier system:
- Global endpoint limits: protect the infrastructure itself. For example, /v1/charges might have a global ceiling of 10,000 requests per second across all merchants.
- Per-merchant limits: configurable per plan tier. Free-tier merchants get 100 req/min, enterprise gets 5,000 req/min, with custom overrides for specific merchants stored in a config table.
The Redis key structure encodes both dimensions: rl:{merchant_id}:{endpoint} for per-merchant limits and rl:global:{endpoint} for the global ceiling. Both checks run in a single Redis pipeline to keep latency low.
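A rough sketch of the key construction and tier lookup follows. The `Rate` field types, the `perMinute` helper, the `limitConfig` type, and the choice of bucket capacity equal to the per-minute limit are all assumptions made for illustration; the in-memory `overrides` map stands in for the config table.

```go
package main

import "fmt"

// Rate mirrors the shape the limiter is called with; field types are assumed.
type Rate struct {
	Capacity   int     // maximum burst size in tokens
	RefillRate float64 // tokens per second
}

// perMinute converts a requests-per-minute limit into a token bucket
// setting. Assumption: capacity equals the per-minute limit.
func perMinute(n int) Rate {
	return Rate{Capacity: n, RefillRate: float64(n) / 60}
}

// tierLimits holds the plan-tier defaults quoted in the text.
var tierLimits = map[string]Rate{
	"free":       perMinute(100),
	"enterprise": perMinute(5000),
}

// limitConfig stands in for the config table of per-merchant overrides.
type limitConfig struct {
	overrides map[string]Rate
}

// limitFor prefers a merchant override, then the tier default. Unknown
// tiers fall back to the free tier, which is an assumption of this sketch.
func (c limitConfig) limitFor(merchantID, tier string) Rate {
	if r, ok := c.overrides[merchantID]; ok {
		return r
	}
	if r, ok := tierLimits[tier]; ok {
		return r
	}
	return tierLimits["free"]
}

func merchantKey(merchantID, endpoint string) string {
	return fmt.Sprintf("rl:%s:%s", merchantID, endpoint)
}

func globalKey(endpoint string) string {
	return "rl:global:" + endpoint
}

func main() {
	cfg := limitConfig{overrides: map[string]Rate{"merchant_123": perMinute(9000)}}
	fmt.Println(merchantKey("merchant_123", "/v1/charges")) // prints rl:merchant_123:/v1/charges
	fmt.Println(globalKey("/v1/charges"))                   // prints rl:global:/v1/charges
	fmt.Println(cfg.limitFor("merchant_123", "free").Capacity)
}
```

With this scheme a merchant ID of "global" would collide with the global key, so the sketch assumes merchant IDs can never take that value.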
Request Flow: Rate Limiting Architecture
[Diagram not reproduced; its surviving labels are "EVALSHA token bucket" and "Merchant tier limits".]
Handling Redis Failures Gracefully
Here's the question that keeps payment engineers up at night: what happens when Redis goes down? You have two options, and neither is comfortable.
Option A: Fail open. If Redis is unreachable, allow all requests through. This is what we chose. The reasoning is straightforward — a payment API that rejects every request because the rate limiter is down is worse than one that temporarily runs without rate limits. Your downstream processors have their own limits anyway, and a few minutes of unthrottled traffic is survivable.
Option B: Fail closed. Reject everything when Redis is down. This protects downstream systems but means a Redis outage becomes a full API outage. For a payment system, this is usually the wrong trade-off.
We added a local in-memory fallback using a per-instance token bucket that kicks in when Redis is unreachable. It's not globally coordinated, but it provides a rough safety net. The key is setting the local limits conservatively: if you have 8 API gateway instances and a global limit of 8,000 req/s, an even split would be 1,000 req/s per instance, so each local fallback gets 800 req/s to leave some headroom.
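A per-instance fallback of this kind might look like the sketch below. The `localBucket` type, the `fallbackRate` helper, and its 0.8 safety factor are hypothetical; the factor is simply the value that reproduces the 800 req/s figure from an 8,000 req/s limit shared by 8 instances.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// localBucket is a hypothetical per-instance token bucket used only
// while Redis is unreachable. It is not coordinated across instances.
type localBucket struct {
	mu         sync.Mutex
	capacity   float64
	refillRate float64 // tokens per second
	tokens     float64
	last       time.Time
}

func newLocalBucket(capacity, refillRate float64, now time.Time) *localBucket {
	return &localBucket{capacity: capacity, refillRate: refillRate, tokens: capacity, last: now}
}

// allowAt refills based on elapsed time, then tries to take one token.
func (b *localBucket) allowAt(now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(b.capacity, b.tokens+elapsed*b.refillRate)
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// fallbackRate splits a global limit across instances and scales it
// down by a safety factor (an assumed knob, not from the article).
func fallbackRate(globalPerSec float64, instances int, safety float64) float64 {
	return globalPerSec / float64(instances) * safety
}

// demoFallback drains a small bucket, waits two seconds, and tries again.
func demoFallback() (atStart, afterTwoSeconds int) {
	start := time.Date(2026, 4, 15, 10, 0, 0, 0, time.UTC)
	b := newLocalBucket(3, 1, start) // capacity 3, refills 1 token per second
	for i := 0; i < 5; i++ {
		if b.allowAt(start) {
			atStart++
		}
	}
	later := start.Add(2 * time.Second)
	for i := 0; i < 5; i++ {
		if b.allowAt(later) {
			afterTwoSeconds++
		}
	}
	return atStart, afterTwoSeconds
}

func main() {
	fmt.Println(fallbackRate(8000, 8, 0.8)) // per-instance rate, about 800 req/s
	fmt.Println(demoFallback())             // prints 3 2
}
```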
Production tip: Always set Retry-After and X-RateLimit-Remaining headers on 429 responses. Good clients will back off automatically. We saw a 40% reduction in retry storms after adding proper rate limit headers with accurate reset timestamps.
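As a sketch of what setting those headers can look like with Go's standard net/http package: the `writeRateLimited` helper and `demo429` function below are hypothetical names, and rounding the delay up to at least one whole second is a choice made here so the header carries a positive integer number of seconds.

```go
package main

import (
	"fmt"
	"math"
	"net/http"
	"net/http/httptest"
	"strconv"
	"time"
)

// writeRateLimited sends a 429 response with backoff hints for the client.
func writeRateLimited(w http.ResponseWriter, remaining int, retryAfter time.Duration) {
	// Retry-After takes whole seconds, so round up and never send zero.
	secs := int(math.Ceil(retryAfter.Seconds()))
	if secs < 1 {
		secs = 1
	}
	w.Header().Set("Retry-After", strconv.Itoa(secs))
	w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(remaining))
	w.WriteHeader(http.StatusTooManyRequests)
	fmt.Fprintln(w, "rate limit exceeded")
}

// demo429 records a response in memory and returns what a client would see.
func demo429() (status int, retryAfter, remaining string) {
	rec := httptest.NewRecorder()
	writeRateLimited(rec, 0, 1500*time.Millisecond)
	return rec.Code, rec.Header().Get("Retry-After"), rec.Header().Get("X-RateLimit-Remaining")
}

func main() {
	fmt.Println(demo429()) // prints 429 2 0
}
```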
Circuit Breaker on the Redis Call
We wrapped the Redis rate limit check in a circuit breaker. After 5 consecutive failures within 10 seconds, the breaker opens and we fall back to local limiting for 30 seconds before probing again. This prevents a flapping Redis connection from adding latency to every single request while it's struggling.
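The thresholds above can be expressed as a small state machine. The following is a simplified sketch, not the production breaker: the `breaker` type and its methods are hypothetical, the 10-second window is measured from the first failure in a streak rather than as a true sliding window, and every caller is let through as a probe once the cooldown has elapsed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// breaker opens after maxFailures consecutive failures inside
// failureWindow and stays open for cooldown before allowing a probe.
type breaker struct {
	mu            sync.Mutex
	maxFailures   int
	failureWindow time.Duration
	cooldown      time.Duration
	failures      int
	firstFailure  time.Time
	open          bool
	openedAt      time.Time
}

// allowRedis reports whether the Redis call should be attempted at now.
func (b *breaker) allowRedis(now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return !b.open || now.Sub(b.openedAt) >= b.cooldown
}

// record feeds the outcome of a Redis call back into the breaker.
func (b *breaker) record(now time.Time, err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if err == nil {
		b.failures, b.open = 0, false // any success closes the breaker
		return
	}
	if b.open {
		b.openedAt = now // failed probe: restart the cooldown
		return
	}
	if b.failures == 0 || now.Sub(b.firstFailure) > b.failureWindow {
		b.failures, b.firstFailure = 0, now // start a new failure streak
	}
	b.failures++
	if b.failures >= b.maxFailures {
		b.open, b.openedAt = true, now
	}
}

// opensAfter reports whether five failures spaced spacingSeconds apart
// leave the breaker open immediately after the last one.
func opensAfter(spacingSeconds int) bool {
	b := &breaker{maxFailures: 5, failureWindow: 10 * time.Second, cooldown: 30 * time.Second}
	start := time.Date(2026, 4, 15, 10, 0, 0, 0, time.UTC)
	last := start
	for i := 0; i < 5; i++ {
		last = start.Add(time.Duration(i*spacingSeconds) * time.Second)
		b.record(last, errors.New("redis unreachable"))
	}
	return !b.allowRedis(last)
}

// demoProbe opens the breaker, waits out the cooldown, and probes.
func demoProbe() (probeAllowed, closedAfterSuccess bool) {
	b := &breaker{maxFailures: 5, failureWindow: 10 * time.Second, cooldown: 30 * time.Second}
	start := time.Date(2026, 4, 15, 10, 0, 0, 0, time.UTC)
	for i := 0; i < 5; i++ {
		b.record(start, errors.New("redis unreachable"))
	}
	probeAt := start.Add(30 * time.Second)
	probeAllowed = b.allowRedis(probeAt)
	b.record(probeAt, nil)
	closedAfterSuccess = b.allowRedis(probeAt.Add(time.Millisecond))
	return probeAllowed, closedAfterSuccess
}

func main() {
	fmt.Println(opensAfter(1), opensAfter(3)) // prints true false
	fmt.Println(demoProbe())                  // prints true true
}
```

While `allowRedis` returns false, the caller would skip Redis entirely and use the local fallback limiter.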
Lessons from Production
A few things I wish I'd known before building this:
- Use EVALSHA, not EVAL. We saw a measurable latency improvement after switching. The script payload was small, but at 15k+ calls per second, the bandwidth savings were real.
- Set key expiration generously. We set TTL to 2x the bucket refill time. If a merchant goes quiet, the key expires naturally. Without expiration, you'll slowly leak memory for inactive merchants.
- Monitor your 429 rate by merchant. A sudden spike in 429s for a single merchant usually means they changed their integration, not that your limits are wrong. Having per-merchant dashboards saved us from unnecessary limit bumps.
- Test with clock skew. In a distributed setup, different API instances may have slightly different system clocks. We pass timestamps from the application layer rather than using Redis TIME to keep things consistent, but we also added a tolerance window of 100ms to handle minor drift.
- Don't forget SCRIPT EXISTS. After a Redis restart or failover, your loaded scripts are gone. We check on startup and after any connection reset, reloading the Lua script if the SHA is missing.