Last year, one of our merchants ran a flash sale that sent 12,000 charge requests in under 90 seconds. Our generic Rack::Attack config treated every merchant the same, so the burst tripped the global limit and started returning 429s to all merchants on the platform. A merchant processing $8M/month in legitimate volume got blocked for six minutes. That incident cost us real money and a very uncomfortable call with their CTO.
That was the week I ripped out our default Rack::Attack throttles and built a custom rate limiting middleware from scratch. Here is what I learned protecting a gateway that processes roughly $50M/month across 400+ merchants.
Why Generic Rate Limiting Fails for Payment APIs
Rack::Attack is a fantastic gem for most web applications. But the throttle rules most setups start with are keyed on IP address or applied globally. Payment APIs have a fundamentally different shape:
- A single merchant might legitimately send 500 requests/minute during peak hours, while another sends 5.
- A POST /charges endpoint has very different risk and cost profiles than GET /balance.
- You cannot just block traffic. A false positive on a charge endpoint means a real customer's payment fails at checkout.
We needed three dimensions of granularity: per-merchant, per-endpoint tier, and per-time-window. None of the off-the-shelf configurations gave us that without significant custom work, so we went with a purpose-built Rack middleware backed by Redis.
Tiered Rate Limits by Endpoint
Not all endpoints are created equal. We settled on three tiers based on the operational cost and risk of each action:
The logic is straightforward: endpoints that move money get stricter limits. A runaway script hitting /refunds 1,000 times is a very different problem than one polling /balance.
Sliding Window Counters with Redis
Fixed-window rate limiting has a well-known edge case: a merchant can send 120 requests at second 59 of window 1 and another 120 at second 0 of window 2, effectively doubling their limit in a two-second span. For payment APIs, that kind of burst can cause real downstream issues with processor rate limits.
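To make the boundary problem concrete, here is a minimal in-memory sketch of a fixed-window counter. The FixedWindowLimiter class is hypothetical and exists only for illustration; it is not part of the middleware described in this post.

```ruby
# Fixed-window limiter: one counter per window index, starting from zero
# every time a new window begins.
class FixedWindowLimiter
  def initialize(limit:, window:)
    @limit = limit
    @window = window
    @counts = Hash.new(0)
  end

  # Returns true if a request arriving at time `now` (in seconds) is admitted.
  def allow?(now)
    bucket = (now / @window).floor
    return false if @counts[bucket] >= @limit

    @counts[bucket] += 1
    true
  end
end

limiter = FixedWindowLimiter.new(limit: 120, window: 60)

# 120 requests in the last second of one window,
# then 120 more in the first second of the next.
late_burst  = Array.new(120) { limiter.allow?(59.5) }
early_burst = Array.new(120) { limiter.allow?(60.5) }

admitted = (late_burst + early_burst).count(true)
puts "admitted #{admitted} requests around the boundary (limit: 120 per minute)"
```

Every request in both bursts is admitted because each burst lands in a different bucket, even though they arrive about a second apart.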
We use a sliding window counter pattern. The idea is to weight the previous window's count by how much of it overlaps with the current window:
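As a worked example, the weighting can be written as a small pure function. This mirrors the arithmetic in the middleware's sliding_window_count method; the function name and the sample numbers are made up for illustration.

```ruby
# Weighted estimate of the requests seen in the trailing window:
#   previous_count * (share of the previous window still overlapping) + current_count
def sliding_window_estimate(previous_count:, current_count:, now:, window:)
  elapsed_ratio = (now % window) / window.to_f
  previous_count * (1.0 - elapsed_ratio) + current_count
end

# 15 seconds into a 60-second window, 75% of the previous window still overlaps.
estimate = sliding_window_estimate(previous_count: 100, current_count: 30, now: 75, window: 60)
puts estimate # 100 * 0.75 + 30
```

The estimate assumes requests in the previous window were spread evenly, which is why it is an approximation rather than an exact count.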
Here is the core middleware implementation:
require 'json'
require 'redis'

# Helper methods resolve_tier, tier_key, log_rate_limit_hit and
# notify_merchant_approaching_limit are not shown in this listing;
# allowlisted? and denylisted? appear later in the post.
class PaymentRateLimiter
  # Per-endpoint defaults: requests allowed per window (window in seconds).
  TIER_LIMITS = {
    '/v1/charges'      => { limit: 120, window: 60 },
    '/v1/refunds'      => { limit: 60,  window: 60 },
    '/v1/payouts'      => { limit: 30,  window: 60 },
    '/v1/balance'      => { limit: 300, window: 60 },
    '/v1/transactions' => { limit: 240, window: 60 }
  }.freeze

  SOFT_LIMIT_RATIO = 0.85

  # Redis.current was removed in redis-rb 5, so default to a fresh client.
  def initialize(app, redis: Redis.new)
    @app = app
    @redis = redis
  end

  def call(env)
    # Assumes the merchant ID arrives as an X-Merchant-Id request header,
    # which Rack exposes in env as HTTP_X_MERCHANT_ID.
    merchant_id = env['HTTP_X_MERCHANT_ID']
    path = env['PATH_INFO']
    tier = resolve_tier(path)

    return @app.call(env) unless tier
    return @app.call(env) if allowlisted?(merchant_id)
    return [429, rate_limit_headers(0, tier), []] if denylisted?(merchant_id)

    count = sliding_window_count(merchant_id, tier)
    limit = merchant_limit(merchant_id, tier)

    if count >= limit
      log_rate_limit_hit(merchant_id, path, count, limit)
      body = { error: 'rate_limit_exceeded', retry_after: tier[:window] }.to_json
      headers = rate_limit_headers(0, tier, limit).merge('Content-Type' => 'application/json')
      return [429, headers, [body]]
    end

    if count >= (limit * SOFT_LIMIT_RATIO)
      notify_merchant_approaching_limit(merchant_id, path, count, limit)
    end

    increment_counter(merchant_id, tier)
    status, headers, response = @app.call(env)
    # count is a weighted float, so round down before reporting what is left.
    remaining = (limit - count - 1).floor
    [status, headers.merge(rate_limit_headers(remaining, tier, limit)), response]
  end

  private

  # Current bucket's count plus the previous bucket's count, weighted by
  # how much of the previous window still overlaps the trailing window.
  def sliding_window_count(merchant_id, tier)
    now = Time.now.to_f
    window = tier[:window]
    current_k = "rl:#{merchant_id}:#{tier_key(tier)}:#{(now / window).floor}"
    previous_k = "rl:#{merchant_id}:#{tier_key(tier)}:#{(now / window).floor - 1}"
    current_count = (@redis.get(current_k) || 0).to_i
    previous_count = (@redis.get(previous_k) || 0).to_i
    elapsed_ratio = (now % window) / window
    (previous_count * (1.0 - elapsed_ratio)) + current_count
  end

  def increment_counter(merchant_id, tier)
    now = Time.now.to_f
    window = tier[:window]
    key = "rl:#{merchant_id}:#{tier_key(tier)}:#{(now / window).floor}"
    # INCR and EXPIRE run in one transaction so a counter is never left without a TTL.
    @redis.multi do |tx|
      tx.incr(key)
      tx.expire(key, window * 2)
    end
  end

  # A per-merchant override in Redis wins over the tier default.
  def merchant_limit(merchant_id, tier)
    custom = @redis.get("rl:custom_limit:#{merchant_id}:#{tier_key(tier)}")
    custom ? custom.to_i : tier[:limit]
  end

  # Pass the merchant's effective limit so the header reflects custom overrides.
  def rate_limit_headers(remaining, tier, limit = tier[:limit])
    {
      'X-RateLimit-Limit' => limit.to_s,
      'X-RateLimit-Remaining' => [remaining, 0].max.to_s,
      'X-RateLimit-Reset' => (Time.now.to_i + tier[:window]).to_s,
      'Retry-After' => tier[:window].to_s
    }
  end
end
We register it in config.ru before any application routing:
# config.ru
require_relative 'lib/payment_rate_limiter'
use PaymentRateLimiter, redis: Redis.new(url: ENV['REDIS_URL'])
run Rails.application
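The middleware listing calls resolve_tier and tier_key without showing them. One way they could look is sketched below, assuming prefix matching on the path and assuming the tier key is the last path segment (which would match the "charges" suffix used in the custom-limit keys). The TierLookup module and the added :key field are illustrative, not the original implementation.

```ruby
module TierLookup
  TIER_LIMITS = {
    '/v1/charges'      => { limit: 120, window: 60 },
    '/v1/refunds'      => { limit: 60,  window: 60 },
    '/v1/payouts'      => { limit: 30,  window: 60 },
    '/v1/balance'      => { limit: 300, window: 60 },
    '/v1/transactions' => { limit: 240, window: 60 }
  }.freeze

  module_function

  # Returns the tier for a request path, or nil when the path is not rate limited.
  # Matches the exact prefix or any sub-path, so /v1/charges/ch_123 resolves
  # to the /v1/charges tier but /v1/chargesfoo does not.
  def resolve_tier(path)
    prefix, tier = TIER_LIMITS.find do |candidate, _tier|
      path == candidate || path.to_s.start_with?("#{candidate}/")
    end
    tier && tier.merge(key: prefix.split('/').last)
  end

  # Short name used inside Redis keys, e.g. "charges".
  def tier_key(tier)
    tier[:key]
  end
end

tier = TierLookup.resolve_tier('/v1/charges/ch_123')
puts TierLookup.tier_key(tier)
puts TierLookup.resolve_tier('/healthz').inspect
```

Returning nil for unknown paths is what lets the middleware pass unlisted endpoints straight through.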
Handling Flash Sale Bursts
The flash sale incident taught us that a single static limit per tier is not enough. Our top 20 merchants generate 70% of our volume, and their traffic patterns are wildly different from the long tail.
We introduced per-merchant custom limits stored in Redis. When a merchant's sales team tells us about an upcoming flash sale, we temporarily raise their limit via an internal admin tool:
# Temporarily raise limits for merchant during flash sale.
# SET with EX writes the value and its 2-hour TTL in a single command.
redis.set("rl:custom_limit:merchant_abc:charges", 500, ex: 7200)
The merchant_limit method checks for a custom override first, falling back to the tier default. The TTL on the key means we never forget to revert it.
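The write side (admin tool) and the read side (limiter) of an override can be sketched as a pair of helpers. Everything named here is hypothetical: raise_limit_temporarily and effective_limit are illustrative, and InMemoryStore is a stand-in that mimics only the two Redis calls involved so the example runs without a server.

```ruby
# Minimal in-memory stand-in for SET (with EX) and GET.
# In practice `store` would be a real Redis client.
class InMemoryStore
  def initialize
    @data = {}
  end

  def set(key, value, ex: nil)
    @data[key] = { value: value.to_s, expires_at: ex ? Time.now + ex : nil }
    'OK'
  end

  def get(key)
    entry = @data[key]
    return nil if entry.nil?
    return nil if entry[:expires_at] && Time.now >= entry[:expires_at]

    entry[:value]
  end
end

# Admin side: store the override together with its TTL in one call.
def raise_limit_temporarily(store, merchant_id:, tier_key:, limit:, ttl_seconds:)
  store.set("rl:custom_limit:#{merchant_id}:#{tier_key}", limit, ex: ttl_seconds)
end

# Limiter side: an unexpired override wins, otherwise use the tier default.
def effective_limit(store, merchant_id:, tier_key:, default:)
  custom = store.get("rl:custom_limit:#{merchant_id}:#{tier_key}")
  custom ? custom.to_i : default
end

store = InMemoryStore.new
raise_limit_temporarily(store, merchant_id: 'merchant_abc', tier_key: 'charges',
                               limit: 500, ttl_seconds: 7200)

puts effective_limit(store, merchant_id: 'merchant_abc', tier_key: 'charges', default: 120)
puts effective_limit(store, merchant_id: 'merchant_xyz', tier_key: 'charges', default: 120)
```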
The Soft Limit Pattern
Blocking a merchant at exactly 100% of their limit with no warning is a terrible experience. We implemented a "soft limit" at 85% of the threshold. When a merchant crosses the soft limit, two things happen:
- We fire a webhook to their configured notification URL with a rate_limit.warning event.
- We add an X-RateLimit-Warning: approaching header to every response.
This gives their engineering team time to react, whether that means queuing requests on their side or calling us to request a temporary increase. Since we added soft limits, our hard-block incidents dropped by about 60%.
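The soft and hard thresholds can be expressed as a small classification step. This is a sketch under the 85% ratio described above; the limit_state and warning_headers names are hypothetical, and the webhook delivery itself is left out.

```ruby
SOFT_LIMIT_RATIO = 0.85

# Classifies a request count against a limit:
#   :hard -> at or over the limit, respond with 429
#   :soft -> at or over 85% of the limit, serve the request but warn
#   :ok   -> below the soft threshold
def limit_state(count, limit, soft_ratio: SOFT_LIMIT_RATIO)
  return :hard if count >= limit
  return :soft if count >= limit * soft_ratio

  :ok
end

# Extra response headers for a given state.
def warning_headers(state)
  state == :soft ? { 'X-RateLimit-Warning' => 'approaching' } : {}
end

[100, 110, 120].each do |count|
  state = limit_state(count, 120)
  puts "#{count}/120 -> #{state} #{warning_headers(state)}"
end
```

In a real deployment the webhook would likely need de-duplication (for example, once per window per merchant) so a merchant hovering above the soft threshold is not notified on every request.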
Common Pitfalls
- Calling INCR and EXPIRE outside a transaction: If your app crashes between INCR and EXPIRE, you get a counter that never resets. Always set both in the same MULTI/EXEC transaction.
- Rate limiting after authentication: Put your limiter before auth middleware. Otherwise, an attacker can brute-force API keys without ever hitting the rate limit.
- Returning 429 without Retry-After: RFC 6585 allows a 429 response to include this header to tell the client how long to wait. Without it, many clients retry immediately and make the problem worse.
- Forgetting to exclude health checks: Your load balancer's health check endpoint should never count against rate limits. We learned this one the hard way when a misconfigured ALB triggered 429s for itself.
Allowlists, Denylists, and Known Actors
We maintain two Redis sets: rl:allowlist for merchants that bypass rate limiting entirely (our top-tier enterprise accounts with dedicated infrastructure), and rl:denylist for merchants we want to block immediately (compromised API keys, fraud investigations).
def allowlisted?(merchant_id)
  @redis.sismember('rl:allowlist', merchant_id)
end

def denylisted?(merchant_id)
  @redis.sismember('rl:denylist', merchant_id)
end
The allowlist is small, usually under 10 merchants. These are accounts where we have contractual SLAs and dedicated capacity. Everyone else goes through the standard rate limiting path, including us when we test in production.
Monitoring and Tuning
We ship every rate limit event to Datadog with tags for merchant ID, endpoint tier, and whether it was a soft or hard limit. This gives us a few critical dashboards:
- Rate limit hit rate by merchant — If a merchant consistently hits 80%+ of their limit, we proactively reach out to discuss higher tiers.
- 429 response ratio — We alert if this exceeds 2% of total traffic in any 5-minute window. Above that threshold, something is probably wrong with our limits, not the merchant's behavior.
- Sliding window accuracy — We periodically compare our Redis counters against actual request logs to make sure the counts are not drifting.
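The 429-ratio alert boils down to a simple check. The sketch below assumes the response counts for one 5-minute window are already aggregated into a hash of status code to count; the too_many_429s? name and the sample numbers are made up, and in practice this logic would live in the monitoring system's alert definition rather than in application code.

```ruby
# Returns true when 429 responses exceed `threshold` (a fraction of all
# responses) within the window described by `status_counts`.
def too_many_429s?(status_counts, threshold: 0.02)
  total = status_counts.values.sum
  return false if total.zero?

  status_counts.fetch(429, 0).to_f / total > threshold
end

quiet_window = { 200 => 9_900, 429 => 100 } # 1% throttled
noisy_window = { 200 => 9_700, 429 => 300 } # 3% throttled

puts too_many_429s?(quiet_window)
puts too_many_429s?(noisy_window)
```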
After six months of running this system, we have tuned our default limits three times. The initial numbers were educated guesses. Real traffic data showed that our /transactions limit was too low (merchants with large catalogs need to paginate heavily) and our /refunds limit was too high (no legitimate merchant needs 60 refunds per minute).
Key takeaway: Rate limits are not set-and-forget. Treat your initial thresholds as hypotheses and let production traffic data guide your tuning. We review our limits quarterly against the 95th and 99th percentile usage patterns.
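For the quarterly review, the percentile comparison can be sketched as follows, assuming you can export per-minute request counts for a merchant and endpoint. The percentile helper uses the nearest-rank method, and the sample counts are invented purely to show the shape of the calculation.

```ruby
# Nearest-rank percentile: the smallest value such that at least
# `pct` percent of the samples are less than or equal to it.
def percentile(values, pct)
  raise ArgumentError, 'values must not be empty' if values.empty?

  sorted = values.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Hypothetical per-minute request counts for one merchant on one endpoint.
per_minute_counts = [12, 40, 38, 95, 51, 47, 110, 63, 58, 44]
current_limit = 120

[95, 99].each do |pct|
  value = percentile(per_minute_counts, pct)
  share = (100.0 * value / current_limit).round
  puts "p#{pct}: #{value} requests/min (#{share}% of the current limit)"
end
```

A p99 that sits close to the limit suggests legitimate traffic is at risk of being throttled; a p99 far below it suggests the limit could be tightened.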
What I Would Do Differently
If I were building this again from scratch, I would start with the sliding window approach from day one instead of migrating from fixed windows mid-flight. I would also invest earlier in a self-service portal where merchants can see their current usage against their limits in real time. We built that six months later, and it cut our support tickets about rate limiting by half.
The other thing I underestimated was the importance of the Retry-After header. Once we started returning accurate retry windows, the thundering herd problem after a rate limit event mostly solved itself. Clients that respect the header back off naturally, and the ones that do not are usually the ones you want to block anyway.
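One simple way to tighten the retry hint is to return the time until the current counter bucket rolls over instead of always returning the full window. The helper below is a sketch of that idea only; the name is hypothetical. Under the sliding-window estimate, a client may still be throttled briefly after this point because the previous bucket keeps contributing to the count.

```ruby
# Whole seconds until the bucket containing `now` ends.
# `now` is a Unix timestamp (Integer or Float); `window` is in seconds.
def seconds_until_window_reset(now, window)
  (window - (now % window)).ceil
end

puts seconds_until_window_reset(75.2, 60)  # 15.2s into the bucket
puts seconds_until_window_reset(119.9, 60) # just before the boundary
```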
References
- Rack::Attack — Rack middleware for blocking and throttling
- Redis INCR command documentation
- IETF RFC 6585 — Additional HTTP Status Codes (429 Too Many Requests)
- Stripe API Rate Limiting — industry reference implementation