Building Rate Limiting Middleware in Ruby for Payment APIs

Last year, one of our merchants ran a flash sale that sent 12,000 charge requests in under 90 seconds. Our generic Rack::Attack config treated every merchant the same, so the burst tripped the global limit and started returning 429s to all merchants on the platform. A merchant processing $8M/month in legitimate volume got blocked for six minutes. That incident cost us real money and a very uncomfortable call with their CTO.

That was the week I ripped out our default Rack::Attack throttles and built a custom rate limiting middleware from scratch. Here is what I learned protecting a gateway that processes roughly $50M/month across 400+ merchants.

Why Generic Rate Limiting Fails for Payment APIs

Rack::Attack is a fantastic gem for most web applications. But the throttles most teams configure with it, ours included, key on client IP or apply a single limit to all traffic. Payment APIs have a fundamentally different shape:

A single merchant might legitimately send 500 requests/minute during peak hours, while another sends 5.
A POST /charges endpoint has very different risk and cost profiles than GET /balance.
You cannot just block traffic. A false positive on a charge endpoint means a real customer's payment fails at checkout.

We needed three dimensions of granularity: per-merchant, per-endpoint tier, and per-time-window. None of the off-the-shelf configurations gave us that without significant custom work, so we went with a purpose-built Rack middleware backed by Redis.

Tiered Rate Limits by Endpoint

Not all endpoints are created equal. We settled on three tiers based on the operational cost and risk of each action:

Rate Limits by Endpoint Tier (per merchant, per minute)

Endpoint             Limit
GET /balance         300/min
GET /transactions    240/min
POST /charges        120/min
POST /refunds        60/min
POST /payouts        30/min

Read-only endpoints get the highest limits. Money-movement endpoints get the tightest controls.

The logic is straightforward: endpoints that move money get stricter limits. A runaway script hitting /refunds 1,000 times is a very different problem than one polling /balance.

Sliding Window Counters with Redis

Fixed-window rate limiting has a well-known edge case: a merchant can send 120 requests at second 59 of window 1 and another 120 at second 0 of window 2, effectively doubling their limit in a two-second span. For payment APIs, that kind of burst can cause real downstream issues with processor rate limits.
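The boundary burst is easy to reproduce without Redis. The snippet below is a minimal in-memory sketch, not our production code: `FixedWindowCounter` and its `allow?` method are hypothetical names introduced only for this illustration, and time is passed in as plain seconds.

```ruby
# In-memory fixed-window counter, for illustration only (no Redis involved).
class FixedWindowCounter
  def initialize(limit:, window:)
    @limit  = limit
    @window = window
    @counts = Hash.new(0) # window index => requests seen in that window
  end

  # Returns true if a request arriving at time `now` (in seconds) is allowed.
  def allow?(now)
    bucket = (now / @window).floor
    return false if @counts[bucket] >= @limit

    @counts[bucket] += 1
    true
  end
end

counter = FixedWindowCounter.new(limit: 120, window: 60)

# 120 requests at second 59 (end of one window), then 120 more at
# second 60 (start of the next window).
allowed = [59.0, 60.0].sum do |t|
  120.times.count { counter.allow?(t) }
end

puts allowed # 240
```

Every one of the 240 requests is accepted even though the nominal limit is 120 per minute, because each batch lands in a different bucket.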

We use a sliding window counter pattern. The idea is to weight the previous window's count by the fraction of it that still falls inside the sliding window ending now, then add the current window's count:

Sliding Window Algorithm (worked example)

Previous window: 84 requests. Current window: 47 requests so far. Now is 35% of the way into the current window, so 65% of the previous window still overlaps the trailing 60 seconds.

        weighted_count = (84 × 0.65) + 47 = 101.6  ←  under 120 limit, request allowed
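The weighted count from this example (84 requests in the previous window, 47 in the current one, 35% elapsed) can be reproduced with a few lines of Ruby. This is a standalone sketch of the arithmetic only; the method name `weighted_count` is chosen here for illustration.

```ruby
# Weighted count for a sliding window built from two fixed-window counters.
# elapsed_ratio is how far we are into the current window (0.0 to 1.0).
def weighted_count(previous_count, current_count, elapsed_ratio)
  (previous_count * (1.0 - elapsed_ratio)) + current_count
end

count = weighted_count(84, 47, 0.35)

puts count.round(1) # 101.6
puts count < 120    # true, so the request is allowed
```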
      

Here is the core middleware implementation:

require 'json'
require 'redis'

class PaymentRateLimiter
  TIER_LIMITS = {
    '/v1/charges'      => { limit: 120, window: 60 },
    '/v1/refunds'      => { limit: 60,  window: 60 },
    '/v1/payouts'      => { limit: 30,  window: 60 },
    '/v1/balance'      => { limit: 300, window: 60 },
    '/v1/transactions' => { limit: 240, window: 60 }
  }.freeze

  SOFT_LIMIT_RATIO = 0.85

  # Redis.current was removed in redis-rb 5.0, so the client is passed in explicitly.
  def initialize(app, redis:)
    @app   = app
    @redis = redis
  end

  def call(env)
    # Rack exposes the X-Merchant-Id request header under this env key.
    merchant_id = env['HTTP_X_MERCHANT_ID']
    path        = env['PATH_INFO']
    tier        = resolve_tier(path)

    return @app.call(env) unless tier
    return @app.call(env) if allowlisted?(merchant_id)
    return [429, rate_limit_headers(0, tier), []] if denylisted?(merchant_id)

    count = sliding_window_count(merchant_id, tier)
    limit = merchant_limit(merchant_id, tier)

    if count >= limit
      log_rate_limit_hit(merchant_id, path, count, limit)
      return [429, rate_limit_headers(0, tier), [
        { error: 'rate_limit_exceeded',
          retry_after: tier[:window] }.to_json
      ]]
    end

    if count >= (limit * SOFT_LIMIT_RATIO)
      notify_merchant_approaching_limit(merchant_id, path, count, limit)
    end

    increment_counter(merchant_id, tier)
    status, headers, response = @app.call(env)
    [status, headers.merge(rate_limit_headers(limit - count - 1, tier)), response]
  end

  private

  def sliding_window_count(merchant_id, tier)
    now        = Time.now.to_f
    window     = tier[:window]
    current_k  = "rl:#{merchant_id}:#{tier_key(tier)}:#{(now / window).floor}"
    previous_k = "rl:#{merchant_id}:#{tier_key(tier)}:#{(now / window).floor - 1}"

    current_count  = (@redis.get(current_k) || 0).to_i
    previous_count = (@redis.get(previous_k) || 0).to_i

    elapsed_ratio = (now % window) / window
    (previous_count * (1.0 - elapsed_ratio)) + current_count
  end

  def increment_counter(merchant_id, tier)
    now    = Time.now.to_f
    window = tier[:window]
    key    = "rl:#{merchant_id}:#{tier_key(tier)}:#{(now / window).floor}"

    @redis.multi do |tx|
      tx.incr(key)
      tx.expire(key, window * 2)
    end
  end

  def merchant_limit(merchant_id, tier)
    custom = @redis.get("rl:custom_limit:#{merchant_id}:#{tier_key(tier)}")
    custom ? custom.to_i : tier[:limit]
  end

  def rate_limit_headers(remaining, tier)
    {
      'X-RateLimit-Limit'     => tier[:limit].to_s,
      # The weighted count is a float, so round down before reporting it.
      'X-RateLimit-Remaining' => [remaining.floor, 0].max.to_s,
      'X-RateLimit-Reset'     => (Time.now.to_i + tier[:window]).to_s,
      'Retry-After'           => tier[:window].to_s
    }
  end
end
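The class above calls `resolve_tier` and `tier_key` without showing them. One possible shape for those helpers is sketched below. Everything in it is an assumption made for illustration: the `TierLookup` module, the `:name` field added to each tier, and the prefix matching that lets nested paths such as `/v1/charges/ch_123` share the `/v1/charges` limit.

```ruby
# Hypothetical helpers, not taken from the original middleware.
module TierLookup
  # Same limits as TIER_LIMITS, plus an assumed :name used in Redis keys.
  TIERS = {
    '/v1/charges'      => { name: 'charges',      limit: 120, window: 60 },
    '/v1/refunds'      => { name: 'refunds',      limit: 60,  window: 60 },
    '/v1/payouts'      => { name: 'payouts',      limit: 30,  window: 60 },
    '/v1/balance'      => { name: 'balance',      limit: 300, window: 60 },
    '/v1/transactions' => { name: 'transactions', limit: 240, window: 60 }
  }.freeze

  module_function

  # Returns the tier for an exact match or a nested path, or nil if the
  # path is not rate limited (health checks, static assets, and so on).
  def resolve_tier(path)
    prefix = TIERS.keys.find { |p| path == p || path.start_with?("#{p}/") }
    prefix && TIERS[prefix]
  end

  # Short, stable name used when building Redis keys such as
  # "rl:<merchant_id>:charges:<window_index>".
  def tier_key(tier)
    tier.fetch(:name)
  end
end

tier = TierLookup.resolve_tier('/v1/charges/ch_123')
puts TierLookup.tier_key(tier)                 # charges
puts TierLookup.resolve_tier('/health').inspect # nil
```

Returning nil for unknown paths matters here, because the middleware passes untiered requests straight through to the app.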

We register it in config.ru before any application routing:

# config.ru
require_relative 'lib/payment_rate_limiter'

use PaymentRateLimiter, redis: Redis.new(url: ENV['REDIS_URL'])
run Rails.application

Handling Flash Sale Bursts

The flash sale incident taught us that a single static limit per tier is not enough. Our top 20 merchants generate 70% of our volume, and their traffic patterns are wildly different from the long tail.

We introduced per-merchant custom limits stored in Redis. When a merchant's sales team tells us about an upcoming flash sale, we temporarily raise their limit via an internal admin tool:

# Temporarily raise limits for merchant during flash sale.
# SET with EX writes the value and its 2-hour TTL in one atomic command.
redis.set("rl:custom_limit:merchant_abc:charges", 500, ex: 7200)

The merchant_limit method checks for a custom override first, falling back to the tier default. The TTL on the key means we never forget to revert it.

The Soft Limit Pattern

Blocking a merchant at exactly 100% of their limit with no warning is a terrible experience. We implemented a "soft limit" at 85% of the threshold. When a merchant crosses the soft limit, two things happen:

We fire a webhook to their configured notification URL with a rate_limit.warning event.
We add an X-RateLimit-Warning: approaching header to every response while they remain above the soft limit.

This gives their engineering team time to react, whether that means queuing requests on their side or calling us to request a temporary increase. Since we added soft limits, our hard-block incidents dropped by about 60%.
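The soft and hard thresholds can be expressed as one small classification step. The sketch below is illustrative rather than our exact code: `limit_state` and `warning_headers` are hypothetical helper names, and the webhook delivery is left out.

```ruby
SOFT_LIMIT_RATIO = 0.85

# Classifies a weighted count against a limit.
# Returns :hard at or above the limit, :soft at or above 85% of it, else :ok.
def limit_state(count, limit, soft_ratio: SOFT_LIMIT_RATIO)
  return :hard if count >= limit
  return :soft if count >= limit * soft_ratio

  :ok
end

# Extra response headers; only the soft state adds the warning header.
def warning_headers(state)
  state == :soft ? { 'X-RateLimit-Warning' => 'approaching' } : {}
end

limit_state(90, 120)   # => :ok
limit_state(110, 120)  # => :soft
limit_state(120, 120)  # => :hard
warning_headers(:soft) # => {"X-RateLimit-Warning"=>"approaching"}
```

Keeping the classification separate from the side effects makes it straightforward to test the thresholds without a webhook endpoint or a Redis instance.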

Common Mistakes to Avoid

Using INCR without an atomic EXPIRE: If your app crashes between a separate INCR and EXPIRE, you get a counter that never resets. Always issue both inside the same MULTI/EXEC transaction.
Rate limiting after authentication: Put your limiter before auth middleware. Otherwise, an attacker can brute-force API keys without ever hitting the rate limit.
Returning 429 without Retry-After: RFC 6585 allows a 429 response to carry this header, and you should always send it. Without it, clients retry immediately and make the problem worse.
Forgetting to exclude health checks: Your load balancer's health check endpoint should never count against rate limits. We learned this one the hard way when a misconfigured ALB triggered 429s for itself.

Allowlists, Denylists, and Known Actors

We maintain two Redis sets: rl:allowlist for merchants that bypass rate limiting entirely (our top-tier enterprise accounts with dedicated infrastructure), and rl:denylist for merchants we want to block immediately (compromised API keys, fraud investigations).

def allowlisted?(merchant_id)
  @redis.sismember('rl:allowlist', merchant_id)
end

def denylisted?(merchant_id)
  @redis.sismember('rl:denylist', merchant_id)
end

The allowlist is small, usually under 10 merchants. These are accounts where we have contractual SLAs and dedicated capacity. Everyone else goes through the standard rate limiting path, including us when we test in production.

Monitoring and Tuning

We ship every rate limit event to Datadog with tags for merchant ID, endpoint tier, and whether it was a soft or hard limit. This gives us a few critical dashboards:

Rate limit hit rate by merchant — If a merchant consistently hits 80%+ of their limit, we proactively reach out to discuss higher tiers.
429 response ratio — We alert if this exceeds 2% of total traffic in any 5-minute window. Above that threshold, something is probably wrong with our limits, not the merchant's behavior.
Sliding window accuracy — We periodically compare our Redis counters against actual request logs to make sure the counts are not drifting.

After six months of running this system, we have tuned our default limits three times. The initial numbers were educated guesses. Real traffic data showed that our /transactions limit was too low (merchants with large catalogs need to paginate heavily) and our /refunds limit was too high (no legitimate merchant needs 60 refunds per minute).

Key takeaway: Rate limits are not set-and-forget. Treat your initial thresholds as hypotheses and let production traffic data guide your tuning. We review our limits quarterly against the 95th and 99th percentile usage patterns.
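For the quarterly review, the percentile calculation itself is simple. Below is a minimal sketch using the nearest-rank method. The `percentile` helper and the sample data are hypothetical, and a real review would read per-minute counts from request logs or metrics rather than a hard-coded array.

```ruby
# Nearest-rank percentile over a list of per-minute request counts.
def percentile(values, pct)
  raise ArgumentError, 'values must not be empty' if values.empty?

  sorted = values.sort
  rank   = (pct / 100.0 * sorted.length).ceil
  sorted[[rank, 1].max - 1]
end

# Made-up per-minute request counts for one merchant on one tier.
per_minute = [12, 18, 25, 31, 40, 44, 52, 60, 75, 110]

p50 = percentile(per_minute, 50)
p95 = percentile(per_minute, 95)

puts p50 # 40
puts p95 # 110
```

A wide gap between the median and the 95th percentile, as in this made-up sample, is the kind of signal that suggests a merchant's bursts deserve a closer look before the default limit is changed.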

What I Would Do Differently

If I were building this again from scratch, I would start with the sliding window approach from day one instead of migrating from fixed windows mid-flight. I would also invest earlier in a self-service portal where merchants can see their current usage against their limits in real time. We built that six months later, and it cut our support tickets about rate limiting by half.

The other thing I underestimated was the importance of the Retry-After header. Once we started returning accurate retry windows, the thundering herd problem after a rate limit event mostly solved itself. Clients that respect the header back off naturally, and the ones that do not are usually the ones you want to block anyway.
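One simple way to produce a tighter Retry-After value than the full window length is to report the seconds left until the current fixed window rolls over. The helper below is a sketch under that assumption, and `retry_after_seconds` is a name introduced here. With a sliding window the previous bucket still carries some weight after rollover, so treat the result as a hint rather than a guarantee that the next request will succeed.

```ruby
# Seconds until the current fixed window rolls over, rounded up so a client
# never retries early, and never less than 1.
def retry_after_seconds(now, window)
  remaining = window - (now % window)
  [remaining.ceil, 1].max
end

retry_after_seconds(125.0, 60) # => 55 (5 seconds into the window)
retry_after_seconds(119.5, 60) # => 1  (half a second before rollover)
retry_after_seconds(120.0, 60) # => 60 (a window has just started)
```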


Disclaimer: This article reflects the author's personal experience and opinions. Product names, logos, and brands are property of their respective owners. Pricing and features mentioned are subject to change — always verify with official documentation.
