Why Rate Limiting Matters More Than You Think
Every API starts without rate limiting. It feels unnecessary when you have ten users. Then one night, a client with a broken retry loop starts hammering your endpoint, your database connection pool maxes out, and suddenly every user gets 503s. I've been there. The fix took a week. The outage took four hours.
Rate limiting isn't just about protecting against abuse. It's about fairness. Without it, one noisy client can starve everyone else. It's about predictability — your infrastructure team needs to capacity-plan against known limits, not hope for the best. And it's about cost — every unthrottled request burns compute, and cloud bills don't care about your intentions.
Where Rate Limiting Sits in Your Stack
Before writing any code, you need to understand where the limiter lives in the request lifecycle. Get this wrong and you'll either limit too early (blocking legitimate traffic at the load balancer) or too late (your app already did expensive work before rejecting the request).
Request flow: Load Balancer → Rate Limiter → Handler
The sweet spot is right after authentication but before any business logic. You need to know who is making the request (to apply per-user limits), but you don't want to touch the database or do heavy computation for a request you're about to reject.
Token Bucket vs Sliding Window vs Fixed Window
There are three algorithms you'll actually use in production. Each has trade-offs that matter depending on your traffic patterns.
Fixed window is the simplest: count requests per clock-aligned interval and reset the counter when the interval rolls over. It's cheap, but a client can burst up to twice the limit by straddling a window boundary. For most APIs, start with token bucket. It handles bursty traffic gracefully — a user who sends 5 requests at once and then goes quiet for a few seconds shouldn't be penalized the same way as someone sending a steady stream at the limit. The sliding window is better when you need strict guarantees, like billing APIs where "100 requests per minute" must mean exactly that.
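The boundary problem with fixed windows is easy to demonstrate. The sketch below (types and names are mine, not from any library) counts requests in clock-aligned one-minute windows and lets a client push 200 requests through a 100/min limit by straddling a window edge:

```go
package main

import (
	"fmt"
	"time"
)

// fixedWindow counts requests per clock-aligned interval and
// resets the counter when a new interval starts.
type fixedWindow struct {
	window      time.Duration
	limit       int
	windowStart time.Time
	count       int
}

func (f *fixedWindow) allow(now time.Time) bool {
	start := now.Truncate(f.window)
	if start.After(f.windowStart) {
		f.windowStart = start
		f.count = 0
	}
	if f.count < f.limit {
		f.count++
		return true
	}
	return false
}

// boundaryBurst sends 100 requests in the last second of one window
// and 100 in the first second of the next, returning how many pass.
func boundaryBurst() int {
	fw := &fixedWindow{window: time.Minute, limit: 100}
	base := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	allowed := 0
	for i := 0; i < 100; i++ {
		if fw.allow(base.Add(59*time.Second + time.Duration(i)*time.Millisecond)) {
			allowed++
		}
	}
	for i := 0; i < 100; i++ {
		if fw.allow(base.Add(60*time.Second + time.Duration(i)*time.Millisecond)) {
			allowed++
		}
	}
	return allowed
}

func main() {
	fmt.Println(boundaryBurst()) // 200: double the limit in about two seconds
}
```

Token bucket and sliding window both avoid this failure mode, which is why fixed window rarely survives contact with production traffic.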
A Simple In-Process Limiter in Go
Let's start with the simplest thing that works: an in-process token bucket using sync.Mutex. This is good enough for a single-instance service.
type RateLimiter struct {
	mu         sync.Mutex
	tokens     float64
	maxTokens  float64
	refillRate float64 // tokens per second
	lastRefill time.Time
}

func NewRateLimiter(maxTokens, refillRate float64) *RateLimiter {
	return &RateLimiter{
		tokens:     maxTokens,
		maxTokens:  maxTokens,
		refillRate: refillRate,
		lastRefill: time.Now(),
	}
}

func (rl *RateLimiter) Allow() bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()

	now := time.Now()
	elapsed := now.Sub(rl.lastRefill).Seconds()
	rl.tokens += elapsed * rl.refillRate
	if rl.tokens > rl.maxTokens {
		rl.tokens = rl.maxTokens
	}
	rl.lastRefill = now

	if rl.tokens >= 1 {
		rl.tokens--
		return true
	}
	return false
}
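A quick sanity check of this limiter's behavior (the type is repeated here so the snippet compiles on its own): a bucket of 3 tokens absorbs a burst of 3, rejects the 4th call, and allows another request after roughly a second of refill at 1 token/sec.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// RateLimiter is the same token bucket as in the article.
type RateLimiter struct {
	mu         sync.Mutex
	tokens     float64
	maxTokens  float64
	refillRate float64 // tokens per second
	lastRefill time.Time
}

func NewRateLimiter(maxTokens, refillRate float64) *RateLimiter {
	return &RateLimiter{
		tokens:     maxTokens,
		maxTokens:  maxTokens,
		refillRate: refillRate,
		lastRefill: time.Now(),
	}
}

func (rl *RateLimiter) Allow() bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	now := time.Now()
	rl.tokens += now.Sub(rl.lastRefill).Seconds() * rl.refillRate
	if rl.tokens > rl.maxTokens {
		rl.tokens = rl.maxTokens
	}
	rl.lastRefill = now
	if rl.tokens >= 1 {
		rl.tokens--
		return true
	}
	return false
}

// demo returns the outcome of five calls: a burst of four,
// then one more after waiting for the bucket to refill.
func demo() []bool {
	rl := NewRateLimiter(3, 1) // burst of 3, refills 1 token/sec
	var got []bool
	for i := 0; i < 4; i++ {
		got = append(got, rl.Allow())
	}
	time.Sleep(1100 * time.Millisecond) // ~1.1 tokens refill
	got = append(got, rl.Allow())
	return got
}

func main() {
	fmt.Println(demo()) // [true true true false true]
}
```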
This works, but it's per-process. The moment you scale to two instances behind a load balancer, each instance tracks its own counters. A client hitting both instances gets double the limit. That's where Redis comes in.
Distributed Rate Limiting with Redis
For multi-instance deployments, you need a shared counter. Redis is the standard choice — it's fast, atomic, and you probably already have it in your stack.
The naive approach is INCR + EXPIRE, but there's a race condition: if your process crashes between the two commands, you get a counter without a TTL that never resets. The fix is a Lua script that runs atomically on the Redis server.
Key tip — always use Lua scripts for Redis rate limiting. A Lua script executes atomically on the Redis server, eliminating race conditions between INCR and EXPIRE. Without atomicity, you'll get phantom counters that never expire and clients that bypass your limits entirely. This is the single most common mistake I see in Redis-based rate limiters.
-- sliding_window.lua
local key = KEYS[1]
local window = tonumber(ARGV[1]) -- window size in seconds
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Remove expired entries
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

-- Count current requests
local count = redis.call('ZCARD', key)
if count < limit then
    redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
    redis.call('EXPIRE', key, window)
    return 1 -- allowed
end
return 0 -- rejected
In Go, you load this script once at startup and call it with redis.EvalSha for each request. The sorted set gives you a true sliding window — each entry is timestamped, and expired entries get pruned on every check.
Per-User, Per-IP, Per-API-Key — You Need All Three
A single rate limit strategy isn't enough. In production, I run three layers simultaneously:
- Per-IP — catches unauthenticated abuse, brute-force login attempts, and scanner bots. Generous limits (e.g., 100 req/min) since multiple users can share an IP behind a NAT.
- Per-API-key — the primary limit for authenticated traffic. Different tiers for different plans (free: 60/min, pro: 600/min, enterprise: custom).
- Per-user — prevents a single user from burning through a shared organization's API key quota. Useful when multiple team members share credentials.
The key insight: check them in order of cheapest to most expensive. IP lookup is a string comparison. API key lookup might hit a cache. User lookup might hit the database. Reject early when you can.
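One way to wire up that ordering (the layer names and stub checks here are illustrative, not a fixed API) is a slice of named checks evaluated cheapest-first, stopping at the first rejection so the expensive lookups never run for traffic a cheap layer already rejected:

```go
package main

import "fmt"

// limitLayer pairs a label with a check function, ordered cheapest-first.
type limitLayer struct {
	name  string
	allow func() bool
}

// firstRejection runs each layer in order and returns the name of
// the first one that rejects; later (more expensive) layers are
// never evaluated once a cheaper one has said no.
func firstRejection(layers []limitLayer) (string, bool) {
	for _, l := range layers {
		if !l.allow() {
			return l.name, false
		}
	}
	return "", true
}

func main() {
	layers := []limitLayer{
		{"ip", func() bool { return true }},       // in-memory counter
		{"api_key", func() bool { return false }}, // cache lookup; this one trips
		{"user", func() bool { panic("never reached: rejected earlier") }},
	}
	name, ok := firstRejection(layers)
	fmt.Println(name, ok) // api_key false
}
```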
HTTP Response Patterns
When you reject a request, tell the client exactly what happened and when to retry. This isn't just politeness — well-behaved clients will back off automatically if you give them the right headers.
func rateLimitResponse(w http.ResponseWriter, retryAfter, limit, remaining int) {
	w.Header().Set("X-RateLimit-Limit", strconv.Itoa(limit))
	w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(remaining))
	w.Header().Set("Retry-After", strconv.Itoa(retryAfter))
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusTooManyRequests) // 429
	json.NewEncoder(w).Encode(map[string]string{
		"error": "rate_limit_exceeded",
		"message": "Too many requests. Please retry after " +
			strconv.Itoa(retryAfter) + " seconds.",
	})
}
Always include Retry-After. The 429 status code comes from RFC 6585; the Retry-After header itself is defined in the core HTTP specification (RFC 9110), and most HTTP client libraries respect it. The X-RateLimit-* headers aren't standardized but are a de facto convention — Stripe, GitHub, and Twitter all use them.
The Middleware Pattern
In Go's net/http, rate limiting fits naturally as middleware. Wrap your handler, check the limit, and either pass through or return 429.
func RateLimitMiddleware(limiter *DistributedLimiter) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			key := extractAPIKey(r)
			result, err := limiter.Check(r.Context(), key)
			if err != nil {
				// Redis down — fail open (see next section)
				next.ServeHTTP(w, r)
				return
			}
			if !result.Allowed {
				rateLimitResponse(w, result.RetryAfter, result.Limit, 0)
				return
			}
			w.Header().Set("X-RateLimit-Remaining",
				strconv.Itoa(result.Remaining))
			next.ServeHTTP(w, r)
		})
	}
}
Chain it with your other middleware: logging → auth → rateLimit → handler. Auth goes before rate limiting because you need the API key to look up the client's tier.
Graceful Degradation — When Redis Goes Down
Your rate limiter depends on Redis. Redis will go down eventually. What happens then?
You have two choices: fail open (allow all requests) or fail closed (reject all requests). For most APIs, fail open is the right call. A few minutes without rate limiting is better than a total outage. But you need guardrails:
- Fall back to the in-process limiter with conservative limits
- Set a circuit breaker on Redis calls — after 5 consecutive failures, stop trying for 30 seconds
- Log aggressively so you know it happened
- Alert on the fallback state — it should be temporary, not a new normal
Monitoring What Matters
A rate limiter you can't observe is a rate limiter you can't trust. Track these metrics from day one:
- Allowed vs. rejected requests, broken down by limit type (IP, API key, user)
- 429 rate as a fraction of total traffic — a sudden spike usually means a client bug, not abuse
- Latency added by the limiter check itself, at p50 and p99
- Redis errors and time spent in the fail-open fallback state
Beyond the numbers, build a dashboard that shows your top 10 clients by request volume, updated every minute. When someone starts abusing your API, you'll see them climb the leaderboard before they hit the limit. I've caught misconfigured integrations this way — a partner's staging environment pointed at our production API, sending 10x their normal traffic. We reached out before they even noticed.
Putting It All Together
Start simple. An in-process token bucket with sync.Mutex will handle a surprising amount of traffic on a single instance. When you scale horizontally, add Redis with a Lua script for atomicity. Layer your limits — IP, API key, user — and always return proper 429 responses with Retry-After headers.
The rate limiter I built after that 3am incident has been running for two years now. It handles around 50,000 requests per second across six instances, adds less than 2ms of latency at p99, and has survived three Redis failovers without dropping legitimate traffic. The total implementation is about 400 lines of Go. Not everything needs to be complicated.
References
- Go Standard Library — net/http Package Documentation
- Redis Documentation — Rate Limiting Pattern
- Stripe Engineering — Scaling Your API with Rate Limiters
- Google Cloud Architecture — Rate Limiting Strategies and Techniques
- Cloudflare — Rate Limiting Rules Documentation
Disclaimer: This article reflects the author's personal experience and opinions. Product names, logos, and brands are property of their respective owners. Code examples are simplified for clarity — always review and adapt for your specific use case and security requirements.