Why Rate Limiting Matters More Than You Think
Every API starts without rate limiting. It feels unnecessary when you have ten users. Then one night, a client with a broken retry loop starts hammering your endpoint, your database connection pool maxes out, and suddenly every user gets 503s. I've been there. The fix took a week. The outage took four hours.
Rate limiting isn't just about protecting against abuse. It's about fairness. Without it, one noisy client can starve everyone else. It's about predictability — your infrastructure team needs to capacity-plan against known limits, not hope for the best. And it's about cost — every unthrottled request burns compute, and cloud bills don't care about your intentions.
Where Rate Limiting Sits in Your Stack
Before writing any code, you need to understand where the limiter lives in the request lifecycle. Get this wrong and you'll either limit too early (blocking legitimate traffic at the load balancer) or too late (your app already did expensive work before rejecting the request).
Request flow: Load Balancer → Rate Limiter → Handler
The sweet spot is right after authentication but before any business logic. You need to know who is making the request (to apply per-user limits), but you don't want to touch the database or do heavy computation for a request you're about to reject.
Token Bucket vs Sliding Window vs Fixed Window
There are three algorithms you'll actually use in production. Each has trade-offs that matter depending on your traffic patterns.
Fixed window is the simplest: count requests per clock-aligned interval and reset the counter when the interval rolls over. It's cheap, but a client can burst up to twice the limit by straddling a window boundary. For most APIs, start with token bucket. It handles bursty traffic gracefully — a user who sends 5 requests at once and then goes quiet for a few seconds shouldn't be penalized the same way as someone sending a steady stream at the limit. The sliding window is better when you need strict guarantees, like billing APIs where "100 requests per minute" must mean exactly that.
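The boundary problem with fixed windows is easy to demonstrate. The sketch below (types and names are mine, not from any library) counts requests in clock-aligned one-minute windows and lets a client push 200 requests through a 100/min limit by straddling a window edge:

```go
package main

import (
	"fmt"
	"time"
)

// fixedWindow counts requests per clock-aligned interval and
// resets the counter when a new interval starts.
type fixedWindow struct {
	window      time.Duration
	limit       int
	windowStart time.Time
	count       int
}

func (f *fixedWindow) allow(now time.Time) bool {
	start := now.Truncate(f.window)
	if start.After(f.windowStart) {
		f.windowStart = start
		f.count = 0
	}
	if f.count < f.limit {
		f.count++
		return true
	}
	return false
}

// boundaryBurst sends 100 requests in the last second of one window
// and 100 in the first second of the next, returning how many pass.
func boundaryBurst() int {
	fw := &fixedWindow{window: time.Minute, limit: 100}
	base := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	allowed := 0
	for i := 0; i < 100; i++ {
		if fw.allow(base.Add(59*time.Second + time.Duration(i)*time.Millisecond)) {
			allowed++
		}
	}
	for i := 0; i < 100; i++ {
		if fw.allow(base.Add(60*time.Second + time.Duration(i)*time.Millisecond)) {
			allowed++
		}
	}
	return allowed
}

func main() {
	fmt.Println(boundaryBurst()) // 200: double the limit in about two seconds
}
```

Token bucket and sliding window both avoid this failure mode, which is why fixed window rarely survives contact with production traffic.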
A Simple In-Process Limiter in Go
Let's start with the simplest thing that works: an in-process token bucket using sync.Mutex. This is good enough for a single-instance service.
type RateLimiter struct {
	mu         sync.Mutex
	tokens     float64
	maxTokens  float64
	refillRate float64 // tokens per second
	lastRefill time.Time
}

func NewRateLimiter(maxTokens, refillRate float64) *RateLimiter {
	return &RateLimiter{
		tokens:     maxTokens,
		maxTokens:  maxTokens,
		refillRate: refillRate,
		lastRefill: time.Now(),
	}
}

func (rl *RateLimiter) Allow() bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()

	now := time.Now()
	elapsed := now.Sub(rl.lastRefill).Seconds()
	rl.tokens += elapsed * rl.refillRate
	if rl.tokens > rl.maxTokens {
		rl.tokens = rl.maxTokens
	}
	rl.lastRefill = now

	if rl.tokens >= 1 {
		rl.tokens--
		return true
	}
	return false
}
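A quick sanity check of this limiter's behavior (the type is repeated here so the snippet compiles on its own): a bucket of 3 tokens absorbs a burst of 3, rejects the 4th call, and allows another request after roughly a second of refill at 1 token/sec.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// RateLimiter is the same token bucket as in the article.
type RateLimiter struct {
	mu         sync.Mutex
	tokens     float64
	maxTokens  float64
	refillRate float64 // tokens per second
	lastRefill time.Time
}

func NewRateLimiter(maxTokens, refillRate float64) *RateLimiter {
	return &RateLimiter{
		tokens:     maxTokens,
		maxTokens:  maxTokens,
		refillRate: refillRate,
		lastRefill: time.Now(),
	}
}

func (rl *RateLimiter) Allow() bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	now := time.Now()
	rl.tokens += now.Sub(rl.lastRefill).Seconds() * rl.refillRate
	if rl.tokens > rl.maxTokens {
		rl.tokens = rl.maxTokens
	}
	rl.lastRefill = now
	if rl.tokens >= 1 {
		rl.tokens--
		return true
	}
	return false
}

// demo returns the outcome of five calls: a burst of four,
// then one more after waiting for the bucket to refill.
func demo() []bool {
	rl := NewRateLimiter(3, 1) // burst of 3, refills 1 token/sec
	var got []bool
	for i := 0; i < 4; i++ {
		got = append(got, rl.Allow())
	}
	time.Sleep(1100 * time.Millisecond) // ~1.1 tokens refill
	got = append(got, rl.Allow())
	return got
}

func main() {
	fmt.Println(demo()) // [true true true false true]
}
```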
This works, but it's per-process. The moment you scale to two instances behind a load balancer, each instance tracks its own counters. A client hitting both instances gets double the limit. That's where Redis comes in.
Distributed Rate Limiting with Redis
For multi-instance deployments, you need a shared counter. Redis is the standard choice — it's fast, atomic, and you probably already have it in your stack.
The naive approach is INCR + EXPIRE, but there's a race condition: if your process crashes between the two commands, you get a counter without a TTL that never resets. The fix is a Lua script that runs atomically on the Redis server.
Key tip — always use Lua scripts for Redis rate limiting. A Lua script executes atomically on the Redis server, eliminating race conditions between INCR and EXPIRE. Without atomicity, you'll get phantom counters that never expire and clients that bypass your limits entirely. This is the single most common mistake I see in Redis-based rate limiters.
-- sliding_window.lua
local key = KEYS[1]
local window = tonumber(ARGV[1]) -- window size in seconds
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Remove expired entries
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

-- Count current requests
local count = redis.call('ZCARD', key)
if count < limit then
    redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
    redis.call('EXPIRE', key, window)
    return 1 -- allowed
end
return 0 -- rejected
In Go, you load this script once at startup and call it with redis.EvalSha for each request. The sorted set gives you a true sliding window — each entry is timestamped, and expired entries get pruned on every check.
Per-User, Per-IP, Per-API-Key — You Need All Three
A single rate limit strategy isn't enough. In production, I run three layers simultaneously:
- Per-IP — catches unauthenticated abuse, brute-force login attempts, and scanner bots. Generous limits (e.g., 100 req/min) since multiple users can share an IP behind a NAT.
- Per-API-key — the primary limit for authenticated traffic. Different tiers for different plans (free: 60/min, pro: 600/min, enterprise: custom).
- Per-user — prevents a single user from burning through a shared organization's API key quota. Useful when multiple team members share credentials.
The key insight: check them in order of cheapest to most expensive. IP lookup is a string comparison. API key lookup might hit a cache. User lookup might hit the database. Reject early when you can.
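One way to wire up that ordering (the layer names and stub checks here are illustrative, not a fixed API) is a slice of named checks evaluated cheapest-first, stopping at the first rejection so the expensive lookups never run for traffic a cheap layer already rejected:

```go
package main

import "fmt"

// limitLayer pairs a label with a check function, ordered cheapest-first.
type limitLayer struct {
	name  string
	allow func() bool
}

// firstRejection runs each layer in order and returns the name of
// the first one that rejects; later (more expensive) layers are
// never evaluated once a cheaper one has said no.
func firstRejection(layers []limitLayer) (string, bool) {
	for _, l := range layers {
		if !l.allow() {
			return l.name, false
		}
	}
	return "", true
}

func main() {
	layers := []limitLayer{
		{"ip", func() bool { return true }},       // in-memory counter
		{"api_key", func() bool { return false }}, // cache lookup; this one trips
		{"user", func() bool { panic("never reached: rejected earlier") }},
	}
	name, ok := firstRejection(layers)
	fmt.Println(name, ok) // api_key false
}
```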
HTTP Response Patterns
When you reject a request, tell the client exactly what happened and when to retry. This isn't just politeness — well-behaved clients will back off automatically if you give them the right headers.
func rateLimitResponse(w http.ResponseWriter, retryAfter, limit, remaining int) {
	w.Header().Set("X-RateLimit-Limit", strconv.Itoa(limit))
	w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(remaining))
	w.Header().Set("Retry-After", strconv.Itoa(retryAfter))
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusTooManyRequests) // 429
	json.NewEncoder(w).Encode(map[string]string{
		"error": "rate_limit_exceeded",
		"message": "Too many requests. Please retry after " +
			strconv.Itoa(retryAfter) + " seconds.",
	})
}
Always include Retry-After. The 429 status code comes from RFC 6585; the Retry-After header itself is defined in the core HTTP specification (RFC 9110), and most HTTP client libraries respect it. The X-RateLimit-* headers aren't standardized but are a de facto convention — Stripe, GitHub, and Twitter all use them.
The Middleware Pattern
In Go's net/http, rate limiting fits naturally as middleware. Wrap your handler, check the limit, and either pass through or return 429.
func RateLimitMiddleware(limiter *DistributedLimiter) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			key := extractAPIKey(r)
			result, err := limiter.Check(r.Context(), key)
			if err != nil {
				// Redis down — fail open (see next section)
				next.ServeHTTP(w, r)
				return
			}
			if !result.Allowed {
				rateLimitResponse(w, result.RetryAfter, result.Limit, 0)
				return
			}
			w.Header().Set("X-RateLimit-Remaining",
				strconv.Itoa(result.Remaining))
			next.ServeHTTP(w, r)
		})
	}
}
Chain it with your other middleware: logging → auth → rateLimit → handler. Auth goes before rate limiting because you need the API key to look up the client's tier.
Graceful Degradation — When Redis Goes Down
Your rate limiter depends on Redis. Redis will go down eventually. What happens then?
You have two choices: fail open (allow all requests) or fail closed (reject all requests). For most APIs, fail open is the right call. A few minutes without rate limiting is better than a total outage. But you need guardrails:
- Fall back to the in-process limiter with conservative limits
- Set a circuit breaker on Redis calls — after 5 consecutive failures, stop trying for 30 seconds
- Log aggressively so you know it happened
- Alert on the fallback state — it should be temporary, not a new normal
Monitoring What Matters
A rate limiter you can't observe is a rate limiter you can't trust. Track these metrics from day one:
- Allowed vs. rejected requests, broken down by limit type (IP, API key, user)
- 429 rate as a fraction of total traffic — a sudden spike usually means a client bug, not abuse
- Latency added by the limiter check itself, at p50 and p99
- Redis errors and time spent in the fail-open fallback state
Beyond the numbers, build a dashboard that shows your top 10 clients by request volume, updated every minute. When someone starts abusing your API, you'll see them climb the leaderboard before they hit the limit. I've caught misconfigured integrations this way — a partner's staging environment pointed at our production API, sending 10x their normal traffic. We reached out before they even noticed.
Putting It All Together
Start simple. An in-process token bucket with sync.Mutex will handle a surprising amount of traffic on a single instance. When you scale horizontally, add Redis with a Lua script for atomicity. Layer your limits — IP, API key, user — and always return proper 429 responses with Retry-After headers.
The rate limiter I built after that 3am incident has been running for two years now. It handles around 50,000 requests per second across six instances, adds less than 2ms of latency at p99, and has survived three Redis failovers without dropping legitimate traffic. The total implementation is about 400 lines of Go. Not everything needs to be complicated.
References
- Go Standard Library — net/http Package Documentation
- Redis Documentation — Rate Limiting Pattern
- Stripe Engineering — Scaling Your API with Rate Limiters
- Google Cloud Architecture — Rate Limiting Strategies and Techniques
- Cloudflare — Rate Limiting Rules Documentation
Disclaimer: This article reflects the author's personal experience and opinions. Product names, logos, and brands are property of their respective owners. Code examples are simplified for clarity — always review and adapt for your specific use case and security requirements.