Rate Limiting
Rate limiting controls the number of requests a client can make to a service within a time window, protecting against abuse, ensuring fair usage, and preventing resource exhaustion.
Clients are usually identified by IP address, API key, or user ID. When a client exceeds its limit, the server returns HTTP 429 Too Many Requests, optionally with a Retry-After header telling the client how long to wait. Common algorithms are token bucket (enforces a steady average rate while tolerating short bursts), sliding window (accurate counts over a rolling interval), and fixed window (simple, but a client can send up to twice the limit in a short span straddling a window boundary). Rate limiters are typically implemented in an API gateway or a middleware layer, often with a shared store such as Redis holding the counters so that all instances see the same state.
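To make the token bucket concrete, here is a minimal single-process sketch in Python. The class name TokenBucket, its parameters (capacity, refill_rate), and the injectable clock argument are illustrative assumptions, not part of any particular library; a production limiter would also need locking and, for multiple servers, shared state.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token bucket: holds up to `capacity` tokens,
    refilled continuously at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock              # injectable so tests can control time
        self.tokens = capacity          # start full, so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                    # caller would respond with HTTP 429
```

In this sketch, capacity sets the largest burst a client can send at once and refill_rate sets the sustained request rate. Using a monotonic clock by default avoids surprises when the wall clock is adjusted.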
Tradeoffs
Strengths
- Protection: Prevents abuse, brute-force attacks, and accidental resource exhaustion.
- Fairness: Ensures no single client monopolizes shared resources.
- Cost control: Limits expensive downstream calls (third-party APIs, database queries).
- Simplicity: Token bucket and sliding window algorithms are straightforward to implement on a single node.
- Monetization: Offering different rate limits for different pricing tiers is a proven business model.
Weaknesses
- User experience: Legitimate users hitting rate limits is frustrating, especially if limits are too aggressive.
- Distributed accuracy: Maintaining exact global counts across multiple servers adds latency and complexity.
- Configuration complexity: Choosing the right limits requires traffic analysis and continuous tuning.
- Circumvention: Sophisticated attackers can distribute requests across IPs/accounts to evade per-client limits.
- Clock skew: Time-window-based algorithms can behave inconsistently if server clocks are not synchronized.
- Legitimate burst handling: Strict rate limits can reject valid traffic spikes (e.g., a marketing campaign launch).
Likely Follow-Up Questions
- How would you implement distributed rate limiting across multiple data centers?
- What is the difference between token bucket and leaky bucket?
- How do you rate limit in a microservices architecture where requests fan out?
- How would you handle rate limiting for WebSocket connections?
- What is the fixed-window boundary burst problem and how do sliding windows solve it?
- How do you choose rate limits for a new API?
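For the boundary-burst question, a small simulation can make the difference visible. The sketch below is a minimal illustration in Python, assuming a limit of 5 requests per 60-second window; the class names FixedWindowLimiter and SlidingWindowLog and the helper count_allowed are hypothetical, and timestamps are passed in explicitly rather than read from a clock.

```python
from collections import deque


class FixedWindowLimiter:
    """Hypothetical fixed-window counter: resets at each window boundary."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.current_window = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window_id = int(now // self.window)
        if window_id != self.current_window:
            # A new window has started, so the counter resets to zero.
            self.current_window = window_id
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLog:
    """Hypothetical sliding-window log: keeps timestamps of accepted requests."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the trailing window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


def count_allowed(limiter, times) -> int:
    """Feed each timestamp to the limiter and count accepted requests."""
    return sum(limiter.allow(t) for t in times)


# Five requests just before the 60 s boundary, five just after it.
burst = [59.0 + 0.1 * i for i in range(5)] + [60.0 + 0.1 * i for i in range(5)]
fixed_allowed = count_allowed(FixedWindowLimiter(5, 60.0), burst)
sliding_allowed = count_allowed(SlidingWindowLog(5, 60.0), burst)
```

Under these assumptions the fixed-window limiter accepts all 10 requests within about 1.5 seconds, twice the nominal limit, because its counter resets at t = 60. The sliding-window log accepts only the first 5, since it always counts the trailing 60 seconds. The log's cost is memory proportional to the limit per client, which is why sliding-window counters that approximate the log are a common compromise.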
Source: editorial — Synthesized from Stripe/GitHub/Cloudflare API documentation, Redis rate limiting patterns, and IETF rate limiting RFCs.