API, throttling, token-bucket, sliding-window, api-protection, DDoS, 429, redis

Rate Limiting

Rate limiting controls the number of requests a client can make to a service within a time window, protecting against abuse, ensuring fair usage, and preventing resource exhaustion.

Rate limiting restricts how many requests a client (identified by IP address, API key, or user ID) can make in a given time window. When a client exceeds the limit, the server returns HTTP 429 Too Many Requests. Common algorithms are token bucket (enforces a steady average rate while still allowing short bursts), sliding window (accurate counts over a rolling interval), and fixed window (simple, but a client can send up to twice the limit in a short span that straddles a window boundary). Rate limiters are typically implemented in the API gateway or a middleware layer, with Redis commonly used to hold the shared state across servers.
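
The token bucket algorithm named above can be sketched in a few lines. This is a minimal single-process illustration, not a production limiter: the class name TokenBucket, its parameters, and the injectable clock are all assumptions made for the example, and a distributed deployment would keep the same state in a shared store instead of in memory.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical in-memory token bucket.

    Holds at most `capacity` tokens and refills at `rate` tokens per second.
    Each request spends tokens; a request is rejected when too few remain.
    """

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock            # injectable so the behaviour can be tested
        self.tokens = capacity        # start full, so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                  # the caller would answer with HTTP 429
```

With capacity=3 and rate=1.0, a client can send three requests back to back, is rejected on the fourth, and regains one request per second afterwards. The capacity sets the burst size and the rate sets the sustained throughput, which is why the algorithm tolerates bursts.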

Tradeoffs

Strengths

Protection: Prevents abuse, brute-force attacks, and accidental resource exhaustion.
Fairness: Ensures no single client monopolizes shared resources.
Cost control: Limits expensive downstream calls (third-party APIs, database queries).
Simplicity: Token bucket and sliding window algorithms are straightforward to implement.
Monetization: Offering different rate limits for different pricing tiers is a proven business model.
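
To illustrate the Simplicity point, a sliding window limiter can be written as a log of recent request timestamps per client. The sketch below is an assumption-laden, in-memory version: SlidingWindowLog, client_id, and the clock parameter are hypothetical names, and memory use grows with the limit because every accepted request is remembered until it ages out.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLog:
    """Hypothetical sliding-window-log limiter: at most `limit` requests
    per client in any rolling interval of `window` seconds."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        log = self.hits[client_id]
        # Discard timestamps that have fallen out of the rolling window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)
            return True
        return False                  # over the limit: respond with HTTP 429
```

Per-tier limits could be layered on top by constructing one limiter per pricing tier and selecting it from the client's plan.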

Weaknesses

User experience: Legitimate users who hit rate limits get frustrated, especially when the limits are too aggressive.
Distributed accuracy: Maintaining exact global counts across multiple servers adds latency and complexity.
Configuration complexity: Choosing the right limits requires traffic analysis and continuous tuning.
Circumvention: Sophisticated attackers can distribute requests across IPs/accounts to evade per-client limits.
Clock skew: Time-window-based algorithms can behave inconsistently if server clocks are not synchronized.
Legitimate burst handling: Strict rate limits can reject valid traffic spikes (e.g., a marketing campaign launch).

Likely Follow-Up Questions

How would you implement distributed rate limiting across multiple data centers?
What is the difference between token bucket and leaky bucket?
How do you rate limit in a microservices architecture where requests fan out?
How would you handle rate limiting for WebSocket connections?
What is the fixed-window boundary burst problem and how do sliding windows solve it?
How do you choose rate limits for a new API?
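
The boundary burst question can be made concrete with a small simulation. The sketch below is illustrative only: the function names, the limit of 5 requests per 60 seconds, and the request timestamps are assumptions chosen for the example. It counts how many requests each strategy accepts when a client sends ten requests in ten seconds straddling a window boundary.

```python
from collections import deque
from typing import Dict, Iterable

LIMIT = 5        # assumed limit: 5 requests
WINDOW = 60.0    # assumed window: 60 seconds


def fixed_window_accepts(timestamps: Iterable[float]) -> int:
    """Count accepted requests when counters reset at fixed 60 s boundaries."""
    counts: Dict[int, int] = {}
    accepted = 0
    for ts in timestamps:
        bucket = int(ts // WINDOW)            # which fixed window this falls in
        if counts.get(bucket, 0) < LIMIT:
            counts[bucket] = counts.get(bucket, 0) + 1
            accepted += 1
    return accepted


def sliding_window_accepts(timestamps: Iterable[float]) -> int:
    """Count accepted requests when the limit applies to any rolling 60 s."""
    recent: deque = deque()
    accepted = 0
    for ts in timestamps:
        while recent and recent[0] <= ts - WINDOW:
            recent.popleft()                  # forget requests older than 60 s
        if len(recent) < LIMIT:
            recent.append(ts)
            accepted += 1
    return accepted


# Five requests just before the boundary at t=60, five just after it.
burst = [55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0]
print(fixed_window_accepts(burst))    # 10: both windows admit a full quota
print(sliding_window_accepts(burst))  # 5: the rolling window is already full
```

The fixed window admits twice the nominal limit within ten seconds because its counter resets at t=60, while the sliding window keeps counting the earlier requests until they age out.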

Related Concepts

API Gateway, Load Balancing, Reverse Proxy, Caching, Microservices Architecture

Source: editorial — Synthesized from Stripe/GitHub/Cloudflare API documentation, Redis rate limiting patterns, and IETF rate limiting RFCs.