Load Balancing
Load balancing distributes incoming network traffic across multiple servers to ensure no single server bears too much demand, improving reliability and throughput.
A load balancer sits between clients and a pool of backend servers, forwarding each request to a server chosen by an algorithm: round robin, least connections, IP hash, or a weighted variant. It improves availability (if one server dies, traffic reroutes to the others), throughput (more servers means more capacity), and latency (requests can be steered away from overloaded servers). Load balancers operate at Layer 4 (TCP/UDP) or Layer 7 (HTTP); L7 can inspect requests and route on content such as path or headers. Every major cloud and edge provider offers a managed load balancer (AWS ALB/NLB, GCP Cloud Load Balancing, Cloudflare Load Balancing).
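The selection algorithms named above can be sketched in a few lines. This is an illustrative toy (the class names and `choose`/`release` API are invented for this sketch, not from any real load balancer): round robin cycles through backends in order, while least connections tracks in-flight requests and picks the least-busy backend.

```python
from itertools import cycle

class RoundRobin:
    """Hand each request to the next backend in order, wrapping around."""
    def __init__(self, backends):
        self._cycle = cycle(backends)

    def choose(self):
        return next(self._cycle)

class LeastConnections:
    """Pick the backend currently serving the fewest in-flight requests."""
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def choose(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1  # caller must release() when the request finishes
        return backend

    def release(self, backend):
        self.active[backend] -= 1

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rr = RoundRobin(servers)
print([rr.choose() for _ in range(4)])  # fourth pick wraps back to the first server
```

A weighted variant would simply repeat a backend in the rotation (or scale its connection count) in proportion to its capacity.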
Tradeoffs
Strengths
- Enables horizontal scaling with zero client-side changes
- Provides automatic failover and self-healing
- L7 enables sophisticated routing, A/B testing, canary deploys
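The failover strength can be made concrete with a minimal sketch. Assuming a hypothetical `BackendPool` whose health flags are flipped by some external health-check loop (real load balancers probe backends periodically, e.g. an HTTP `GET /health`), traffic only ever routes to backends currently marked live:

```python
import random

class BackendPool:
    """Track per-backend health; route only to backends marked live.
    Illustrative sketch: a real LB flips these flags from periodic health probes."""
    def __init__(self, backends):
        self.status = {b: True for b in backends}

    def mark_down(self, backend):  # health probe failed
        self.status[backend] = False

    def mark_up(self, backend):    # backend recovered
        self.status[backend] = True

    def live(self):
        return [b for b, ok in self.status.items() if ok]

    def choose(self):
        live = self.live()
        if not live:
            raise RuntimeError("no healthy backends")
        return random.choice(live)

pool = BackendPool(["s1", "s2", "s3"])
pool.mark_down("s2")           # probe to s2 failed; s2 receives no traffic
print(pool.choose())           # always one of s1, s3
```

This is the "self-healing" property: when `s2` passes its probes again, `mark_up` returns it to rotation with no client-side change.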
Weaknesses
- Adds latency (especially L7 with TLS termination)
- The LB itself is a single point of failure if not made redundant
- Stateful routing (sticky sessions) reduces the benefits of distribution
- L7 LBs can become throughput bottlenecks at extreme scale
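One common sticky-session approach, and a stock answer to the cookie-less stickiness question below, is IP hashing: hash the client address so the same client deterministically lands on the same backend. A minimal sketch (the function name is illustrative):

```python
import hashlib

def pick_backend(client_ip: str, backends: list) -> str:
    """Map a client IP deterministically to one backend (sticky, no cookies).
    Caveat: if the backend list changes size, most clients get remapped;
    consistent hashing reduces that churn."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]

backends = ["s1", "s2", "s3"]
print(pick_backend("203.0.113.7", backends))  # same IP always maps to the same backend
```

The weakness noted above follows directly: a hot client (or many clients behind one NAT IP) all hash to a single backend, so load is no longer evenly spread.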
When to Use
- Any production system with >1 server instance
- Systems requiring zero-downtime deployments
- Multi-region deployments (global LB)
When NOT to Use
- Single-server hobby projects
- Peer-to-peer architectures where clients connect directly
Likely Follow-Up Questions
- How would you handle session stickiness without cookies?
- What happens when the load balancer itself fails?
- When would you choose L4 over L7 load balancing?
- How does a load balancer interact with auto-scaling?
- What's the difference between DNS-based and hardware load balancing?
- How would you load balance WebSocket connections?
Related Concepts
Source: editorial — Synthesized from system-design-primer, real-world architecture patterns, and interview prep materials