Database Sharding
Database sharding horizontally partitions data across multiple independent database instances, each holding a subset of the total dataset, to scale beyond the capacity of a single machine.
Sharding splits a database table across multiple servers (shards), each responsible for a subset of rows. The shard key determines which shard holds a given row — common choices are user ID, geographic region, or a hash. Sharding enables horizontal scaling for both storage and throughput but introduces complexity: cross-shard queries, distributed transactions, and rebalancing. It's typically a last resort after exhausting vertical scaling, read replicas, and caching.
Tradeoffs
Strengths
- Horizontal scalability: No single-machine ceiling for storage or throughput.
- Fault isolation: A shard failure affects only a fraction of users, not the entire system.
- Geographic data locality: Sharding by region keeps data close to users and satisfies compliance requirements.
- Independent scaling: Hot shards can be given more resources without scaling the entire cluster.
Weaknesses
- Operational complexity: More instances to manage, monitor, back up, and upgrade.
- Cross-shard query limitations: Joins and aggregations across shards are slow and complex.
- Rebalancing difficulty: Adding or removing shards requires data migration, which can be risky and slow.
- Application complexity: The application (or middleware) must route queries to the correct shard.
- Distributed transactions: ACID guarantees across shards require expensive coordination protocols.
- Irreversibility: Once sharded, it's extremely difficult to un-shard without significant downtime.
Likely Follow-Up Questions
- How would you handle a hotspot shard that receives disproportionate traffic?
- What is the difference between sharding and partitioning?
- How do you perform zero-downtime resharding?
- How would you generate globally unique IDs across shards?
- When would you choose a distributed SQL database over manual sharding?
- How do you handle cross-shard transactions without 2PC?
Source: editorial — Synthesized from Vitess documentation, Instagram/Pinterest/Notion engineering blogs, and distributed database literature.