0
0
HLDsystem_design~7 mins

Shard key selection in HLD - System Design Guide

Choose your learning style9 modes available
Problem Statement
When a database grows too large, a single server cannot handle all the data or requests. Without a good way to split data, some servers get overloaded while others stay idle, causing slow responses and failures.
Solution
Shard key selection divides data into parts based on a chosen key so each server stores only a portion. This key decides where data goes, balancing load and making queries efficient by directing them to the right server.
Architecture
Client App
Client App
Shard Router
Shard Router
Shard1

This diagram shows a client sending requests to a shard router, which uses the shard key to direct each request to the correct shard server holding the relevant data.

Trade-offs
✓ Pros
Distributes data evenly to avoid hotspots and overload on any single server.
Improves query speed by routing requests directly to the shard containing the data.
Enables horizontal scaling by adding more shards as data grows.
Reduces single points of failure by isolating data partitions.
✗ Cons
Choosing a poor shard key can cause uneven data distribution and hotspots.
Changing shard keys later is complex and may require data reshuffling.
Cross-shard queries become more complicated and slower.
Requires careful planning to balance load and query patterns.
Use when data size or request volume exceeds the capacity of a single database server, typically above millions of records or thousands of requests per second.
Avoid if data size is small or query patterns require frequent cross-shard joins, as complexity and overhead outweigh benefits.
Real World Examples
Twitter
Twitter uses user ID as a shard key to distribute tweets across multiple servers, ensuring balanced load and fast access to user timelines.
Amazon
Amazon shards product catalog data by product category to distribute storage and query load efficiently across servers.
Uber
Uber shards ride data by geographic region to localize data access and reduce latency for regional queries.
Alternatives
Range-based sharding
Splits data by continuous ranges of shard key values rather than hashing them.
Use when: Choose when queries often target specific ranges and data distribution is predictable.
Directory-based sharding
Uses a lookup table to map data keys to shards instead of computing shard location from the key.
Use when: Choose when shard key distribution is uneven or dynamic and requires flexible mapping.
No sharding (single database)
Stores all data on one server without partitioning.
Use when: Choose when data size and traffic are low and simplicity is preferred.
Summary
Shard key selection decides how data is split across multiple servers to balance load and improve performance.
A good shard key evenly distributes data and aligns with common query patterns to avoid hotspots.
Poor shard key choices cause uneven load, complex queries, and difficult scaling.