Imagine you have a large database storing user profiles. You want to split this data across multiple servers (shards) to improve performance. Which shard key choice will most evenly distribute the data?
Think about which key has the most randomness and uniqueness to spread data evenly.
A random UUID as a shard key spreads data evenly because it has high cardinality and randomness. A country code or last-name initial has only a limited set of values, causing uneven distribution. A signup date can create hotspots when many users sign up on the same day.
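To see why high cardinality matters, here is a minimal sketch of hash-based shard assignment. The shard count and sample size are arbitrary choices for illustration; the point is that random, unique keys land almost evenly across shards.

```python
import uuid
from collections import Counter

NUM_SHARDS = 8  # hypothetical shard count for illustration

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    # Hash the key and take it modulo the shard count.
    return hash(key) % num_shards

# Generate random UUIDs and count how many land on each shard.
counts = Counter(shard_for(str(uuid.uuid4())) for _ in range(100_000))

# High-cardinality random keys spread nearly evenly: each shard receives
# roughly 100_000 / 8 = 12_500 keys.
print(sorted(counts.values()))
```

A low-cardinality key like a country code would instead map all records for a popular value to one shard, no matter how good the hash is.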
You design a product catalog system where users often search products by category and price range. Which shard key will optimize query speed for these searches?
Consider which shard key aligns with common query filters to reduce cross-shard queries.
Sharding by product category groups similar products together, so queries that filter by category hit fewer shards. Price and creation timestamp are continuous values, so range queries over them span many shards. A product ID is unique but unrelated to the query patterns.
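The routing idea can be sketched as follows. The shard count and the use of SHA-256 are assumptions for illustration; the point is that a stable hash of the category sends a category-filtered query to exactly one shard.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for illustration

def shard_for_category(category: str) -> int:
    # Use a stable hash (unlike Python's per-process salted hash()) so
    # routing stays consistent across application restarts.
    digest = hashlib.sha256(category.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Every product in "electronics" lands on the same shard, so a
# category + price-range query is served by a single shard scan.
same = shard_for_category("electronics") == shard_for_category("electronics")
print(same, shard_for_category("electronics"), shard_for_category("books"))
```

If products were sharded by price instead, a query like "electronics between $50 and $200" would have to fan out to nearly every shard and merge the results.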
A social media app uses user ID as the shard key. Suddenly, a celebrity user causes a hotspot with millions of requests to their shard. What is the best architectural approach to reduce this hotspot?
Think about reducing load on the hotspot shard without major data reshuffling.
Caching popular user data reduces direct database hits on the hotspot shard. Changing the shard key or adding more shards alone won't solve the hotspot, because the celebrity's data remains concentrated on a single shard. Composite keys add complexity without effectively reducing the hotspot load.
What is a major downside of using a monotonically increasing value (like timestamp or auto-increment ID) as a shard key?
Consider how new data is assigned to shards over time.
Monotonically increasing shard keys cause new data to be written to the same shard, creating hotspots and uneven load. This reduces write scalability. It does not inherently increase latency, prevent replication, or block scaling.
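The write hotspot is easiest to see with range-based sharding, sketched below with assumed ID ranges: every freshly inserted record has a higher ID than the last, so all new writes pile onto the final shard.

```python
# Each shard owns a contiguous ID range (assumed values for illustration).
SHARD_RANGES = [(0, 1000), (1000, 2000), (2000, 3000)]

def shard_for_id(record_id: int) -> int:
    for i, (lo, hi) in enumerate(SHARD_RANGES):
        if lo <= record_id < hi:
            return i
    # IDs past the last range also go to the newest shard.
    return len(SHARD_RANGES) - 1

# The next 100 inserts (IDs 2900-2999) all land on shard 2: a write hotspot.
targets = {shard_for_id(i) for i in range(2900, 3000)}
print(targets)
```

Shards 0 and 1 receive no new writes at all; they hold only historical data while shard 2 absorbs the entire insert load.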
You expect 1 billion user records growing by 10 million per month. Each shard can handle 100 million records efficiently. How many shards do you need initially and after 6 months?
Divide total records by shard capacity to estimate shard count.
Initially: 1,000,000,000 / 100,000,000 = 10 shards.
After 6 months: 1,000,000,000 + (6 * 10,000,000) = 1,060,000,000 records.
1,060,000,000 / 100,000,000 ≈ 10.6 → 11 shards at minimum. Option B (10 initially, 16 after 6 months) matches the initial count and overprovisions beyond the minimum to allow headroom for safety, continued growth, and uneven shard fill.
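The capacity arithmetic above reduces to a ceiling division, sketched here with the figures from the question:

```python
import math

SHARD_CAPACITY = 100_000_000   # records each shard handles efficiently
INITIAL_RECORDS = 1_000_000_000
MONTHLY_GROWTH = 10_000_000

def shards_needed(records: int) -> int:
    # Round up: a partially filled shard is still a whole shard.
    return math.ceil(records / SHARD_CAPACITY)

initial = shards_needed(INITIAL_RECORDS)                  # 10 shards
after_six_months = INITIAL_RECORDS + 6 * MONTHLY_GROWTH   # 1,060,000,000
minimum_later = shards_needed(after_six_months)           # 11 shards minimum
print(initial, minimum_later)
```

Provisioning 16 rather than the bare-minimum 11 is a capacity-planning judgment, not an output of the formula: the extra shards absorb future growth without an immediate re-shard.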