0
0
HLDsystem_design~25 mins

Shard key selection in HLD - System Design Exercise

Choose your learning style9 modes available
Design: Shard Key Selection for Distributed Database
Design focuses on selecting an optimal shard key and its impact on data distribution, query efficiency, and scalability. It excludes detailed shard rebalancing algorithms and physical hardware setup.
Functional Requirements
FR1: Distribute data evenly across multiple database shards
FR2: Support efficient query routing to minimize cross-shard queries
FR3: Maintain high availability and fault tolerance
FR4: Allow scalability to handle up to 1 billion records
FR5: Support common query patterns such as point lookups and range queries
Non-Functional Requirements
NFR1: Shard key must minimize data skew and hotspots
NFR2: Latency for queries should be under 200ms p99
NFR3: System should tolerate shard failures without data loss
NFR4: Shard key selection should not require frequent re-sharding
NFR5: Support strong consistency within a shard
Think Before You Design
Questions to Ask
❓ Question 1
❓ Question 2
❓ Question 3
❓ Question 4
❓ Question 5
Key Components
Shard key selection logic
Routing layer to direct queries to correct shard
Shard metadata service to track shard locations
Data partitioning and replication mechanisms
Monitoring and alerting for shard health and load
Design Patterns
Range-based sharding
Hash-based sharding
Directory-based sharding
Composite shard keys
Consistent hashing
Reference Architecture
Client
  |
  v
Query Router (Shard Key Extractor)
  |
  v
+-------------------+     +-------------------+     +-------------------+
|    Shard 1        |     |    Shard 2        | ... |    Shard N        |
| (Data Partition 1)|     | (Data Partition 2)|     | (Data Partition N)|
+-------------------+     +-------------------+     +-------------------+
Components
Query Router
Custom routing service
Extract shard key from queries and route requests to the correct shard
Shard Metadata Service
Distributed key-value store (e.g., etcd, ZooKeeper)
Maintain mapping of shard keys to shard locations
Database Shards
Distributed database instances (e.g., MongoDB, Cassandra)
Store partitioned data based on shard key
Shard Key Selection Logic
Application logic or middleware
Determine optimal shard key based on data and query patterns
Request Flow
1. Client sends a query with a shard key field.
2. Query Router extracts the shard key from the query.
3. Router consults Shard Metadata Service to find the shard responsible for the key.
4. Router forwards the query to the appropriate database shard.
5. Shard processes the query and returns the result to the router.
6. Router sends the response back to the client.
Database Schema
Entities: - User (user_id PK, name, email, region) - Orders (order_id PK, user_id FK, product_id, timestamp) Relationships: - User to Orders: 1:N Shard Key Candidates: - user_id (hash-based sharding for even distribution) - region (range-based sharding for locality) Chosen Shard Key: - Composite key of (region, user_id) to balance locality and distribution
Scaling Discussion
Bottlenecks
Uneven data distribution causing hotspots on certain shards
Cross-shard queries increasing latency and complexity
Shard metadata service becoming a single point of failure
Re-sharding complexity when scaling beyond initial shard count
Solutions
Use consistent hashing or composite shard keys to balance load
Design queries to minimize cross-shard operations or use scatter-gather with caching
Deploy shard metadata service in a highly available cluster with leader election
Implement online re-sharding with minimal downtime and data migration tools
Interview Tips
Time: Spend 10 minutes understanding requirements and clarifying queries, 20 minutes designing shard key selection and architecture, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Explain importance of shard key in data distribution and query efficiency
Discuss trade-offs between hash-based and range-based sharding
Highlight how shard key affects cross-shard queries and latency
Mention strategies to handle hotspots and re-sharding
Show awareness of fault tolerance and metadata management