Recall & Review
beginner
What is a shard in a distributed database system?
A shard is a horizontal partition of data in a database. Each shard holds a subset of the total data, allowing the system to scale by distributing data across multiple servers.
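A minimal sketch of horizontal partitioning, assuming hash-based sharding on a hypothetical `user_id` shard key with four shards. Real systems may instead use range-based or directory-based schemes. The sketch uses `hashlib` rather than Python's built-in `hash()`, because string hashing is randomized per process and would send the same key to different shards after a restart.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard id using a stable hash (hash-based partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Distribute some rows: each shard ends up holding a disjoint subset of the data.
rows = [{"user_id": f"user{i}", "name": f"User {i}"} for i in range(10)]
shards = {s: [] for s in range(NUM_SHARDS)}
for row in rows:
    shards[shard_for(row["user_id"])].append(row)

for shard_id, shard_rows in shards.items():
    print(shard_id, [r["user_id"] for r in shard_rows])
```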
intermediate
Why do cross-shard queries pose challenges in distributed systems?
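Because the data is split across shards, a query whose results span several shards must be routed to each of them and their partial results combined. This coordination adds network round-trips and merging work (higher latency), makes query planning more complex, and, without distributed transactions or consistent snapshots, can return results that reflect different points in time on different shards.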
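A minimal routing sketch, assuming a hypothetical `user_id` shard key and simple modulo partitioning over four shards. It shows where the coordination cost comes from: a filter on the shard key maps to one shard, while any other filter forces the router to contact every shard.

```python
NUM_SHARDS = 4
SHARD_KEY = "user_id"  # hypothetical shard key

def shard_for(key: int) -> int:
    return key % NUM_SHARDS  # simple modulo partitioning on an integer key

def shards_to_query(filters: dict) -> list[int]:
    """Return the shard ids a query must touch, given its equality filters."""
    if SHARD_KEY in filters:
        # The filter includes the shard key: exactly one shard can hold matches.
        return [shard_for(filters[SHARD_KEY])]
    # Any other filter could match rows on every shard, so all must be contacted
    # and their partial results merged: this fan-out is the cross-shard cost.
    return list(range(NUM_SHARDS))

print(shards_to_query({"user_id": 42}))    # [2]  (single-shard query)
print(shards_to_query({"country": "DE"}))  # [0, 1, 2, 3]  (cross-shard query)
```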
intermediate
Name two common strategies to handle cross-shard queries.
1. Scatter-gather: Send the query to all relevant shards and combine their results.
2. Global indexes: Maintain an index that maps lookup keys to the shards holding the matching data, so a query can go straight to those shards instead of querying all of them.
beginner
What is the scatter-gather approach in cross-shard queries?
It is a method where the query is sent to all shards that might contain relevant data. Each shard processes the query locally and returns results. The system then merges these results to form the final answer.
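A minimal scatter-gather sketch for a "top k orders above a minimum amount" query, assuming hypothetical in-memory shards of `(order_id, amount)` rows. The scatter step runs the same local query on every shard concurrently, and the gather step k-way merges the already-sorted partial results with `heapq.merge`.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory shards: each holds a disjoint subset of (order_id, amount) rows.
SHARDS = [
    [(1, 250), (5, 40), (9, 990)],
    [(2, 75), (6, 610)],
    [(3, 15), (7, 480), (10, 300)],
]

def query_shard(rows, min_amount, k):
    """Local step: filter and return this shard's top-k rows, largest amount first."""
    matches = [r for r in rows if r[1] >= min_amount]
    return sorted(matches, key=lambda r: r[1], reverse=True)[:k]

def scatter_gather_top_k(min_amount, k):
    # Scatter: send the same query to every shard concurrently.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda rows: query_shard(rows, min_amount, k), SHARDS))
    # Gather: each partial list is already sorted descending, so a k-way merge suffices.
    merged = heapq.merge(*partials, key=lambda r: r[1], reverse=True)
    return list(itertools.islice(merged, k))

print(scatter_gather_top_k(min_amount=100, k=3))  # [(9, 990), (6, 610), (7, 480)]
```

Each shard returns up to k rows rather than a fraction of k, because the global top k could all live on a single shard; this over-fetching is part of what makes scatter-gather expensive as the shard count grows.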
intermediate
How can global indexes improve cross-shard query performance?
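A global index maps a value of a non-shard-key column (for example, an email address) to the shard that holds the matching row. The query router consults the index and sends the query only to that shard instead of to every shard, which lowers latency and resource use.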
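A minimal sketch of a global secondary index, assuming rows are sharded by a hypothetical integer `user_id` while queries filter on `email`. The index here is a plain dict from email to `(shard id, user_id)`; a real system would store it durably and often partition the index itself.

```python
NUM_SHARDS = 4

# Hypothetical setup: rows are sharded by user_id, but queries filter on email.
shards = {s: {} for s in range(NUM_SHARDS)}  # shard id -> {user_id: row}
email_index = {}  # global index: email -> (shard id, user_id)

def insert_user(user_id: int, email: str) -> None:
    shard_id = user_id % NUM_SHARDS
    shards[shard_id][user_id] = {"user_id": user_id, "email": email}
    email_index[email] = (shard_id, user_id)  # index entry written alongside the row

def find_by_email(email: str):
    """Return (row, shards_contacted) by consulting the global index first."""
    entry = email_index.get(email)
    if entry is None:
        return None, 0  # the index says no such row exists: no shard is queried
    shard_id, user_id = entry
    return shards[shard_id].get(user_id), 1  # exactly one shard is contacted

for uid, email in [(1, "a@example.com"), (2, "b@example.com"), (7, "c@example.com")]:
    insert_user(uid, email)

print(find_by_email("c@example.com"))  # ({'user_id': 7, 'email': 'c@example.com'}, 1)
```

Without the index, the same lookup would have to be scattered to all four shards.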
What is the main reason cross-shard queries are slower than single-shard queries?
Cross-shard queries involve multiple shards, requiring coordination and data merging, which adds overhead and latency.
Which approach involves querying all shards and combining results?
Scatter-gather sends the query to all shards and merges their responses.
What is a drawback of maintaining global indexes for cross-shard queries?
Global indexes need additional storage and must be kept up to date, adding complexity.
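A minimal sketch of the maintenance cost, reusing the same hypothetical email index idea (an in-memory dict keyed by email). Besides the extra storage for every index entry, changing an indexed column turns one logical update into several writes: the row on its shard, removal of the stale index entry, and insertion of the new one.

```python
NUM_SHARDS = 4
shards = {s: {} for s in range(NUM_SHARDS)}  # shard id -> {user_id: row}
email_index = {}  # hypothetical global index: email -> (shard id, user_id)

def insert_user(user_id: int, email: str) -> None:
    shard_id = user_id % NUM_SHARDS
    shards[shard_id][user_id] = {"user_id": user_id, "email": email}
    email_index[email] = (shard_id, user_id)

def update_email(user_id: int, new_email: str) -> None:
    """Changing an indexed column means writing both the row and the index."""
    shard_id = user_id % NUM_SHARDS
    row = shards[shard_id][user_id]
    old_email = row["email"]
    row["email"] = new_email                      # write 1: the data shard
    del email_index[old_email]                    # write 2a: drop the stale index entry
    email_index[new_email] = (shard_id, user_id)  # write 2b: add the new index entry
    # If the index lives on different nodes than the row, these writes need a
    # distributed transaction, or an asynchronous update that tolerates briefly
    # stale index reads.

insert_user(5, "old@example.com")
update_email(5, "new@example.com")
print(email_index)  # {'new@example.com': (1, 5)}
```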
Which of the following is NOT a typical challenge of cross-shard queries?
Cross-shard queries usually complicate query logic, not simplify it.
What does sharding primarily help with in databases?
Sharding splits data across servers to scale storage and query capacity.
Explain what cross-shard queries are and why they are challenging in distributed databases.
Think about how data is split and how queries must gather data from multiple places.
Describe two strategies to handle cross-shard queries and their trade-offs.
Consider how queries find data and how results are combined.