Recall & Review
beginner
What is a row key in Hadoop's HBase?
A row key is a unique identifier for each row in an HBase table. It helps locate and retrieve data quickly.
Click to reveal answer
beginner
Why is choosing a good row key important?
A good row key improves data access speed and balances load across servers. Poor keys can cause hotspots and slow queries.
Click to reveal answer
intermediate
What is a common problem with sequential row keys?
Sequential row keys can cause hotspots because all writes go to the same region server, leading to slow performance.
Click to reveal answer
intermediate
Name two strategies to avoid hotspotting in row key design.
1. Salting: Add a random prefix to row keys to spread writes. 2. Hashing: Use a hash of the key to distribute data evenly.
Click to reveal answer
intermediate
How can time-based data be stored efficiently in row keys?
Use reversed timestamps in row keys to keep recent data together and avoid hotspots by spreading writes over time.
Click to reveal answer
What happens if you use purely sequential row keys in HBase?
✗ Incorrect
Sequential keys cause all writes to go to the same region server, creating a hotspot.
Which technique helps spread writes evenly by adding a prefix to row keys?
✗ Incorrect
Salting adds a random prefix to row keys to distribute data across regions.
Why use reversed timestamps in row keys for time-series data?
✗ Incorrect
Reversed timestamps keep recent data grouped and spread writes over time.
What is the main goal of good row key design?
✗ Incorrect
Good row keys speed up access and prevent server overload.
Which of these is NOT a row key design strategy?
✗ Incorrect
Sequential numbering often causes hotspots and is not recommended.
Explain why hotspotting happens with sequential row keys and how to prevent it.
Think about how data is stored and accessed on servers.
You got /3 concepts.
Describe how to design row keys for time-series data in HBase.
Consider how time affects data order and access patterns.
You got /3 concepts.