beginner

What is a row key in Hadoop's HBase?

A row key is a unique identifier for each row in an HBase table. It helps locate and retrieve data quickly.

Click to reveal answer

beginner

Why is choosing a good row key important?

A good row key improves data access speed and balances load across servers. Poor keys can cause hotspots and slow queries.

Click to reveal answer

intermediate

What is a common problem with sequential row keys?

Sequential row keys can cause hotspots because all writes go to the same region server, leading to slow performance.

Click to reveal answer

intermediate

Name two strategies to avoid hotspotting in row key design.

1. Salting: Add a random prefix to row keys to spread writes. 2. Hashing: Use a hash of the key to distribute data evenly.

Click to reveal answer

intermediate

How can time-based data be stored efficiently in row keys?

Use reversed timestamps in row keys to keep recent data together and avoid hotspots by spreading writes over time.

Click to reveal answer

What happens if you use purely sequential row keys in HBase?

AData is evenly distributed across servers

BData is stored in random order

CQueries become faster automatically

DHotspotting occurs on one region server

Which technique helps spread writes evenly by adding a prefix to row keys?

ASalting

BCompression

CIndexing

DPartitioning

Why use reversed timestamps in row keys for time-series data?

ATo sort data from oldest to newest

BTo keep recent data together and avoid hotspots

CTo make keys shorter

DTo encrypt the data

What is the main goal of good row key design?

AMaximize storage space

BReduce network traffic

CImprove data retrieval speed and balance load

DMake keys human-readable

Which of these is NOT a row key design strategy?

ASequential numbering

BSalting

CHashing

DReversed timestamp

Explain why hotspotting happens with sequential row keys and how to prevent it.

Describe how to design row keys for time-series data in HBase.