
HBase data model (column families) in Hadoop - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is a column family in HBase?
A column family in HBase is a named group of columns, declared in the table schema, whose data is stored together physically. Grouping columns that are read and written together into one family improves read/write efficiency.
beginner
How does HBase store data within column families?
HBase stores all columns of a column family together on disk, in that family's own store files (HFiles), separate from the files of other families. A read that needs only one family therefore does not have to touch the data of the others.
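The per-family layout can be illustrated with a small toy model. This is only a sketch in plain Python, not the HBase client API: the class `ToyRegion`, the families `info` and `metrics`, and the sample values are all hypothetical.

```python
class ToyRegion:
    """Toy model: one separate store per column family (not the real HBase API)."""

    def __init__(self, families):
        # Each family gets its own store, mirroring HBase's per-family layout.
        self.stores = {family: {} for family in families}
        self.stores_read = []  # records which stores a read touched

    def put(self, row_key, family, qualifier, value):
        self.stores[family][(row_key, qualifier)] = value

    def get_family(self, row_key, family):
        # Only the requested family's store is scanned.
        self.stores_read.append(family)
        store = self.stores[family]
        return {q: v for (r, q), v in store.items() if r == row_key}


region = ToyRegion(["info", "metrics"])
region.put("user1", "info", "name", "Ada")
region.put("user1", "info", "email", "ada@example.com")
region.put("user1", "metrics", "logins", 42)

info = region.get_family("user1", "info")
print(info)                # {'name': 'Ada', 'email': 'ada@example.com'}
print(region.stores_read)  # ['info']
```

Reading the `info` family never touches the `metrics` store, which is the reason columns that are accessed together belong in the same family.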
intermediate
Why should you keep the number of column families small in HBase?
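Because each column family is stored separately, every family adds its own in-memory write buffer (MemStore) and its own set of store files in every region of the table. Many column families therefore mean more files to flush, compact, and read, which slows performance and increases storage overhead. It's best to keep column families few and group related columns.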
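A rough way to see the overhead: HBase keeps one store per column family in each region, so the number of stores grows as regions times families. The sketch below just does that arithmetic; the function name `store_count` and the region and family counts are made-up illustration values.

```python
def store_count(regions: int, families: int) -> int:
    """One store (MemStore plus store files) per column family per region."""
    return regions * families


# Hypothetical table with 100 regions.
print(store_count(100, 2))   # 200
print(store_count(100, 10))  # 1000
```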
beginner
Explain the relationship between rows, column families, and columns in HBase.
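In HBase, data is organized as rows identified by a row key. Each row can hold data in one or more of the table's column families. Each column family contains any number of columns, and a column is addressed as family:qualifier. The value at a given row and column is a cell, and each cell can keep several timestamped versions. This hierarchy helps organize and access data efficiently.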
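One way to picture this hierarchy is as nested maps: row key, then column family, then column qualifier, then timestamp. The following is a minimal sketch using plain dictionaries; the row key, family names, timestamps, and the helper `latest` are hypothetical and not part of any HBase API.

```python
# row key -> column family -> column qualifier -> timestamp -> value
table = {
    "user1": {
        "info": {
            "name": {1700000000: "Ada"},
            "email": {
                1700000000: "ada@old.example",
                1700000500: "ada@example.com",
            },
        },
        "metrics": {
            "logins": {1700000500: 42},
        },
    }
}


def latest(table, row_key, family, qualifier):
    """Return the newest version of one cell (highest timestamp wins)."""
    versions = table[row_key][family][qualifier]
    return versions[max(versions)]


print(latest(table, "user1", "info", "email"))  # ada@example.com
```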
intermediate
Can you add new columns to an existing column family in HBase without changing the schema?
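Yes. Only column families are declared in the table schema. Columns (qualifiers) inside a family are created simply by writing to them, so new columns can be added dynamically without changing the schema, and different rows can have different columns. This flexibility is a key feature of HBase's data model.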
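The difference between fixed families and free-form columns can be sketched with a toy table. This is an illustration in plain Python, not the HBase client API: `ToyTable`, the `info` and `prefs` families, and the sample rows are hypothetical. The sketch assumes, as in HBase, that a write to an undeclared family is rejected.

```python
class ToyTable:
    """Toy model: families are fixed at creation, qualifiers are free-form."""

    def __init__(self, families):
        self.families = frozenset(families)
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        # The family must already be declared; the qualifier need not be.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


users = ToyTable(["info"])
users.put("user1", "info", "name", "Ada")
users.put("user2", "info", "nickname", "Gracie")  # new column, no schema change

try:
    users.put("user1", "prefs", "theme", "dark")  # undeclared family
    rejected = False
except KeyError:
    rejected = True

print(sorted(users.rows["user2"]["info"]))  # ['nickname']
print(rejected)                             # True
```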
What does a column family in HBase group together?
A. Indexes for faster search
B. Rows with similar keys
C. Tables with the same schema
D. Related columns stored together physically
Why is it recommended to keep the number of column families small in HBase?
A. Because each column family requires separate storage and too many slow down performance
B. Because column families cannot have more than 3 columns
C. Because column families are deleted automatically if too many exist
D. Because column families are only used for indexing
In HBase, what is the smallest unit of data storage?
A. Table
B. Column family
C. Cell (intersection of a row and a column, at a given version)
D. Region
Can you add new columns to an existing column family without changing the schema in HBase?
A. Yes, columns can be added dynamically within a column family
B. No, schema must be updated first
C. Only if the table is recreated
D. Only during off-peak hours
How are column families stored in HBase?
A. All column families are stored together in one file
B. Each column family is stored separately on disk
C. Column families are stored in memory only
D. Column families are stored as separate tables
Describe how data is organized in HBase using rows, column families, and columns.
Think about the hierarchy from row to column family to column.
Explain why it is important to limit the number of column families in an HBase table.
Consider how storage and performance relate to column families.