What is the HBase data model (column families) in Hadoop?

Introduction

Column families group related columns together in an HBase table, which helps organize and store data efficiently. They are useful in the following situations:

When you want to store different types of data separately but in the same table.

When you need to control how data is stored and accessed for performance.

When you want to apply different settings like compression or versioning to groups of columns.

When designing a schema that reflects real-world categories of data.

When you want to optimize read and write operations by grouping frequently accessed columns.
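As an illustration of the per-family versioning mentioned above, the following sketch keeps up to three versions of each cell in one family. The table name notes, the family body, and the values are hypothetical and chosen only for this example.

```hbase
# 'notes' and 'body' are hypothetical names used only for this sketch.
create 'notes', {NAME => 'body', VERSIONS => 3}

# Write the same cell twice; each put is stored as a new timestamped version.
put 'notes', 'n1', 'body:text', 'first draft'
put 'notes', 'n1', 'body:text', 'second draft'

# Ask for up to 3 versions of the cell instead of only the latest one.
get 'notes', 'n1', {COLUMN => 'body:text', VERSIONS => 3}
```

Because VERSIONS is set on the family, every column written under body follows the same retention rule.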

Syntax

Hadoop

create 'table_name', 'column_family1', 'column_family2'

Column families are defined when creating a table and group columns logically.

Each column family can contain many columns, addressed as family:qualifier. Columns do not have to be declared in advance, but all columns in a family share the same storage settings.
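The short form above uses default settings for every family. A longer form, sketched here with a hypothetical events table, passes a dictionary per family so that each one gets its own settings. The specific values (GZ compression, a seven-day TTL) are assumptions picked for illustration, not recommendations.

```hbase
# Hypothetical table: 'meta' holds small, frequently read values,
# 'payload' holds bulky data that can be compressed and expired.
create 'events', {NAME => 'meta', VERSIONS => 1, IN_MEMORY => 'true'}, {NAME => 'payload', COMPRESSION => 'GZ', TTL => 604800}

# Print the table schema to check the settings applied to each family.
describe 'events'
```

TTL is given in seconds, so 604800 corresponds to seven days.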

Examples

This table has two column families: info and stats. The user's name and email go in info; the login count goes in stats.

Hadoop

create 'users', 'info', 'stats'
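Only the families are fixed at creation time; the columns inside them appear when data is written. A minimal sketch, assuming the users table above and made-up row keys and values:

```hbase
# Row u1 uses the columns described above.
put 'users', 'u1', 'info:name', 'Alice'
put 'users', 'u1', 'info:email', 'alice@example.com'
put 'users', 'u1', 'stats:login_count', '5'

# Row u2 adds a column (info:phone) that was never declared and skips stats.
put 'users', 'u2', 'info:name', 'Bob'
put 'users', 'u2', 'info:phone', '555-0100'

# Each row returns only the cells that were actually written to it.
get 'users', 'u1'
get 'users', 'u2'
```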

Here, details holds product info, and inventory holds stock data.

Hadoop

create 'products', 'details', 'inventory'

Sample Program

This example creates an HBase table named employees with two column families: personal and work. It adds data for one employee and then scans the table to show the stored data.

Hadoop

create 'employees', 'personal', 'work'

put 'employees', 'emp1', 'personal:name', 'Alice'
put 'employees', 'emp1', 'personal:email', 'alice@example.com'
put 'employees', 'emp1', 'work:department', 'Sales'

scan 'employees'
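Besides scanning the whole table, a single row can be read by family or by individual column. The commands below are a sketch that assumes the employees table and the emp1 row created above.

```hbase
# Everything stored for emp1, across both families.
get 'employees', 'emp1'

# Only the cells in the 'personal' family.
get 'employees', 'emp1', 'personal'

# A single column, addressed as family:qualifier.
get 'employees', 'emp1', 'work:department'
```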

Important Notes

All columns in a column family are stored together on disk, in files separate from those of other families. A read that touches only one family therefore does not have to read the other families' data.
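One way to take advantage of this layout is to restrict a scan to the families it needs. The sketch below assumes the employees table from the sample program.

```hbase
# Scan only the 'personal' family; cells in 'work' are not returned.
scan 'employees', {COLUMNS => ['personal']}

# Scan selected columns; this request spans both families.
scan 'employees', {COLUMNS => ['personal:name', 'work:department']}
```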

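Too many column families can hurt performance, partly because flushes and compactions are triggered per region and so affect every family in it. The HBase reference guide recommends keeping the number low, typically no more than two or three.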

Column families are part of the table schema. They can be added or removed later with the alter command, but removing a family discards its data, so plan them carefully.
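Families can still be changed after creation through a schema change with alter. The following is a sketch, assuming the employees table from the sample program and a hypothetical payroll family. Depending on the HBase version and configuration, the table may need to be disabled before altering it.

```hbase
# Add a new family (or modify it if it already exists). 'payroll' is hypothetical.
alter 'employees', NAME => 'payroll', VERSIONS => 1

# Remove the family again; this also discards every cell stored in it.
alter 'employees', 'delete' => 'payroll'

# Confirm which families the table has now.
describe 'employees'
```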

Summary

Column families group related columns in HBase tables.

They help organize data and control storage settings.

Use them to improve performance and reflect real-world data categories.