HBase data model (column families) in Hadoop

Introduction

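Column families group related columns together in an HBase table. Every column belongs to exactly one family and is addressed as family:qualifier, and HBase stores each family's data separately, so groups of columns can be organized, tuned, and read independently.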

Column families are a good fit in these situations:
When you want to store different types of data separately but in the same table.
When you need to control how data is stored and accessed for performance.
When you want to apply different settings, such as compression or versioning, to groups of columns.
When designing a schema that reflects real-world categories of data.
When you want to optimize reads and writes by grouping columns that are usually accessed together.
Syntax
Hadoop
create 'table_name', 'column_family1', 'column_family2'

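Column families are declared when the table is created and group columns logically. Individual columns (qualifiers) are not declared up front; they come into existence when data is written to family:qualifier.

Each column family can contain many columns, but all columns in a family share the same storage settings, such as compression and the number of versions kept.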
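
As a rough illustration of per-family settings, the sketch below reuses the placeholder names from the syntax line and passes each family as a dictionary of attributes instead of a bare name. The specific values (3 versions, GZ compression) are assumptions chosen for the example, not recommendations.

```
# Placeholder table and family names, as in the syntax line above.
# VERSIONS and COMPRESSION values are illustrative assumptions.
create 'table_name', {NAME => 'column_family1', VERSIONS => 3}, {NAME => 'column_family2', COMPRESSION => 'GZ'}

# Show the settings recorded for each column family.
describe 'table_name'
```

Every column written under column_family1 would then keep up to three versions of each cell, while column_family2 would be compressed on disk.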

Examples
This table has two column families: info and stats. User name and email are in info, login count is in stats.
Hadoop
create 'users', 'info', 'stats'
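
To make the split concrete, the following sketch writes one row into this table. The row key user1, the qualifier names, and the values are hypothetical; only the family names info and stats come from the create statement. The qualifiers (name, email, login_count) are not declared anywhere beforehand.

```
# Hypothetical row key and values; families 'info' and 'stats' already exist.
put 'users', 'user1', 'info:name', 'Bob'
put 'users', 'user1', 'info:email', 'bob@example.com'
put 'users', 'user1', 'stats:login_count', '5'

# Read the whole row back, across both families.
get 'users', 'user1'
```
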
Here, details holds product info, and inventory holds stock data.
Hadoop
create 'products', 'details', 'inventory'
Sample Program

This example creates an HBase table named employees with two column families: personal and work. It adds data for one employee and then scans the table to show the stored data.

Hadoop
create 'employees', 'personal', 'work'

put 'employees', 'emp1', 'personal:name', 'Alice'
put 'employees', 'emp1', 'personal:email', 'alice@example.com'
put 'employees', 'emp1', 'work:department', 'Sales'

scan 'employees'
Important Notes

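Each column family is stored in its own set of files on disk (HFiles), so all columns in a family are kept together. A read or write that touches only one family does not need to go through the files of the others, which speeds up access to those columns.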
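
One way to see this benefit is to restrict a read to a single family, so that only that family's data needs to be consulted. A minimal sketch, assuming the employees table and the emp1 row from the sample program exist:

```
# Fetch only the 'personal' family for one row.
get 'employees', 'emp1', {COLUMN => 'personal'}

# Scan every row, returning only the 'work' family.
scan 'employees', {COLUMNS => ['work']}
```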

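Too many column families can slow down performance, because flushes and compactions are coordinated per region and activity in one family causes extra work for the others. Keep them few and meaningful; the HBase reference guide suggests no more than two or three per table.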

Changing column families after table creation requires a schema change with alter, and removing a family deletes all of its data, so plan them carefully up front.
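
When a change is unavoidable, the shell's alter command can add or drop a family. The sketch below assumes the employees table from the sample program; the payroll family is a hypothetical name used only for illustration. Depending on the HBase version and configuration, the table may need to be disabled before it can be altered.

```
# Add a new (hypothetical) column family to the existing table.
alter 'employees', NAME => 'payroll'

# Confirm the family now appears in the schema.
describe 'employees'

# Drop the family again; this also deletes any data stored in it.
alter 'employees', 'delete' => 'payroll'
```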

Summary

Column families group related columns in HBase tables.

They help organize data and control storage settings.

Use them to improve performance and reflect real-world data categories.