HBase data model (column families) in Hadoop
Column families group related columns together in an HBase table, which helps organize the data and control how it is stored.
create 'table_name', 'column_family1', 'column_family2'
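Column families are declared when the table is created. Individual columns are not declared up front: a column is named family:qualifier and comes into existence the first time a value is written to it.
Each column family can hold any number of columns, and all columns in a family share that family's storage settings, such as the number of versions kept, time-to-live, and compression.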
create 'users', 'info', 'stats'
create 'products', 'details', 'inventory'
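To make the schema rule concrete, here is a toy in-memory model in Python. It is not HBase's implementation or client API: the class name ToyTable and the per-family max_versions setting are hypothetical stand-ins (the latter loosely modeled on HBase's VERSIONS family attribute). It shows three things: families are fixed when the table is created, any qualifier can be written inside an existing family, and every column in a family obeys that family's shared setting.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of an HBase table schema (not the HBase client API)."""

    def __init__(self, name, families):
        # families: family name -> settings shared by every column in that family.
        # 'max_versions' is a hypothetical stand-in for HBase's VERSIONS attribute.
        self.name = name
        self.families = dict(families)
        # rows[row_key][(family, qualifier)] -> list of values, newest first.
        self.rows = defaultdict(dict)

    def put(self, row, column, value):
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            # HBase likewise rejects a write to a family that was never declared.
            raise ValueError(f"unknown column family: {family}")
        # Any qualifier is accepted: new columns need no schema change.
        versions = self.rows[row].setdefault((family, qualifier), [])
        versions.insert(0, value)
        # Every column in the family is trimmed to the same family-wide limit.
        del versions[self.families[family]["max_versions"]:]

    def get(self, row, column):
        family, qualifier = column.split(":", 1)
        return list(self.rows.get(row, {}).get((family, qualifier), []))


# Mirrors: create 'users', 'info', 'stats' (with hypothetical per-family limits).
users = ToyTable("users", {"info": {"max_versions": 3}, "stats": {"max_versions": 1}})
for n in range(5):
    users.put("u1", "info:name", f"name-v{n}")
    users.put("u1", "stats:logins", n)

print(users.get("u1", "info:name"))     # ['name-v4', 'name-v3', 'name-v2']
print(users.get("u1", "stats:logins"))  # [4]

try:
    users.put("u1", "prefs:theme", "dark")  # 'prefs' was never declared
except ValueError as err:
    print(err)  # unknown column family: prefs
```

In the real HBase shell, per-family settings are given at creation time as one dictionary per family, for example create 'users', {NAME => 'info', VERSIONS => 3}, {NAME => 'stats', VERSIONS => 1}.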
This example creates an HBase table named employees with two column families: personal and work. It adds data for one employee and then scans the table to show the stored data.
create 'employees', 'personal', 'work'
put 'employees', 'emp1', 'personal:name', 'Alice'
put 'employees', 'emp1', 'personal:email', 'alice@example.com'
put 'employees', 'emp1', 'work:department', 'Sales'
scan 'employees'
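The scan prints one line per cell, and cells come back sorted by row key, then by column family, then by qualifier, with keys compared as raw bytes. The Python sketch below models only that ordering for the example data; toy_scan is a hypothetical helper, not part of any HBase client, and it ignores timestamps and multiple versions.

```python
# Cells written by the example: (row key, "family:qualifier", value).
puts = [
    ("emp1", "personal:name", "Alice"),
    ("emp1", "personal:email", "alice@example.com"),
    ("emp1", "work:department", "Sales"),
]


def toy_scan(cells):
    """Return cells ordered like an HBase scan: by row key, family, qualifier."""
    def sort_key(cell):
        row, column, _value = cell
        family, qualifier = column.split(":", 1)
        # HBase compares keys as raw bytes, so compare encoded bytes here too.
        return (row.encode(), family.encode(), qualifier.encode())

    return sorted(cells, key=sort_key)


print("ROW    COLUMN+CELL")
for row, column, value in toy_scan(puts):
    # The real shell also prints each cell's timestamp; omitted in this toy.
    print(f" {row}   column={column}, value={value}")
```

Note that personal:email is listed before personal:name even though it was written second: within a row, cells are kept in sorted order, not insertion order.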
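Within each region, every column family is stored in its own set of files, so all columns of a family sit together on disk. Reads and writes that involve only one family therefore do not have to touch the files of the other families.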
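Keep the number of column families small. The HBase reference guide recommends two or three at most, noting that a busy family can force flushes of the smaller families stored alongside it, which adds needless I/O.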
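Column families are part of the table schema: adding or removing one requires an explicit schema change with the alter command, and removing a family deletes all of its data. Columns inside a family need no schema change, so plan the families carefully up front.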
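The point about each family being stored together on disk can be pictured with a small toy model in Python. ToyRegion and its stores_touched counter are hypothetical names for illustration only; real HBase stores also involve memstores, HFiles, block caches, and Bloom filters. The idea is that each family has its own independent store, so a read that asks for one family consults only that store. In the shell, get 'employees', 'emp1', 'personal' is the corresponding request for just the personal family.

```python
class ToyRegion:
    """Toy model: one independent store per column family (not real HBase code)."""

    def __init__(self, families):
        # Each family gets its own store, keyed by (row key, qualifier).
        self.stores = {family: {} for family in families}

    def put(self, row, column, value):
        family, qualifier = column.split(":", 1)
        self.stores[family][(row, qualifier)] = value

    def get(self, row, families):
        """Read one row from the requested families only.

        Returns the cells found and how many stores were consulted.
        """
        cells, stores_touched = {}, 0
        for family in families:
            stores_touched += 1  # only the requested families' stores are read
            # Linear scan for simplicity; real stores are sorted and indexed.
            for (r, qualifier), value in self.stores[family].items():
                if r == row:
                    cells[f"{family}:{qualifier}"] = value
        return cells, stores_touched


region = ToyRegion(["personal", "work"])
region.put("emp1", "personal:name", "Alice")
region.put("emp1", "personal:email", "alice@example.com")
region.put("emp1", "work:department", "Sales")

cells, stores_touched = region.get("emp1", ["personal"])
print(cells)           # only the two personal:* cells
print(stores_touched)  # 1: the work store was never opened
```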
Column families group related columns in HBase tables.
They help organize data and control storage settings.
Put columns that are read and written together in the same family; this improves performance and reflects real-world data categories.