Why HBase provides real-time access to big data in Hadoop - Visual Breakdown

Concept Flow - Why HBase provides real-time access to big data

Data Stored in HDFS

↓

HBase Stores Data in Tables

↓

Data Organized by Row Key

↓

Fast Lookup Using Indexes

↓

Real-Time Read/Write Access

↓

Applications Get Instant Data

HBase stores big data in tables on HDFS, organizes it by row keys for fast lookup, enabling real-time read and write access.
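To see why organizing data by row key makes lookups fast, here is a minimal Python sketch. It is not HBase code: the function `find_row` and the sample keys are hypothetical, and the only point is that sorted keys allow a binary search instead of a scan over every row.

```python
from bisect import bisect_left

def find_row(sorted_keys, values, row_key):
    """Binary-search a sorted list of row keys; return the value or None."""
    i = bisect_left(sorted_keys, row_key)
    if i < len(sorted_keys) and sorted_keys[i] == row_key:
        return values[i]
    return None

# Hypothetical sample data, kept in row-key order.
keys = ["row1", "row2", "row5", "row9"]
vals = ["val1", "val2", "val5", "val9"]

hit = find_row(keys, vals, "row5")    # key exists
miss = find_row(keys, vals, "row3")   # key does not exist
```

With keys in sorted order, each lookup inspects only a handful of entries even when the list is very large, and a missing key is detected just as quickly as an existing one.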

Execution Sample

# HBase shell; assumes an existing table 't1' with column family 'cf' (illustrative names)
put 't1', 'row1', 'cf:col1', 'val1'
get 't1', 'row1', 'cf:col1'    # returns the cell holding 'val1'

This shows how HBase writes and reads data instantly by row key and column.

Execution Table

Step	Action	Data Location	Result	Explanation
1	Write 'val1' to row 'row1', column 'col1'	HBase MemStore	Data stored in memory	Data first goes to fast memory store for quick write
2	Flush MemStore to HDFS as HFile	HDFS	Data persisted on disk	Data is saved to disk in sorted files for durability
3	Read 'col1' from 'row1'	MemStore + HFiles	Return 'val1'	HBase checks memory and disk files to find data fast
4	Write 'val2' to 'row2', 'col1'	MemStore	Data stored in memory	New data again goes to memory for fast write
5	Read 'col1' from 'row2'	MemStore	Return 'val2'	Data found immediately in memory without disk access
6	Read 'col1' from 'row3'	MemStore + HFiles	Return null	No data found for this row, returns empty quickly

The trace ends here: writes land in memory first, and reads are answered from memory together with the sorted, indexed files on disk.

Variable Tracker

Variable	Start	After Step 1	After Step 2	After Step 4	After Step 5	Final
MemStore	empty	{row1: {col1: val1}}	empty (flushed)	{row2: {col1: val2}}	{row2: {col1: val2}}	{row2: {col1: val2}}
HFiles on HDFS	empty	empty	contains {row1: {col1: val1}}	contains {row1: {col1: val1}}	contains {row1: {col1: val1}}	contains {row1: {col1: val1}}
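The six steps in the Execution Table can be replayed with a small Python model. This is a toy sketch, not the HBase client API: the class `ToyRegion` and its methods are hypothetical names, the "HFiles" are plain in-memory dictionaries rather than files on HDFS, and details of real HBase such as the write-ahead log are left out.

```python
class ToyRegion:
    """Toy model of one HBase store: an in-memory MemStore plus flushed HFiles."""

    def __init__(self):
        self.memstore = {}   # row -> {column: value}; holds the newest writes
        self.hfiles = []     # flushed, row-sorted snapshots (stand-ins for HFiles)

    def put(self, row, column, value):
        # Writes only touch memory, which is why they are fast.
        self.memstore.setdefault(row, {})[column] = value

    def flush(self):
        # Persist the MemStore as a row-sorted snapshot, then start a fresh one.
        if self.memstore:
            self.hfiles.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row, column):
        # Check memory first, then flushed snapshots from newest to oldest.
        if column in self.memstore.get(row, {}):
            return self.memstore[row][column]
        for hfile in reversed(self.hfiles):
            if column in hfile.get(row, {}):
                return hfile[row][column]
        return None

region = ToyRegion()
region.put("row1", "col1", "val1")     # step 1: write goes to the MemStore
region.flush()                         # step 2: MemStore flushed to an "HFile"
read1 = region.get("row1", "col1")     # step 3: found in the flushed snapshot
region.put("row2", "col1", "val2")     # step 4: new write goes to the MemStore
read2 = region.get("row2", "col1")     # step 5: found in the MemStore
read3 = region.get("row3", "col1")     # step 6: row does not exist
```

After the last step, `region.memstore` and `region.hfiles` hold the same contents as the "Final" column of the Variable Tracker.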

Key Moments - 3 Insights

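Why does HBase write data first to MemStore before saving to disk?
Writing to memory is much faster than writing to disk, so the write in step 1 completes quickly. The flush in step 2 later persists the data to HDFS as a sorted HFile.

How does HBase read data quickly even if it is stored on disk?
As in step 3, a read checks the MemStore and the HFiles. HFiles are sorted by row key and indexed, so HBase can locate a row without scanning whole files.

What happens if we try to read data for a row that does not exist?
As in step 6, HBase checks the MemStore and the HFiles, finds no match, and returns an empty result (null).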

Visual Quiz - 3 Questions

Test your understanding

Look at the Execution Table: what is stored in MemStore after step 1?

A. {row2: {col1: val2}}

B. empty

C. {row1: {col1: val1}}

D. contains {row1: {col1: val1}} on disk

Concept Snapshot

HBase stores big data in tables on HDFS.
Writes go to MemStore (memory) first, so they complete quickly and are immediately readable.
MemStore flushes to HDFS as sorted HFiles for durability.
Reads check MemStore then HFiles for quick lookup.
This design enables real-time read/write access to big data.

Full Transcript

HBase provides real-time access to big data by storing data in tables on top of HDFS. When data is written, it first goes into a memory area called MemStore, which allows very fast writes and immediate availability. Periodically, this data is saved to disk in files called HFiles for durability. When reading data, HBase looks first in MemStore and then in HFiles on disk, using indexes to find data quickly. This combination of memory and disk storage with indexing allows HBase to serve real-time read and write requests efficiently, even with very large datasets.
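The role of indexes in reading from disk can be sketched as follows. This is a simplified Python illustration, assuming a sorted file split into fixed-size blocks with a small index of each block's first row key. The class `ToyHFile`, the block size, and the sample rows are hypothetical and do not reflect the real HFile format.

```python
from bisect import bisect_right

class ToyHFile:
    """Toy sorted file split into fixed-size blocks with a sparse index."""

    def __init__(self, rows, block_size=2):
        items = sorted(rows.items())
        self.blocks = [items[i:i + block_size]
                       for i in range(0, len(items), block_size)]
        # The index holds only the first row key of each block.
        self.index = [block[0][0] for block in self.blocks]
        self.blocks_read = 0   # counts how many blocks lookups had to open

    def get(self, row_key):
        # Find the last block whose first key is <= row_key.
        pos = bisect_right(self.index, row_key) - 1
        if pos < 0:
            return None        # key sorts before everything in the file
        self.blocks_read += 1
        for key, value in self.blocks[pos]:
            if key == row_key:
                return value
        return None

# 100 hypothetical rows, 10 rows per block, so the index has 10 entries.
hfile = ToyHFile({f"row{i:03d}": f"val{i}" for i in range(100)}, block_size=10)
found = hfile.get("row042")     # opens only the block starting at row040
missing = hfile.get("row999")   # opens one block, finds nothing
```

Each lookup opens at most one block regardless of how many rows the file holds, which is the property that keeps reads fast as the dataset grows.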