
Why HBase provides real-time access to big data in Hadoop - Visual Breakdown

Concept Flow - Why HBase provides real-time access to big data
1. Data Stored in HDFS
2. HBase Stores Data in Tables
3. Data Organized by Row Key
4. Fast Lookup Using Indexes
5. Real-Time Read/Write Access
6. Applications Get Instant Data
HBase stores big data in tables on HDFS, organizes it by row keys for fast lookup, enabling real-time read and write access.
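To illustrate why organizing data by row key makes lookups fast, here is a minimal Python sketch. It assumes a single sorted list of row keys standing in for one on-disk file and finds a key by binary search. The keys, values, and the `lookup` helper are hypothetical, and real HFiles use block indexes rather than a Python list.

```python
import bisect

# Toy stand-in for one sorted on-disk file: row keys kept in sorted order.
# (Hypothetical data; a real HFile is a block-indexed file, not a list.)
row_keys = ["row1", "row2", "row5", "row9"]
values = ["val1", "val2", "val5", "val9"]


def lookup(row_key):
    """Binary-search the sorted keys; return the value, or None if absent."""
    i = bisect.bisect_left(row_keys, row_key)
    if i < len(row_keys) and row_keys[i] == row_key:
        return values[i]
    return None


print(lookup("row2"))  # key present
print(lookup("row3"))  # key absent
```

Because the keys are sorted, each lookup inspects only a logarithmic number of entries, which is the property that keeps reads fast as the table grows.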
Execution Sample
Put 'row1', 'col1', 'val1' into HBase table
Get 'row1', 'col1' from HBase table
Return value immediately
This shows how HBase writes a value and reads it back with low latency, addressing it by row key and column.
Execution Table
| Step | Action | Data Location | Result | Explanation |
|------|--------|---------------|--------|-------------|
| 1 | Write 'val1' to row 'row1', column 'col1' | HBase MemStore | Data stored in memory | Data first goes to fast memory store for quick write |
| 2 | Flush MemStore to HDFS as HFile | HDFS | Data persisted on disk | Data is saved to disk in sorted files for durability |
| 3 | Read 'col1' from 'row1' | MemStore + HFiles | Return 'val1' | HBase checks memory and disk files to find data fast |
| 4 | Write 'val2' to 'row2', 'col1' | MemStore | Data stored in memory | New data again goes to memory for fast write |
| 5 | Read 'col1' from 'row2' | MemStore | Return 'val2' | Data found immediately in memory without disk access |
| 6 | Read 'col1' from 'row3' | MemStore + HFiles | Return null | No data found for this row, returns empty quickly |
💡 Execution stops once the reads and writes have shown fast access through the in-memory MemStore and the indexed files on disk.
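The six steps above can be replayed with a small Python sketch. This is a simplified model under stated assumptions, not HBase's implementation: the `ToyStore` class and its `put`, `flush`, and `get` methods are hypothetical names, the "HFiles" are plain dictionaries held in a list rather than files on HDFS, and versions and timestamps are ignored.

```python
class ToyStore:
    """Simplified model: an in-memory MemStore plus a list of flushed files."""

    def __init__(self):
        self.memstore = {}  # row key -> {column: value}
        self.hfiles = []    # flushed snapshots, oldest first

    def put(self, row, col, value):
        # Writes land in memory first.
        self.memstore.setdefault(row, {})[col] = value

    def flush(self):
        # Persist the MemStore contents sorted by row key, then empty it.
        if self.memstore:
            self.hfiles.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row, col):
        # Check memory first, then flushed files from newest to oldest.
        if col in self.memstore.get(row, {}):
            return self.memstore[row][col]
        for hfile in reversed(self.hfiles):
            if col in hfile.get(row, {}):
                return hfile[row][col]
        return None  # row or column not found anywhere


store = ToyStore()
store.put("row1", "col1", "val1")  # step 1: write to MemStore
store.flush()                      # step 2: flush to a sorted "HFile"
r3 = store.get("row1", "col1")     # step 3: found in a flushed file
store.put("row2", "col1", "val2")  # step 4: write to MemStore
r5 = store.get("row2", "col1")     # step 5: found in memory
r6 = store.get("row3", "col1")     # step 6: missing row
print(r3, r5, r6)  # prints: val1 val2 None
```

After the run, `store.memstore` holds only `row2` and `store.hfiles` holds the flushed `row1` entry.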
Variable Tracker
| Variable | Start | After Step 1 | After Step 2 | After Step 4 | After Step 5 | Final |
|----------|-------|--------------|--------------|--------------|--------------|-------|
| MemStore | empty | {row1: {col1: val1}} | empty (flushed) | {row2: {col1: val2}} | {row2: {col1: val2}} | {row2: {col1: val2}} |
| HFiles on HDFS | empty | empty | contains {row1: {col1: val1}} | contains {row1: {col1: val1}} | contains {row1: {col1: val1}} | contains {row1: {col1: val1}} |
Key Moments - 3 Insights
Why does HBase write data first to MemStore before saving to disk?
Writing to MemStore (memory) is fast and makes the data available immediately, as shown in steps 1 and 4 of the Execution Table.
How does HBase read data quickly even if it is stored on disk?
HBase checks MemStore first and then uses indexes to search the HFiles on disk, which keeps lookups fast, as seen in steps 3 and 5 of the Execution Table.
What happens if we try to read data for a row that does not exist?
HBase quickly returns null after checking memory and disk, demonstrated in step 6.
Visual Quiz - 3 Questions
Test your understanding
According to the Execution Table, what is stored in MemStore after step 1?
A. {row2: {col1: val2}}
B. empty
C. {row1: {col1: val1}}
D. contains {row1: {col1: val1}} on disk
💡 Hint
Check the 'Data Location' and 'Result' columns for step 1 in the Execution Table.
At which step is data saved permanently to disk?
A. Step 2
B. Step 1
C. Step 4
D. Step 5
💡 Hint
Look for the 'Flush MemStore to HDFS as HFile' action in the Execution Table.
If MemStore is empty, where does HBase look for data during a read?
A. Only MemStore
B. MemStore and HDFS HFiles
C. Only HDFS HFiles
D. It cannot find data
💡 Hint
See step 3 in the Execution Table, where both MemStore and HFiles are checked.
Concept Snapshot
HBase stores big data in tables on HDFS.
Data is first written to MemStore (memory) for fast access.
MemStore flushes to HDFS as sorted HFiles for durability.
Reads check MemStore then HFiles for quick lookup.
This design enables real-time read/write access to big data.
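The flush step in the snapshot above can be sketched as a buffer that empties itself once it grows past a limit. This is an illustration only: `FLUSH_THRESHOLD`, the `put` helper, and the use of a cell count are assumptions made for the example, whereas HBase itself triggers flushes based on MemStore size in bytes, among other conditions.

```python
# Hypothetical limit: flush once 3 cells are buffered.
FLUSH_THRESHOLD = 3

memstore = {}  # (row key, column) -> value, held in memory
hfiles = []    # each flush appends one list of cells sorted by key


def put(row, col, value):
    """Buffer a write in memory and flush when the buffer is full."""
    memstore[(row, col)] = value
    if len(memstore) >= FLUSH_THRESHOLD:
        hfiles.append(sorted(memstore.items()))  # sorted, like an HFile
        memstore.clear()


# Writes arrive out of key order; the flushed file is still sorted.
for i in [3, 1, 2, 4]:
    put(f"row{i}", "col1", f"val{i}")

print(hfiles)
print(memstore)
```

The first three writes fill the buffer and are flushed together in row-key order, while the fourth write stays in memory until the next flush.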
Full Transcript
HBase provides real-time access to big data by storing data in tables on top of HDFS. When data is written, it first goes into a memory area called MemStore, which allows very fast writes and immediate availability. Periodically, this data is saved to disk in files called HFiles for durability. When reading data, HBase looks first in MemStore and then in HFiles on disk, using indexes to find data quickly. This combination of memory and disk storage with indexing allows HBase to serve real-time read and write requests efficiently, even with very large datasets.