
LOAD, FILTER, and STORE operations in Hadoop - Step-by-Step Execution

Concept Flow - LOAD, FILTER, and STORE operations
LOAD data from source
FILTER data by condition
STORE filtered data to destination
END
Data is first loaded from a source, then filtered by a condition, and finally stored to a destination.
Execution Sample
Hadoop
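-- Declare a schema so 'age' can be referenced by name (fields are tab-separated by default).
data = LOAD 'input.txt' AS (name:chararray, age:int);
-- Keep only the rows whose age is strictly greater than 30.
filtered = FILTER data BY age > 30;
-- Write the result; 'output' is created as a directory of part files.
STORE filtered INTO 'output';
Load the rows of 'input.txt' with a declared schema (name, age), keep only the rows where age is greater than 30, then save the result to 'output'.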
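The three statements can be mirrored in plain Python to make each stage concrete. This is only an analogy, not Pig code: the helper names `load`, `keep_rows`, and `store`, the temporary file paths, and the tab-separated sample file are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def load(path):
    """Stand-in for LOAD: read every tab-separated line into a dict."""
    rows = []
    for line in Path(path).read_text().splitlines():
        name, age = line.split("\t")
        rows.append({"name": name, "age": int(age)})
    return rows


def keep_rows(rows, predicate):
    """Stand-in for FILTER: return a new list of the rows that satisfy predicate."""
    return [row for row in rows if predicate(row)]


def store(rows, path):
    """Stand-in for STORE: write one tab-separated line per row."""
    Path(path).write_text("".join(f"{row['name']}\t{row['age']}\n" for row in rows))


# Hypothetical sample file holding the three rows used in this lesson.
workdir = Path(tempfile.mkdtemp())
input_path = workdir / "input.txt"
output_path = workdir / "output.txt"
input_path.write_text("Ann\t25\nBob\t35\nCara\t40\n")

data = load(input_path)                                   # all three rows
filtered = keep_rows(data, lambda row: row["age"] > 30)   # Bob and Cara only
store(filtered, output_path)                              # persist the subset

print(filtered)
```

Only the rows for Bob and Cara reach the output file; Ann's row is read by the load step but dropped by the filter.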
Execution Table
| Step | Operation | Input Data | Condition/Action | Output Data |
|---|---|---|---|---|
| 1 | LOAD | None | Read all rows from 'input.txt' | [{'name':'Ann','age':25},{'name':'Bob','age':35},{'name':'Cara','age':40}] |
| 2 | FILTER | [{'name':'Ann','age':25},{'name':'Bob','age':35},{'name':'Cara','age':40}] | Keep rows where age > 30 | [{'name':'Bob','age':35},{'name':'Cara','age':40}] |
| 3 | STORE | [{'name':'Bob','age':35},{'name':'Cara','age':40}] | Write data to 'output' | Data saved to 'output' |
| 4 | END | N/A | All steps completed | Process finished |
💡 Process stops after storing filtered data to 'output'.
Variable Tracker
| Variable | Start | After LOAD | After FILTER | After STORE |
|---|---|---|---|---|
| data | None | [{'name':'Ann','age':25},{'name':'Bob','age':35},{'name':'Cara','age':40}] | [{'name':'Ann','age':25},{'name':'Bob','age':35},{'name':'Cara','age':40}] | [{'name':'Ann','age':25},{'name':'Bob','age':35},{'name':'Cara','age':40}] |
| filtered | None | None | [{'name':'Bob','age':35},{'name':'Cara','age':40}] | [{'name':'Bob','age':35},{'name':'Cara','age':40}] |
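The tracker can be reproduced with a short Python sketch that snapshots both variables after each stage. The function name `trace_pipeline` and the `min_age` parameter are hypothetical, and in-memory lists stand in for the real dataset.

```python
import copy

ROWS = [
    {"name": "Ann", "age": 25},
    {"name": "Bob", "age": 35},
    {"name": "Cara", "age": 40},
]


def trace_pipeline(rows, min_age=30):
    """Return a snapshot of 'data' and 'filtered' after each stage."""
    data, filtered = None, None
    snapshots = {"Start": (data, filtered)}

    data = copy.deepcopy(rows)  # LOAD: bring the rows into 'data'
    snapshots["After LOAD"] = (copy.deepcopy(data), filtered)

    filtered = [row for row in data if row["age"] > min_age]  # FILTER: a new list
    snapshots["After FILTER"] = (copy.deepcopy(data), copy.deepcopy(filtered))

    # STORE writes 'filtered' elsewhere; neither variable is modified by it.
    snapshots["After STORE"] = (copy.deepcopy(data), copy.deepcopy(filtered))
    return snapshots


for stage, (data, filtered) in trace_pipeline(ROWS).items():
    print(stage, "| data:", data, "| filtered:", filtered)
```

The `data` snapshot is identical from "After LOAD" onward, which is the point the tracker makes: filtering builds a second dataset instead of editing the first.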
Key Moments - 3 Insights
Why does the 'filtered' variable only contain some rows after the FILTER step?
Because FILTER keeps only the rows for which the condition 'age > 30' is true, as shown in step 2 of the Execution Table.
Does the original 'data' variable change after filtering?
No. 'data' is unchanged after FILTER; filtering creates a new variable, 'filtered', as the Variable Tracker shows.
What happens if STORE is not called after FILTER?
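The filtered data would not be saved anywhere; step 3 of the Execution Table shows that STORE is what writes the data to 'output'. Pig also defers execution until it reaches a STORE (or DUMP) statement, so without one no job runs at all.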
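The role of STORE as the step that actually triggers work can be imitated with Python generators, which also do nothing until something consumes them. This is a loose analogy under stated assumptions, not how Pig is implemented: the `events` log and the helper names are hypothetical.

```python
events = []  # records when a row is actually read


def load(rows):
    """Lazy stand-in for LOAD: rows are read only when a consumer asks for them."""
    for row in rows:
        events.append(f"read {row['name']}")
        yield row


def keep_rows(rows, predicate):
    """Lazy stand-in for FILTER: describes the filtering without running it."""
    return (row for row in rows if predicate(row))


def store(rows, sink):
    """Stand-in for STORE: consuming the rows is what forces the work to happen."""
    sink.extend(rows)


source = [
    {"name": "Ann", "age": 25},
    {"name": "Bob", "age": 35},
    {"name": "Cara", "age": 40},
]

data = load(source)
filtered = keep_rows(data, lambda row: row["age"] > 30)
events_before_store = list(events)  # still empty: nothing has been read yet

output = []
store(filtered, output)

print(events_before_store)
print(events)
print(output)
```

Before `store` is called the event log is empty; afterwards all three rows have been read and two of them have landed in `output`.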
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the output data after the FILTER operation?
A. [{'name':'Bob','age':35},{'name':'Cara','age':40}]
B. [{'name':'Ann','age':25}]
C. [{'name':'Ann','age':25},{'name':'Bob','age':35},{'name':'Cara','age':40}]
D. []
💡 Hint
Check the 'Output Data' column at step 2 of the Execution Table.
At which step does the data get saved to the destination?
A. Step 2
B. Step 3
C. Step 1
D. Step 4
💡 Hint
Look for the STORE operation in the Execution Table.
If the FILTER condition changed to 'age > 40', what would be the output after filtering?
A. [{'name':'Bob','age':35}]
B. [{'name':'Cara','age':40}]
C. []
D. [{'name':'Ann','age':25}]
💡 Hint
Refer to how FILTER selects rows based on the condition in step 2 of the Execution Table.
Concept Snapshot
LOAD reads data from a source file.
FILTER keeps only rows matching a condition.
STORE saves the filtered data to a destination.
LOAD and FILTER each produce a new dataset; the original is never changed.
Used together to process and save specific data subsets.
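Because the condition is a strict comparison, rows that sit exactly on the threshold are excluded. The sketch below, in plain Python with a hypothetical `filter_by_age` helper, shows how the result changes with the threshold and with the choice between `>` and `>=`.

```python
import operator

ROWS = [
    {"name": "Ann", "age": 25},
    {"name": "Bob", "age": 35},
    {"name": "Cara", "age": 40},
]


def filter_by_age(rows, threshold, compare=operator.gt):
    """Return the rows whose age satisfies compare(age, threshold)."""
    return [row for row in rows if compare(row["age"], threshold)]


over_30 = filter_by_age(ROWS, 30)                     # strict: age > 30
over_40 = filter_by_age(ROWS, 40)                     # strict: age > 40
at_least_40 = filter_by_age(ROWS, 40, operator.ge)    # inclusive: age >= 40

print([row["name"] for row in over_30])
print([row["name"] for row in over_40])
print([row["name"] for row in at_least_40])
```

With the sample rows, `age > 40` matches nothing because Cara is exactly 40, while `age >= 40` keeps her row.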
Full Transcript
This lesson shows how the LOAD, FILTER, and STORE operations (Pig Latin statements that run on Hadoop) work step by step. First, data is loaded from a file, reading all rows. Then, FILTER keeps only rows where the age is greater than 30. Finally, STORE saves the filtered data to an output location. The variables 'data' and 'filtered' hold the data before and after filtering. The process stops after storing the filtered data. Key points: FILTER creates a new dataset without changing the original, and STORE writes data to disk. The visual quiz tests understanding of the output at each step and the effect of changing the filter condition.