
LOAD, FILTER, and STORE operations in Hadoop

Introduction

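LOAD, FILTER, and STORE are operators in Pig Latin, the scripting language of Apache Pig, which runs on Hadoop. LOAD reads data into a relation, FILTER keeps only the rows we want, and STORE saves the results. Together they let us process big data step by step.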

Use these operations in the following situations:
When you want to read data from a file in Hadoop to analyze it.
When you need to keep only certain rows from a large dataset based on a condition.
When you want to save the filtered or processed data back to Hadoop storage.
When you are cleaning data by removing unwanted records before further analysis.
When you are preparing data for another program or step by saving the results.
Syntax
Hadoop
data = LOAD 'input_path' USING PigStorage(',');
filtered_data = FILTER data BY condition;
STORE filtered_data INTO 'output_path' USING PigStorage(',');

LOAD reads data from a file or folder.

FILTER keeps rows where the condition is true.

STORE writes the resulting rows to the output path.

Examples
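Load a CSV file named users.csv into a relation called data. The AS clause names and types the columns so that later statements can refer to age; the two-column layout (name, age) is assumed here for illustration.
Hadoop
data = LOAD 'users.csv' USING PigStorage(',') AS (name:chararray, age:int);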
Keep only rows where the age column is greater than 30.
Hadoop
filtered_data = FILTER data BY age > 30;
Save the filtered data into a folder named output/users_over_30.
Hadoop
STORE filtered_data INTO 'output/users_over_30' USING PigStorage(',');
Sample Program

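This program loads a CSV file with user data, keeps the users older than 30, and stores the result. Because no schema is declared, the age column is referenced by position ($1, the second field) and cast to int.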

Hadoop
data = LOAD 'input/users.csv' USING PigStorage(',');
filtered_data = FILTER data BY (int)$1 > 30;
STORE filtered_data INTO 'output/users_over_30' USING PigStorage(',');
Output
Success
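Before writing anything to storage, it can help to look at a few filtered rows on the console. The following is a minimal sketch that reuses the sample program's path and condition; the relation name preview is chosen only for illustration.

```pig
-- Load and filter exactly as in the sample program.
data = LOAD 'input/users.csv' USING PigStorage(',');
filtered_data = FILTER data BY (int)$1 > 30;

-- Keep at most five rows so the console output stays short.
preview = LIMIT filtered_data 5;

-- Print the preview rows instead of storing them.
DUMP preview;
```

DUMP is meant for small checks like this one; STORE remains the way to keep the full result.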
Important Notes

Make sure the input path exists and is readable in Hadoop before you run the script; otherwise the job fails.
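One way to check the input path is Pig's fs command, which passes Hadoop file system commands through from a script or the Grunt shell. This sketch assumes the input path of the sample program.

```pig
-- List the input file; an error here means the path is missing or not readable.
fs -ls input/users.csv
```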

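The values compared in a FILTER condition must have compatible types. Fields loaded without a schema have the type bytearray, so cast them where needed, for example (int)$1.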
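An alternative to casting inside each condition is to declare the types once in LOAD with an AS clause. The sketch below assumes a hypothetical two-column layout (name, age); the column names, the types, and the relation name named_over_30 are illustrative and not taken from a real dataset.

```pig
-- Hypothetical schema: column names and types are assumptions for illustration.
data = LOAD 'input/users.csv' USING PigStorage(',') AS (name:chararray, age:int);

-- age is already an int, so the condition needs no cast.
filtered_data = FILTER data BY age > 30;

-- Conditions can be combined with AND, OR, and NOT.
named_over_30 = FILTER data BY age > 30 AND name IS NOT NULL;
```

Declaring the schema also lets conditions use column names instead of positions such as $1.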

STORE creates the output folder. If the folder already exists, the job fails with an error, so remove it or choose a new output path first.
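To rerun a script without hitting the existing-folder error, one option is Pig's rmf command, which removes a path and does not complain when the path is missing. This sketch reuses the paths from the sample program.

```pig
-- Remove the previous output folder, if any. This permanently deletes its contents.
rmf output/users_over_30

-- Load, filter, and store as in the sample program.
data = LOAD 'input/users.csv' USING PigStorage(',');
filtered_data = FILTER data BY (int)$1 > 30;
STORE filtered_data INTO 'output/users_over_30' USING PigStorage(',');
```

Because rmf deletes data, it is safer to use it only on folders that hold results you can regenerate.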

Summary

LOAD reads data from Hadoop storage into a relation for processing.

FILTER selects only the rows you want based on a condition.

STORE saves your processed data back to Hadoop storage.