Hadoop · ~30 mins

LOAD, FILTER, and STORE operations in Hadoop - Mini Project: Build & Apply

📖 Scenario: You work with a large dataset of customer orders stored in Hadoop. You want to load this data, filter orders with amounts greater than 100, and save the filtered results for further analysis.
🎯 Goal: Build a Hadoop Pig script that loads the orders data, filters orders with amount greater than 100, and stores the filtered data into a new location.
📋 What You'll Learn
Load data from '/data/orders' with fields order_id, customer_id, and amount
Create a filter condition to keep only orders where amount > 100
Store the filtered results into '/data/filtered_orders'
💡 Why This Matters
🌍 Real World
Filtering large datasets in Hadoop is common for preparing data for analysis or reporting.
💼 Career
Data engineers and analysts use LOAD, FILTER, and STORE operations daily to manage big data pipelines.
Step 1: Load the orders data
Write a Pig Latin statement to load the data from '/data/orders' into a relation called orders. The data has three fields: order_id, customer_id, and amount.
Hint: Use LOAD with PigStorage and define the schema with AS.
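One way this step might look, as a minimal sketch. The comma delimiter and the field types (int, int, double) are assumptions; match them to how the data in '/data/orders' is actually stored:

```pig
-- Load delimited order records and name the fields with a schema
orders = LOAD '/data/orders' USING PigStorage(',')
         AS (order_id:int, customer_id:int, amount:double);
```

Declaring types in the AS clause lets the amount comparison in the next step behave numerically rather than as a byte-array comparison.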

Step 2: Filter orders with amount greater than 100
Create a new relation called filtered_orders by filtering orders to keep only rows where amount > 100.
Hint: Use FILTER with the condition amount > 100.
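A sketch of the filter, assuming the orders relation from the previous step with a numeric amount field:

```pig
-- Keep only rows whose amount exceeds 100
filtered_orders = FILTER orders BY amount > 100;
```

Like all Pig Latin statements, this is evaluated lazily: nothing runs until a STORE or DUMP triggers the pipeline.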

Step 3: Store the filtered orders
Write a statement to store the filtered_orders relation into the directory '/data/filtered_orders' using PigStorage with a comma separator.
Hint: Use STORE with PigStorage to save the filtered data.
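A sketch of the store step. Note that Pig expects '/data/filtered_orders' to be a directory that does not yet exist; the job fails if it already does:

```pig
-- Write the filtered rows out as comma-separated text files
STORE filtered_orders INTO '/data/filtered_orders' USING PigStorage(',');
```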

Step 4: Display the filtered orders
Write a statement to dump the filtered_orders relation to display the filtered data on the console.
Hint: Use DUMP to print the filtered data.
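The final statement is a one-liner; DUMP forces execution of the whole pipeline and prints each tuple to the console, which is handy for spot-checking but best avoided on very large relations:

```pig
-- Run the pipeline and print the filtered tuples to stdout
DUMP filtered_orders;
```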