Hadoop · data · ~20 mins

LOAD, FILTER, and STORE operations in Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output (intermediate)
What is the output of this Pig Latin script after filtering?

Given the following Pig Latin script, what will be the content of filtered_data?

data = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);
filtered_data = FILTER data BY age > 30;
DUMP filtered_data;

Assume input.txt contains:
John,25,New York
Mary,35,Chicago
Bob,40,Seattle

A. (John,25,New York)
B. (John,25,New York)
   (Mary,35,Chicago)
   (Bob,40,Seattle)
C. (Mary,35,Chicago)
   (Bob,40,Seattle)
D. Empty output
💡 Hint

Think about the filter condition age > 30 and which rows satisfy it.
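To check a prediction without a Pig installation, the LOAD and FILTER steps can be imitated in plain Python. This is only a rough sketch: the helper names `load` and `filter_by` are made up for illustration and are not part of Pig, and the sample rows are copied from the problem statement.

```python
import csv
import io

# Sample rows copied from the problem statement (stands in for input.txt).
INPUT_TXT = "John,25,New York\nMary,35,Chicago\nBob,40,Seattle\n"


def load(text, delimiter=","):
    """Rough stand-in for LOAD ... USING PigStorage(',') with the
    schema (name:chararray, age:int, city:chararray)."""
    rows = []
    for name, age, city in csv.reader(io.StringIO(text), delimiter=delimiter):
        rows.append((name, int(age), city))  # age is declared as int
    return rows


def filter_by(rows, predicate):
    """Rough stand-in for FILTER: keep only tuples where the predicate holds."""
    return [row for row in rows if predicate(row)]


data = load(INPUT_TXT)
filtered_data = filter_by(data, lambda row: row[1] > 30)  # age > 30

# Comparable to DUMP: print one tuple per line.
for row in filtered_data:
    print(row)
```

Python prints tuples with quotes and spaces, so the formatting differs from Pig's DUMP output; only the set of surviving rows is comparable.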

Data Output (intermediate)
How many records remain after filtering?

Consider this Pig Latin script:

records = LOAD 'data.csv' USING PigStorage(',') AS (id:int, score:int);
passed = FILTER records BY score >= 50;
STORE passed INTO 'passed_output';

If data.csv has five records with scores 45, 50, 60, 30, and 55, how many records will be stored in passed_output?

A. 3
B. 2
C. 5
D. 0
💡 Hint

Count how many scores are 50 or more.
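The count can be verified with a short Python sketch of the same predicate. The scores come from the problem statement; the ids 1 to 5 are an assumption, since the problem does not list them.

```python
# Scores from the problem statement; the ids 1-5 are made up for illustration.
records = [(1, 45), (2, 50), (3, 60), (4, 30), (5, 55)]

# Same predicate as FILTER records BY score >= 50.
# Note that >= is inclusive, so a score of exactly 50 is kept.
passed = [(rec_id, score) for rec_id, score in records if score >= 50]

print(len(passed))
```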

🔧 Debug (advanced)
Identify the error in this Pig Latin script

What error, if any, will this Pig Latin script produce?

data = LOAD 'file.txt' USING PigStorage(',') AS (name:chararray, age:int);
filtered = FILTER data BY age > '30';
DUMP filtered;
A. SyntaxError: Invalid comparison operator
B. RuntimeError: File not found
C. No error, runs successfully
D. TypeError: Cannot compare int with chararray
💡 Hint

Check the data types used in the filter condition.
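As an analogy only, the Python sketch below compares an integer with a quoted literal, and then with the same literal converted to an integer. Python is not Pig, so the exact error wording and the point at which a problem is reported may differ; the helper `compare` is hypothetical and exists only for this illustration.

```python
age = 35          # an integer field, similar to age:int
threshold = '30'  # a quoted literal is a string, similar to a chararray


def compare(left, right):
    """Return the result of left > right, or a description of the error."""
    try:
        return left > right
    except TypeError as exc:
        return f"TypeError: {exc}"


print(compare(age, threshold))       # integer compared with a string
print(compare(age, int(threshold)))  # integer compared with an integer
```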

🚀 Application (advanced)
Which Pig Latin command stores filtered data correctly?

You want to load a CSV file, filter rows where status is 'active', and save the result. Which command correctly stores the filtered data?

data = LOAD 'users.csv' USING PigStorage(',') AS (user_id:int, status:chararray);
active_users = FILTER data BY status == 'active';
A. STORE active_users INTO 'active_output' USING PigStorage();
B. STORE data INTO 'active_output' USING PigStorage();
C. STORE active_users INTO 'active_output';
D. STORE active_users INTO 'active_output' USING JsonStorage();
💡 Hint

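Check which relation is being stored and which storage function is named. If USING is omitted, Pig falls back to its default, PigStorage with a tab delimiter.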
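The effect of STORE on a filtered relation can be sketched in Python as writing one delimited line per tuple into an output directory. The sketch rests on several assumptions: the `store` helper, the file name `part-00000`, and the sample user rows are all made up for illustration, and the tab default mirrors the default delimiter of `PigStorage()`.

```python
import os
import tempfile


def store(rows, out_dir, delimiter="\t"):
    """Write tuples to a part file in out_dir, one delimited line per tuple."""
    os.makedirs(out_dir)  # raises if the directory already exists
    path = os.path.join(out_dir, "part-00000")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(delimiter.join(str(field) for field in row) + "\n")
    return path


# Made-up sample rows with the schema (user_id:int, status:chararray).
users = [(1, "active"), (2, "inactive"), (3, "active")]
active_users = [u for u in users if u[1] == "active"]

# Store the filtered relation, not the original one.
out_dir = os.path.join(tempfile.mkdtemp(), "active_output")
path = store(active_users, out_dir)

with open(path, encoding="utf-8") as f:
    print(f.read(), end="")
```

Passing `users` instead of `active_users` to `store` would write every row, which is the kind of mistake the answer options are probing.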

🧠 Conceptual (expert)
What happens if you STORE a relation without filtering after LOAD?

Consider this Pig Latin script:

data = LOAD 'records.txt' USING PigStorage(',') AS (id:int, value:int);
STORE data INTO 'output_dir' USING PigStorage();

What will be the content of output_dir?

A. Only records with non-null values will be stored.
B. All records from 'records.txt' will be stored in 'output_dir'.
C. No records will be stored because no FILTER was applied.
D. An error will occur because STORE requires a FILTER operation first.
💡 Hint

Think about what STORE does with a loaded relation.
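One way to reason about a load-then-store pipeline is to trace what happens to fields that are empty or cannot be read as integers. The Python sketch below treats such fields as missing values and writes them back as empty fields. The helper names and sample rows are assumptions for illustration, and the handling shown is a simplified model rather than Pig's actual implementation.

```python
def to_int(field):
    """Return an int, or None when the text is not a valid integer."""
    try:
        return int(field)
    except ValueError:
        return None


def load(lines, delimiter=","):
    """Parse each line into a tuple of ints (or None) per the (id, value) schema."""
    return [tuple(to_int(f) for f in line.split(delimiter)) for line in lines]


def to_output_line(row, delimiter="\t"):
    """Format one tuple for output; None becomes an empty field."""
    return delimiter.join("" if v is None else str(v) for v in row)


# Made-up rows: one clean, one with an empty value, one with a non-numeric value.
records_txt = ["1,10", "2,", "3,abc"]

data = load(records_txt)
output_lines = [to_output_line(row) for row in data]  # no filter step at all

print(len(records_txt), len(output_lines))
print(output_lines)
```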