Pig Latin basics in Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of a simple LOAD and FILTER operation
Given the following Pig Latin script, what is the output relation after the FILTER operation?

data = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, age:int);
adults = FILTER data BY age >= 18;

Assume input.txt contains:
John,17
Mary,22
Bob,18
A
(Mary,22)
B
(John,17)
(Mary,22)
(Bob,18)
C
(John,17)
D
(Mary,22)
(Bob,18)
💡 Hint
FILTER keeps only rows where the condition is true.
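To check your reasoning, the hint can be mimicked in plain Python. This is only a rough sketch of the FILTER semantics, not Pig itself: the helper names `load` and `filter_adults` are made up for illustration, and the sample rows are inlined instead of being read from input.txt.

```python
# Plain-Python stand-in for the Pig script above; it does not run Pig.
import csv
import io

# Sample rows from the question, inlined instead of read from input.txt.
INPUT_TXT = "John,17\nMary,22\nBob,18\n"


def load(text):
    """Mimic LOAD ... USING PigStorage(',') AS (name:chararray, age:int)."""
    return [(name, int(age)) for name, age in csv.reader(io.StringIO(text))]


def filter_adults(rows):
    """Mimic FILTER data BY age >= 18: keep rows where the condition is true."""
    return [row for row in rows if row[1] >= 18]


data = load(INPUT_TXT)
adults = filter_adults(data)

# Print one tuple per line, loosely like DUMP.
for row in adults:
    print(row)
```

Note that a Python list preserves input order, whereas Pig makes no ordering promise for a relation unless you use ORDER BY.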
Data Output
intermediate
Result of a GROUP and COUNT operation
What is the output of the following Pig Latin script?

data = LOAD 'input.txt' USING PigStorage(',') AS (category:chararray, value:int);
grp = GROUP data BY category;
counted = FOREACH grp GENERATE group, COUNT(data);
DUMP counted;

Assume input.txt contains:
A,10
B,20
A,30
B,40
A,50
A
(A,1)
(B,1)
B
(A,3)
(B,2)
C
(A,5)
(B,5)
D
(A,2)
(B,3)
💡 Hint
GROUP collects rows by category, COUNT counts rows per group.
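The two steps in the hint can be sketched in plain Python as a sanity check. This is an approximation of GROUP and COUNT, not Pig itself; `group_by_category` and `count_per_group` are hypothetical helper names, and the rows are the sample data from the question.

```python
# Plain-Python approximation of GROUP ... BY and COUNT; it does not run Pig.
from collections import defaultdict

# Sample rows from the question: (category, value).
rows = [("A", 10), ("B", 20), ("A", 30), ("B", 40), ("A", 50)]


def group_by_category(rows):
    """Mimic GROUP data BY category: map each key to a bag (list) of rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return dict(groups)


def count_per_group(groups):
    """Mimic FOREACH grp GENERATE group, COUNT(data): one tuple per key."""
    # Sorted only to make the printed order deterministic in this sketch.
    return [(key, len(bag)) for key, bag in sorted(groups.items())]


grp = group_by_category(rows)
counted = count_per_group(grp)

for row in counted:
    print(row)
```

The `value` column plays no part in the result: COUNT looks at how many rows fall in each bag, not at what they contain.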
🔧 Debug
advanced
Identify the error in a JOIN operation
What error will this Pig Latin script produce?

users = LOAD 'users.txt' USING PigStorage(',') AS (user_id:int, name:chararray);
orders = LOAD 'orders.txt' USING PigStorage(',') AS (order_id:int, user:int, amount:float);
joined = JOIN users BY user_id, orders BY user_id;
DUMP joined;
A. Error: Field 'user_id' not found in relation 'orders'
B. No error, outputs joined data
C. Error: Syntax error near JOIN statement
D. Error: Duplicate field names in output
💡 Hint
Check field names used in JOIN keys.
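The check suggested by the hint can be sketched in plain Python: compare each JOIN key against the schema declared in the matching LOAD statement. This is only an illustration; the function `missing_join_keys` is hypothetical, and the sketch does not reproduce the wording of Pig's actual error message.

```python
# Plain-Python sketch of validating JOIN keys against declared schemas.
# It does not run Pig and does not reproduce Pig's real error text.

# Field names taken from the AS (...) clauses in the script above.
users_schema = ["user_id", "name"]
orders_schema = ["order_id", "user", "amount"]


def missing_join_keys(join_spec):
    """Return (relation, key) pairs whose key is absent from the schema."""
    return [(rel, key) for rel, schema, key in join_spec if key not in schema]


# JOIN users BY user_id, orders BY user_id;  (as written in the question)
as_written = [
    ("users", users_schema, "user_id"),
    ("orders", orders_schema, "user_id"),
]

# One possible correction, assuming the 'user' field holds the user id.
corrected = [
    ("users", users_schema, "user_id"),
    ("orders", orders_schema, "user"),
]

print(missing_join_keys(as_written))
print(missing_join_keys(corrected))
```

An empty list means every join key resolves to a declared field.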
Visualization
advanced
Visualize the result of a FOREACH GENERATE with arithmetic
Given this Pig Latin script, what is the output after the FOREACH GENERATE?

data = LOAD 'input.txt' USING PigStorage(',') AS (item:chararray, price:float, quantity:int);
total = FOREACH data GENERATE item, price * quantity AS total_cost;
DUMP total;

Assume input.txt contains:
Pen,1.5,10
Notebook,2.0,5
Eraser,0.5,20
A
(Pen,15.0)
(Notebook,10.0)
(Eraser,10.0)
B
(Pen,1.5)
(Notebook,2.0)
(Eraser,0.5)
C
(Pen,10)
(Notebook,5)
(Eraser,20)
D
(Pen,150)
(Notebook,200)
(Eraser,100)
💡 Hint
Multiply price by quantity for each item.
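The per-row arithmetic from the hint can be tried out in plain Python. This is a loose stand-in for FOREACH ... GENERATE, not Pig itself; the function name `with_total_cost` is made up, and Python's 64-bit floats are only an approximation of Pig's `float` type.

```python
# Plain-Python stand-in for FOREACH data GENERATE item, price * quantity.
# It does not run Pig; Python floats are 64-bit, unlike Pig's float type.

# Sample rows from the question: (item, price, quantity).
rows = [("Pen", 1.5, 10), ("Notebook", 2.0, 5), ("Eraser", 0.5, 20)]


def with_total_cost(rows):
    """Project each row to (item, total_cost), dropping the other fields."""
    return [(item, price * quantity) for item, price, quantity in rows]


total = with_total_cost(rows)

for row in total:
    print(row)
```

Each output tuple has two fields because GENERATE lists two expressions; `price` and `quantity` do not appear on their own.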
🚀 Application
expert
Determine the number of records after a COGROUP and FILTER
Consider these Pig Latin commands:

students = LOAD 'students.txt' USING PigStorage(',') AS (student_id:int, name:chararray);
grades = LOAD 'grades.txt' USING PigStorage(',') AS (student_id:int, grade:int);
grouped = COGROUP students BY student_id, grades BY student_id;
passed = FILTER grouped BY COUNT(grades) > 0 AND AVG(grades.grade) >= 60;
DUMP passed;

Assuming:
students.txt:
1,Alice
2,Bob
3,Charlie
grades.txt:
1,70
1,80
2,50
3,90
3,55

How many records will passed contain?
A. 3
B. 1
C. 2
D. 0
💡 Hint
Check which students have grades and average grade >= 60.
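The check described in the hint can be walked through in plain Python. This is a simplified model of COGROUP followed by FILTER, not Pig itself: the helpers `cogroup` and `passes` are hypothetical, and the rows are the sample data from the question.

```python
# Plain-Python model of COGROUP + FILTER; it does not run Pig.

# Sample rows from the question.
students = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
grades = [(1, 70), (1, 80), (2, 50), (3, 90), (3, 55)]


def cogroup(left, right):
    """Mimic COGROUP: one (key, left_bag, right_bag) tuple per distinct key."""
    keys = sorted({row[0] for row in left} | {row[0] for row in right})
    return [
        (key,
         [row for row in left if row[0] == key],
         [row for row in right if row[0] == key])
        for key in keys
    ]


def passes(grade_bag):
    """Mimic COUNT(grades) > 0 AND AVG(grades.grade) >= 60."""
    if not grade_bag:
        return False
    average = sum(grade for _, grade in grade_bag) / len(grade_bag)
    return average >= 60


grouped = cogroup(students, grades)
passed = [group for group in grouped if passes(group[2])]

# Show each key with its average so the filter decision is visible.
for key, _, grade_bag in grouped:
    print(key, sum(g for _, g in grade_bag) / len(grade_bag))
print(len(passed))
```

The filter is applied to whole groups, so the record count of `passed` is a count of student ids, not of individual grade rows.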