Challenge - 5 Problems
Pig Latin Mastery
❓ Predict Output
Difficulty: intermediate
Output of a simple LOAD and FILTER operation
Given the following Pig Latin script, what is the output relation after the FILTER operation?
data = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, age:int);
adults = FILTER data BY age >= 18;
DUMP adults;
Assume input.txt contains:
John,17
Mary,22
Bob,18
💡 Hint
FILTER keeps only rows where the condition is true.
Explanation
FILTER keeps only rows where age is 18 or more, so John (17) is excluded and the output is (Mary,22) and (Bob,18).
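The filtering rule can be checked outside Pig with a short plain-Python sketch. This does not run Pig; it only mirrors the `age >= 18` predicate on the three sample rows, and the list `rows` is a hand-typed stand-in for `input.txt`.

```python
# Hand-typed stand-in for input.txt as parsed by PigStorage(',').
rows = [("John", 17), ("Mary", 22), ("Bob", 18)]

# Mirrors: adults = FILTER data BY age >= 18;
adults = [(name, age) for name, age in rows if age >= 18]

# Mirrors: DUMP adults;
for row in adults:
    print(row)
```

Bob is kept because the comparison is `>=`, not `>`.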
❓ Data Output
Difficulty: intermediate
Result of a GROUP and COUNT operation
What is the output of the following Pig Latin script?
data = LOAD 'input.txt' USING PigStorage(',') AS (category:chararray, value:int);
grp = GROUP data BY category;
counted = FOREACH grp GENERATE group, COUNT(data);
DUMP counted;
Assume input.txt contains:
A,10
B,20
A,30
B,40
A,50
💡 Hint
GROUP collects rows by category, COUNT counts rows per group.
Explanation
Category A appears 3 times and B appears 2 times, so the output is (A,3) and (B,2).
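As a cross-check, the group-then-count step can be emulated in plain Python. This is a sketch of the logic only, not Pig itself; `rows` is a hand-typed stand-in for `input.txt`, and the result is sorted by category purely to make the printed order predictable.

```python
from collections import Counter

# Hand-typed stand-in for input.txt as parsed by PigStorage(',').
rows = [("A", 10), ("B", 20), ("A", 30), ("B", 40), ("A", 50)]

# Mirrors: GROUP data BY category, then COUNT(data) per group.
counts = Counter(category for category, _value in rows)

# Sorted only so the printed order is deterministic.
counted = sorted(counts.items())
for row in counted:
    print(row)
```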
🔧 Debug
Difficulty: advanced
Identify the error in a JOIN operation
What error will this Pig Latin script produce?
users = LOAD 'users.txt' USING PigStorage(',') AS (user_id:int, name:chararray);
orders = LOAD 'orders.txt' USING PigStorage(',') AS (order_id:int, user:int, amount:float);
joined = JOIN users BY user_id, orders BY user_id;
DUMP joined;
💡 Hint
Check field names used in JOIN keys.
Explanation
The 'orders' relation declares a field named 'user', not 'user_id', so the JOIN fails with a field-not-found error. Joining on orders BY user would reference a field that exists.
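The mismatch is easy to spot by comparing each join key against the schema declared in the matching LOAD statement. The plain-Python sketch below does just that; `missing_join_keys` is a hypothetical helper written for this illustration and is not part of Pig or how Pig reports the error.

```python
# Field names copied from the AS (...) clauses of the two LOAD statements.
users_schema = ["user_id", "name"]
orders_schema = ["order_id", "user", "amount"]


def missing_join_keys(schema, keys):
    """Hypothetical helper: return the join keys not declared in the schema."""
    return [key for key in keys if key not in schema]


# JOIN users BY user_id, orders BY user_id;
print(missing_join_keys(users_schema, ["user_id"]))   # nothing missing
print(missing_join_keys(orders_schema, ["user_id"]))  # user_id is missing

# Keying the orders side on the declared field instead.
print(missing_join_keys(orders_schema, ["user"]))     # nothing missing
```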
❓ Visualization
Difficulty: advanced
Visualize the result of a FOREACH GENERATE with arithmetic
Given this Pig Latin script, what is the output after the FOREACH GENERATE?
data = LOAD 'input.txt' USING PigStorage(',') AS (item:chararray, price:float, quantity:int);
total = FOREACH data GENERATE item, price * quantity AS total_cost;
DUMP total;
Assume input.txt contains:
Pen,1.5,10
Notebook,2.0,5
Eraser,0.5,20
💡 Hint
Multiply price by quantity for each item.
Explanation
Pen: 1.5 * 10 = 15.0; Notebook: 2.0 * 5 = 10.0; Eraser: 0.5 * 20 = 10.0. The output is (Pen,15.0), (Notebook,10.0), and (Eraser,10.0).
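The per-row arithmetic can be reproduced with a plain-Python sketch. It only mirrors the projection `item, price * quantity`; it does not run Pig, and `rows` is a hand-typed stand-in for `input.txt`.

```python
# Hand-typed stand-in for input.txt as parsed by PigStorage(',').
rows = [("Pen", 1.5, 10), ("Notebook", 2.0, 5), ("Eraser", 0.5, 20)]

# Mirrors: total = FOREACH data GENERATE item, price * quantity AS total_cost;
total = [(item, price * quantity) for item, price, quantity in rows]

for row in total:
    print(row)
```

FOREACH emits one output row per input row, so the result has three rows even though two of them share the same total.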
🚀 Application
Difficulty: expert
Determine the number of records after a COGROUP and FILTER
Consider these Pig Latin commands:
students = LOAD 'students.txt' USING PigStorage(',') AS (student_id:int, name:chararray);
grades = LOAD 'grades.txt' USING PigStorage(',') AS (student_id:int, grade:int);
grouped = COGROUP students BY student_id, grades BY student_id;
passed = FILTER grouped BY COUNT(grades) > 0 AND AVG(grades.grade) >= 60;
DUMP passed;
Assuming:
students.txt:
1,Alice
2,Bob
3,Charlie
grades.txt:
1,70
1,80
2,50
3,90
3,55
How many records will passed contain?
💡 Hint
Check which students have grades and average grade >= 60.
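Explanation
COGROUP produces one record per student_id, each holding a bag of matching students and a bag of matching grades. Student 1: grades 70 and 80, average 75 (passes). Student 2: grade 50, average 50 (fails). Student 3: grades 90 and 55, average 72.5 (passes). So passed contains 2 records.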
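The count can be double-checked with a plain-Python emulation of the cogroup and the filter. This is a sketch of the logic, not Pig itself: the two lists are hand-typed stand-ins for the input files, and `passes` is a hypothetical helper that combines the two filter conditions.

```python
# Hand-typed stand-ins for students.txt and grades.txt.
students = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
grades = [(1, 70), (1, 80), (2, 50), (3, 90), (3, 55)]

# Mirrors COGROUP: one record per key, with one bag per input relation.
keys = sorted({sid for sid, _ in students} | {sid for sid, _ in grades})
grouped = [
    (
        key,
        [s for s in students if s[0] == key],  # bag of matching students
        [g for g in grades if g[0] == key],    # bag of matching grades
    )
    for key in keys
]


def passes(grade_bag):
    """Hypothetical helper: COUNT(grades) > 0 AND AVG(grades.grade) >= 60."""
    if not grade_bag:
        return False
    average = sum(grade for _sid, grade in grade_bag) / len(grade_bag)
    return average >= 60


# Mirrors: passed = FILTER grouped BY ...;
passed = [record for record in grouped if passes(record[2])]
print(len(passed))
print([record[0] for record in passed])
```

The filter runs on the grouped records, so the answer counts student IDs, not individual grade rows.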