Challenge - 5 Problems

Pig Data Transformation Master

🧠 Conceptual

intermediate

Why does Pig Latin simplify data transformation compared to raw MapReduce?

Choose the main reason Pig Latin makes data transformation easier than writing raw MapReduce code.

A. Pig Latin requires writing Java code for every transformation, which increases control but also complexity.

B. Pig Latin uses a high-level scripting language that abstracts complex MapReduce jobs into simple commands.

C. Pig Latin only works with small datasets, so transformations are faster.

D. Pig Latin replaces Hadoop with a different file system for data storage.
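
For comparison, the Pig statements `grp = GROUP data BY category;` and `result = FOREACH grp GENERATE group, SUM(data.value);` replace the map, shuffle, and reduce steps a developer would otherwise write by hand. Below is a toy, single-process Python model of those three steps. It assumes tab-separated `category<TAB>value` input, and the function names are hypothetical. It is not Hadoop's Java MapReduce API.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical toy model of a hand-written MapReduce job that sums values per key.
# Real Hadoop jobs implement Mapper/Reducer classes in Java; this only mimics the data flow.

def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: parse each 'category<TAB>value' line into a (key, value) pair."""
    for line in lines:
        category, value = line.rstrip("\n").split("\t")
        yield category, int(value)

def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: collect every value emitted for the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Reducer: sum the values for each key (sorted for deterministic output)."""
    return [(key, sum(values)) for key, values in sorted(groups.items())]

if __name__ == "__main__":
    sample = ["fruit\t5", "grain\t10", "fruit\t10"]  # hypothetical input lines
    print(reduce_phase(shuffle_phase(map_phase(sample))))
    # prints [('fruit', 15), ('grain', 10)]
```

Each of these functions is code you would have to write, package, and debug yourself in raw MapReduce. In Pig, the framework generates that plumbing from the two statements.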

❓ Predict Output

intermediate

Output of a Pig Latin script for filtering data

What is the output of this Pig Latin script snippet?

data = LOAD 'input.txt' AS (name:chararray, age:int);
young = FILTER data BY age < 30;
DUMP young;

A.
(Alice,25)
(Bob,28)

B.
(Alice,25)
(Bob,28)
(Charlie,35)

C. Syntax error due to a missing semicolon

D. Empty output because no rows match
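
The question does not show the contents of `input.txt`. The Python sketch below assumes, hypothetically, that the file holds the three rows listed in the second option plus one row with a missing age. It models what `FILTER data BY age < 30` keeps. This is a model of Pig's semantics, not Pig itself. FILTER keeps a tuple only when the condition is true, so a null age (represented here as `None`) is dropped. `dump_format` is a hypothetical helper that approximates how DUMP prints tuples: no quotes around chararray values and no spaces after commas.

```python
from typing import Optional

Row = tuple[str, Optional[int]]  # (name, age); None stands in for a Pig null

def pig_filter_young(rows: list[Row], limit: int = 30) -> list[Row]:
    """Model `FILTER data BY age < limit`: keep rows whose condition is true.

    A null age makes the comparison null, and FILTER drops such rows.
    """
    return [(name, age) for name, age in rows if age is not None and age < limit]

def dump_format(row: Row) -> str:
    """Approximate DUMP output, e.g. (Alice,25); a null field prints as empty."""
    return "(" + ",".join("" if v is None else str(v) for v in row) + ")"

if __name__ == "__main__":
    # Hypothetical contents of input.txt, already parsed into (name, age) rows.
    data = [("Alice", 25), ("Bob", 28), ("Charlie", 35), ("Dana", None)]
    for row in pig_filter_young(data):
        print(dump_format(row))
    # prints (Alice,25) then (Bob,28)
```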

❓ Data Output

advanced

Result of a Pig GROUP and FOREACH operation

Given a relation `data` loaded with the schema (category:chararray, value:int), what is the output of this Pig script?

grp = GROUP data BY category;
result = FOREACH grp GENERATE group, SUM(data.value);
DUMP result;

A.
(fruit,15)
(vegetable,20)
(grain,10)

B. Runtime error due to a missing alias

C. Empty output because the data is not grouped

D.
(fruit,30)
(vegetable,45)
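
In Pig, `GROUP data BY category` produces one tuple per key. That tuple has a field named `group` holding the key and a bag named after the input alias (`data`) holding the original tuples. `SUM(data.value)` projects `value` out of that bag and adds the values up. The Python sketch below models that two-step shape on hypothetical sample rows, since the question does not give the input. Pig does not guarantee the order of GROUP output, so the sketch sorts its result.

```python
from collections import defaultdict

Record = tuple[str, int]  # (category, value), mirroring the schema in the question

def pig_group(data: list[Record]) -> dict[str, list[Record]]:
    """Model `GROUP data BY category`: key -> bag of the original tuples."""
    bags: dict[str, list[Record]] = defaultdict(list)
    for record in data:
        bags[record[0]].append(record)
    return dict(bags)

def sum_per_group(data: list[Record]) -> list[tuple[str, int]]:
    """Model `FOREACH grp GENERATE group, SUM(data.value)`."""
    result = []
    for group, bag in pig_group(data).items():
        values = [value for _category, value in bag]  # the projection data.value
        result.append((group, sum(values)))
    return sorted(result)  # Pig's GROUP output order is not guaranteed

if __name__ == "__main__":
    sample = [("fruit", 7), ("vegetable", 20), ("fruit", 8), ("grain", 10)]  # hypothetical
    print(sum_per_group(sample))
    # prints [('fruit', 15), ('grain', 10), ('vegetable', 20)]
```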

🔧 Debug

advanced

Identify the error in this Pig Latin script

What error, if any, will this Pig Latin script produce?

data = LOAD 'input.txt' AS (name:chararray, age:int);
young = FILTER data BY age > 20;
DUMP young;

A. SyntaxError due to a missing semicolon after the FILTER statement

B. Runtime error because the 'age' field is missing

C. No error; the script runs successfully

D. TypeError because 'age' is treated as a string
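
The `AS` clause declares `age` as `int`. This is our understanding of Pig's documented behavior: the default loader, PigStorage, splits each line on tab characters, and Pig casts each field to its declared type. A value that cannot be converted becomes null, with a warning in the job log, rather than aborting the script. FILTER then drops rows whose condition is null. The Python sketch below models that pipeline. `to_int_or_null` and `load_and_filter` are hypothetical names, and the sample lines are invented.

```python
from typing import Optional

def to_int_or_null(field: str) -> Optional[int]:
    """Model a cast to int: unconvertible text becomes null (None), not an error."""
    try:
        return int(field.strip())
    except ValueError:
        return None

def load_and_filter(lines: list[str], min_age: int = 20) -> list[tuple[str, Optional[int]]]:
    """Model LOAD ... AS (name:chararray, age:int) then FILTER ... BY age > min_age."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")  # PigStorage splits on tab by default
        fields += [""] * (2 - len(fields))      # pad short lines; a missing age casts to null
        rows.append((fields[0], to_int_or_null(fields[1])))
    # A null age makes `age > min_age` null, and FILTER drops those rows.
    return [(name, age) for name, age in rows if age is not None and age > min_age]

if __name__ == "__main__":
    lines = ["Alice\t25", "Bob\tunknown", "Eve\t19", "Carl"]  # hypothetical input
    print(load_and_filter(lines))
    # prints [('Alice', 25)]
```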

🚀 Application

expert

Choosing Pig for complex data transformation tasks

You have a large dataset with nested data and need to perform multiple joins, filters, and aggregations. Why is Pig a better choice than writing raw MapReduce jobs?

A. Pig automatically optimizes hardware usage without any user input, unlike MapReduce.

B. Pig stores data in a proprietary format that speeds up processing compared to Hadoop's HDFS.

C. Pig provides a simpler scripting language that handles complex transformations with less code and easier debugging.

D. Pig requires no knowledge of data schemas, making it ideal for all data types.
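
Joins are where the gap widens. A reduce-side join in raw MapReduce means tagging each record with its source and writing reducer logic to pair up the matches. Pig expresses the same join as one statement, such as `joined = JOIN users BY id, orders BY user_id;`, where the relation and field names are hypothetical. The joined tuple concatenates both inputs' fields, which Pig disambiguates as `users::id`, `orders::user_id`, and so on. The Python sketch below models only the semantics of Pig's default inner JOIN, as an in-memory hash join. It is not how Pig executes a join on a cluster.

```python
from collections import defaultdict
from typing import Any

Row = tuple[Any, ...]

def inner_join(left: list[Row], left_key: int, right: list[Row], right_key: int) -> list[Row]:
    """Model an inner equi-join: concatenate every pair of rows with equal keys.

    Rows whose join key is null (None) never match, as in an inner join.
    """
    index: dict[Any, list[Row]] = defaultdict(list)
    for right_row in right:
        if right_row[right_key] is not None:
            index[right_row[right_key]].append(right_row)
    joined = []
    for left_row in left:
        key = left_row[left_key]
        if key is None:
            continue
        for right_row in index.get(key, []):
            joined.append(left_row + right_row)
    return joined

if __name__ == "__main__":
    users = [(1, "Alice"), (2, "Bob"), (None, "Ghost")]   # hypothetical (id, name)
    orders = [(1, 30), (1, 12), (3, 5), (None, 99)]        # hypothetical (user_id, amount)
    print(inner_join(users, 0, orders, 0))
    # prints [(1, 'Alice', 1, 30), (1, 'Alice', 1, 12)]
```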