Challenge - 5 Problems
Pig vs Hive Mastery
🧠 Conceptual
Difficulty: intermediate
Primary Language Used in Pig and Hive
Which language is primarily used to write scripts in Apache Pig and Apache Hive respectively?
💡 Hint
Think about the scripting languages designed specifically for each tool.
✅ Explanation
Apache Pig uses Pig Latin, a procedural language, while Apache Hive uses HiveQL, a SQL-like declarative language.
🧠 Conceptual
Difficulty: intermediate
Data Processing Model Difference
What is the main difference in data processing models between Pig and Hive?
💡 Hint
Consider how users specify data transformations in each tool.
✅ Explanation
Pig scripts describe step-by-step data transformations (procedural), while Hive queries specify what data to retrieve (declarative).
❓ Predict Output
Difficulty: advanced
Output of Pig Latin Script
What is the output of this Pig Latin script given the input data below?
Input data (file):
1,apple,10
2,banana,20
3,apple,15
Script:
fruit_data = LOAD 'input' USING PigStorage(',') AS (id:int, name:chararray, quantity:int);
apple_data = FILTER fruit_data BY name == 'apple';
total = FOREACH (GROUP apple_data ALL) GENERATE SUM(apple_data.quantity) as total_quantity;
DUMP total;
💡 Hint
Add the quantities for all rows where name is 'apple'.
✅ Explanation
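The script keeps only the rows whose name is 'apple' (quantities 10 and 15), places them in a single group with GROUP ... ALL, and sums the quantities: 10 + 15 = 25. DUMP prints the result as a one-field tuple: (25).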
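To check the arithmetic without a Pig installation, the same filter-then-sum flow can be mirrored in plain Python. This is only a sketch of the logic, not Pig itself: the names `RAW_INPUT`, `load`, and `total_apple_quantity` are made up for illustration, and the final `print` merely imitates how a one-field tuple is displayed.

```python
import csv
import io

# The three input rows from the question, in the same comma-separated layout.
RAW_INPUT = """1,apple,10
2,banana,20
3,apple,15
"""


def load(text):
    """Rough stand-in for LOAD ... USING PigStorage(',') with the typed schema."""
    return [(int(i), name, int(q)) for i, name, q in csv.reader(io.StringIO(text))]


def total_apple_quantity(text):
    fruit_data = load(text)
    # FILTER fruit_data BY name == 'apple'
    apple_data = [row for row in fruit_data if row[1] == "apple"]
    # GROUP apple_data ALL collects every remaining row into one group;
    # SUM(apple_data.quantity) then adds up the quantity field of that group.
    return sum(quantity for _, _, quantity in apple_data)


total = total_apple_quantity(RAW_INPUT)
print(f"({total})")  # prints (25)
```

Each intermediate list corresponds to one named relation in the script, which is the step-by-step style the earlier question describes.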
❓ Predict Output
Difficulty: advanced
Hive Query Output for Grouping
Given a Hive table 'fruits' with columns (id INT, name STRING, quantity INT) and data:
1,apple,10
2,banana,20
3,apple,15
What is the output of this HiveQL query?
SELECT name, SUM(quantity) as total_quantity FROM fruits GROUP BY name ORDER BY total_quantity DESC;
💡 Hint
Sum quantities per fruit and order descending by total quantity.
✅ Explanation
GROUP BY name produces one row per fruit. Apple total: 10 + 15 = 25; banana total: 20.
ORDER BY total_quantity DESC sorts by the summed value, so the output is (apple, 25) followed by (banana, 20).
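The result can be reproduced locally with Python's built-in `sqlite3` module, since this particular query uses only standard SQL. Treat it as a sanity check under that assumption: SQLite is not Hive, the column types are SQLite equivalents of the Hive ones, and `run_fruit_totals` is a hypothetical helper name.

```python
import sqlite3

# The query from the question, unchanged apart from line breaks.
QUERY = """
SELECT name, SUM(quantity) AS total_quantity
FROM fruits
GROUP BY name
ORDER BY total_quantity DESC
"""


def run_fruit_totals():
    """Load the three sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE fruits (id INTEGER, name TEXT, quantity INTEGER)")
        conn.executemany(
            "INSERT INTO fruits VALUES (?, ?, ?)",
            [(1, "apple", 10), (2, "banana", 20), (3, "apple", 15)],
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


rows = run_fruit_totals()
for name, total_quantity in rows:
    print(name, total_quantity)
# prints:
# apple 25
# banana 20
```

Unlike the Pig version, there are no named intermediate steps here: the query states the desired result and the engine decides how to compute it.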
🚀 Application
Difficulty: expert
Choosing Between Pig and Hive for a Task
You have a large dataset with complex data transformations involving multiple steps and custom functions. You want to write scripts that allow step-by-step data manipulation and debugging. Which tool is more suitable?
💡 Hint
Think about which tool is designed for complex data flows and custom code.
✅ Explanation
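Pig is the better fit. Its procedural data flow names each intermediate relation, so you can inspect any step (for example with DUMP or DESCRIBE) while building a multi-step transformation, and it supports user-defined functions.
Hive also supports user-defined functions, but its declarative, SQL-like queries are better suited to ad hoc querying and reporting than to long, step-by-step transformation pipelines.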