
Pig vs Hive comparison in Hadoop - Practice Questions

Challenge - 5 Problems
🧠 Conceptual
intermediate
Primary Language Used in Pig and Hive
Which languages are primarily used to write scripts in Apache Pig and Apache Hive, respectively?
A. Pig Latin for Pig and HiveQL for Hive
B. SQL for Pig and Java for Hive
C. Python for Pig and Pig Latin for Hive
D. HiveQL for Pig and Pig Latin for Hive
💡 Hint
Think about the scripting languages designed specifically for each tool.
🧠 Conceptual
intermediate
Data Processing Model Difference
What is the main difference in data processing models between Pig and Hive?
A. Both use declarative models, but Hive supports more data types
B. Pig uses a procedural data flow model; Hive uses a declarative SQL-like model
C. Both use procedural models, but with different syntax
D. Pig uses a declarative model; Hive uses a procedural model
💡 Hint
Consider how users specify data transformations in each tool.
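After answering, this small Python sketch may help make the distinction concrete. It is an illustration, not Pig or Hive code: it counts rows per fruit name for the sample rows used later in this quiz, once in a procedural style (each step produces a named intermediate result, much like a chain of Pig Latin aliases) and once declaratively (a single SQL statement is handed to an engine, which decides how to execute it). Python's built-in `sqlite3` module is assumed as a stand-in for Hive, and the function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

ROWS = [(1, "apple", 10), (2, "banana", 20), (3, "apple", 15)]


def counts_procedural(rows):
    """Pig-style data flow: each step yields a named intermediate result."""
    grouped = defaultdict(list)  # analogous to: grouped = GROUP rows BY name;
    for row in rows:
        grouped[row[1]].append(row)
    # analogous to: counts = FOREACH grouped GENERATE group, COUNT(rows);
    counts = {name: len(bag) for name, bag in grouped.items()}
    return counts


def counts_declarative(rows):
    """Hive-style: describe the desired result in SQL; the engine plans the steps.
    sqlite3 is only a local stand-in for Hive here."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fruits (id INTEGER, name TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO fruits VALUES (?, ?, ?)", rows)
    result = dict(conn.execute("SELECT name, COUNT(*) FROM fruits GROUP BY name"))
    conn.close()
    return result


print(sorted(counts_procedural(ROWS).items()))   # [('apple', 2), ('banana', 1)]
print(sorted(counts_declarative(ROWS).items()))  # [('apple', 2), ('banana', 1)]
```

Both styles produce the same answer; the difference is whether you spell out how to get there (procedural) or only what you want (declarative).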
Predict Output
advanced
Output of Pig Latin Script
What is the output of this Pig Latin script given the input data below?

Input data (contents of the file 'input'):
1,apple,10
2,banana,20
3,apple,15

Script:
fruit_data = LOAD 'input' USING PigStorage(',') AS (id:int, name:chararray, quantity:int);
apple_data = FILTER fruit_data BY name == 'apple';
total = FOREACH (GROUP apple_data ALL) GENERATE SUM(apple_data.quantity) as total_quantity;
DUMP total;
A. (45)
B. (30)
C. (15)
D. (25)
💡 Hint
Add the quantities for all rows where name is 'apple'.
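Once you have picked an answer, you can trace the data flow with this Python sketch, which mirrors each Pig Latin statement with a list operation. It only emulates the semantics for this tiny input (the file contents are inlined as a string rather than read from a real 'input' file) and says nothing about how Pig actually executes the script on Hadoop.

```python
import csv
import io

# Contents of the 'input' file from the question, inlined for this sketch.
INPUT = "1,apple,10\n2,banana,20\n3,apple,15\n"

# fruit_data = LOAD 'input' USING PigStorage(',') AS (id:int, name:chararray, quantity:int);
fruit_data = [
    (int(i), name, int(qty))
    for i, name, qty in csv.reader(io.StringIO(INPUT))
]

# apple_data = FILTER fruit_data BY name == 'apple';
apple_data = [t for t in fruit_data if t[1] == "apple"]

# GROUP apple_data ALL puts every tuple into a single bag whose group key is 'all'.
grouped = [("all", apple_data)]

# total = FOREACH (...) GENERATE SUM(apple_data.quantity) AS total_quantity;
total = [(sum(t[2] for t in bag),) for _key, bag in grouped]

# DUMP total; Pig prints each tuple of the relation in parentheses.
for t in total:
    print("(" + ",".join(str(v) for v in t) + ")")  # prints (25)
```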
Predict Output
advanced
Hive Query Output for Grouping
Given a Hive table 'fruits' with columns (id INT, name STRING, quantity INT) and data:
1,apple,10
2,banana,20
3,apple,15

What is the output of this HiveQL query?
SELECT name, SUM(quantity) as total_quantity FROM fruits GROUP BY name ORDER BY total_quantity DESC;
A. [('banana', 30), ('apple', 25)]
B. [('banana', 20), ('apple', 25)]
C. [('apple', 25), ('banana', 20)]
D. [('apple', 15), ('banana', 20)]
💡 Hint
Sum quantities per fruit and order descending by total quantity.
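To check your answer locally without a Hive installation, the sketch below runs the same query text against an in-memory SQLite database via Python's `sqlite3` module. SQLite is an assumed stand-in: this particular statement happens to be valid in both dialects and GROUP BY, SUM and ORDER BY ... DESC behave the same way for it, but Hive itself compiles the query into distributed jobs over data in HDFS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (id INTEGER, name TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO fruits VALUES (?, ?, ?)",
    [(1, "apple", 10), (2, "banana", 20), (3, "apple", 15)],
)

# The query from the question (also valid SQLite).
query = (
    "SELECT name, SUM(quantity) as total_quantity FROM fruits "
    "GROUP BY name ORDER BY total_quantity DESC"
)
rows = conn.execute(query).fetchall()
print(rows)  # [('apple', 25), ('banana', 20)]
conn.close()
```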
🚀 Application
expert
Choosing Between Pig and Hive for a Task
You have a large dataset with complex data transformations involving multiple steps and custom functions. You want to write scripts that allow step-by-step data manipulation and debugging. Which tool is more suitable?
A. Pig, because it supports procedural scripts with stepwise transformations and custom functions
B. Hive, because it supports SQL-like queries and is easier for analysts
C. Hive, because it is faster for all types of data processing
D. Pig, because it only supports simple queries and no custom functions
💡 Hint
Think about which tool is designed for complex data flows and custom code.