0
0
Hadoopdata~20 mins

GROUP and JOIN operations in Hadoop - Practice Problems & Coding Challenges

Choose your learning style9 modes available
Challenge - 5 Problems
🎖️
Hive GROUP and JOIN Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Predict Output
intermediate
2:00remaining
Output of a GROUP BY operation in Hive
Given the following Hive query on a sales table, what is the output?
Hadoop
SELECT product_id, COUNT(*) AS total_sales FROM sales GROUP BY product_id;
A[{'product_id': 'P1', 'total_sales': 3}, {'product_id': 'P2', 'total_sales': 4}]
B[{'product_id': 'P1', 'total_sales': 5}, {'product_id': 'P2', 'total_sales': 2}]
C[{'product_id': 'P1', 'total_sales': 3}, {'product_id': 'P2', 'total_sales': 2}]
D[{'product_id': 'P1', 'total_sales': 2}, {'product_id': 'P2', 'total_sales': 2}]
Attempts:
2 left
💡 Hint
Count how many times each product_id appears in the sales table.
data_output
intermediate
2:00remaining
Result of INNER JOIN between two Hive tables
What is the result of this Hive query joining customers and orders on customer_id?
Hadoop
SELECT c.customer_id, c.name, o.order_id FROM customers c INNER JOIN orders o ON c.customer_id = o.customer_id;
A[{'customer_id': 1, 'name': 'Alice', 'order_id': 101}, {'customer_id': 2, 'name': 'Bob', 'order_id': 102}, {'customer_id': 3, 'name': 'Charlie', 'order_id': 103}]
B[{'customer_id': 1, 'name': 'Alice', 'order_id': 101}, {'customer_id': 3, 'name': 'Charlie', 'order_id': 103}]
C[{'customer_id': 2, 'name': 'Bob', 'order_id': 102}, {'customer_id': 3, 'name': 'Charlie', 'order_id': 103}]
D[{'customer_id': 1, 'name': 'Alice', 'order_id': 101}, {'customer_id': 2, 'name': 'Bob', 'order_id': 102}]
Attempts:
2 left
💡 Hint
INNER JOIN returns rows with matching customer_id in both tables.
🔧 Debug
advanced
2:00remaining
Identify the error in this Hive JOIN query
What error does this Hive query produce?
Hadoop
SELECT c.customer_id, o.order_id FROM customers c JOIN orders o WHERE c.customer_id = o.customer_id;
ANo error, query runs successfully
BSyntaxError: Missing ON clause in JOIN
CRuntimeError: Table not found
DSemanticError: Ambiguous column reference
Attempts:
2 left
💡 Hint
JOIN requires ON clause to specify join condition.
🚀 Application
advanced
2:00remaining
Choosing the right JOIN type for data analysis
You want to find all customers and their orders, including customers with no orders. Which JOIN type should you use in Hive?
AFULL OUTER JOIN
BINNER JOIN
CRIGHT OUTER JOIN
DLEFT OUTER JOIN
Attempts:
2 left
💡 Hint
Think about including all customers even if they have no matching orders.
🧠 Conceptual
expert
2:00remaining
Effect of GROUP BY with multiple columns in Hive
What is the effect of this Hive query grouping by two columns?
Hadoop
SELECT product_id, store_id, SUM(quantity) FROM sales GROUP BY product_id, store_id;
AAggregates quantity for each unique product and store combination
BAggregates quantity for each store across all products
CAggregates quantity for each product across all stores
DAggregates quantity for all products and stores combined
Attempts:
2 left
💡 Hint
GROUP BY multiple columns groups by unique pairs of those columns.