Challenge - 5 Problems
Hive GROUP and JOIN Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate2:00remaining
Output of a GROUP BY operation in Hive
Given the following Hive query on a sales table, what is the output?
Hadoop
SELECT product_id, COUNT(*) AS total_sales FROM sales GROUP BY product_id;
Attempts:
2 left
💡 Hint
Count how many times each product_id appears in the sales table.
✗ Incorrect
The GROUP BY groups rows by product_id, and COUNT(*) counts rows per group. The correct counts match option C.
❓ data_output
intermediate2:00remaining
Result of INNER JOIN between two Hive tables
What is the result of this Hive query joining customers and orders on customer_id?
Hadoop
SELECT c.customer_id, c.name, o.order_id FROM customers c INNER JOIN orders o ON c.customer_id = o.customer_id;
Attempts:
2 left
💡 Hint
INNER JOIN returns rows with matching customer_id in both tables.
✗ Incorrect
Only customers with matching orders appear. Option A correctly shows customers 1, 2, and 3 with their orders.
🔧 Debug
advanced2:00remaining
Identify the error in this Hive JOIN query
What error does this Hive query produce?
Hadoop
SELECT c.customer_id, o.order_id FROM customers c JOIN orders o WHERE c.customer_id = o.customer_id;
Attempts:
2 left
💡 Hint
JOIN requires ON clause to specify join condition.
✗ Incorrect
Hive JOIN syntax requires ON clause. Using WHERE instead causes syntax error.
🚀 Application
advanced2:00remaining
Choosing the right JOIN type for data analysis
You want to find all customers and their orders, including customers with no orders. Which JOIN type should you use in Hive?
Attempts:
2 left
💡 Hint
Think about including all customers even if they have no matching orders.
✗ Incorrect
LEFT OUTER JOIN returns all rows from left table (customers) and matching rows from right (orders).
🧠 Conceptual
expert2:00remaining
Effect of GROUP BY with multiple columns in Hive
What is the effect of this Hive query grouping by two columns?
Hadoop
SELECT product_id, store_id, SUM(quantity) FROM sales GROUP BY product_id, store_id;
Attempts:
2 left
💡 Hint
GROUP BY multiple columns groups by unique pairs of those columns.
✗ Incorrect
Grouping by product_id and store_id creates groups for each unique pair, summing quantity per pair.