Hive query optimization in Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of a Hive query with partition pruning
Consider a Hive table sales that is partitioned by year. What does the following query return?
SELECT COUNT(*) FROM sales WHERE year = 2023;
A. Returns count of all rows where year is 2023
B. Returns count of all rows ignoring the year filter
C. Returns count of all rows where year is 2022
D. Raises a syntax error due to missing GROUP BY
💡 Hint
Partition pruning uses the filter on the partition column to skip non-matching partitions before any data is read, so only the year = 2023 partition is scanned and counted.
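One way to confirm the pruning is to ask Hive which partitions the query would read. The sketch below is illustrative only: the column layout of sales is hypothetical (the question gives only the table name and the year partition column), and it relies on EXPLAIN DEPENDENCY, which reports the input tables and partitions of a query.

```sql
-- Hypothetical schema: only the table name `sales` and the partition
-- column `year` come from the question; the other columns are assumptions.
CREATE TABLE IF NOT EXISTS sales (
  order_id BIGINT,
  amount   DECIMAL(10,2)
)
PARTITIONED BY (year INT)
STORED AS ORC;

-- List the partitions that currently exist for the table.
SHOW PARTITIONS sales;

-- Report the inputs of the query. With pruning in effect, the partition
-- list should contain only year=2023.
EXPLAIN DEPENDENCY
SELECT COUNT(*) FROM sales WHERE year = 2023;
```

If the partition list shows every year, the filter is probably not being applied to the partition column directly, for example because it is wrapped in a function or cast.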
🧠 Conceptual
intermediate
Effect of using OR vs UNION ALL in Hive queries
Which option best explains the performance difference between these two Hive queries?
Query 1: SELECT * FROM my_table WHERE col = 'A' OR col = 'B';
Query 2: SELECT * FROM my_table WHERE col = 'A' UNION ALL SELECT * FROM my_table WHERE col = 'B';
A. Query 1 is faster because OR uses index efficiently
B. Both queries have identical performance
C. Query 2 is faster because UNION ALL executes two simple filters separately
D. Query 1 causes a syntax error in Hive
💡 Hint
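Splitting the predicate lets Hive plan each branch as a simple single-value filter, but UNION ALL typically adds a second scan of the table, so compare the EXPLAIN plans before assuming either form is faster.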
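Which form wins depends on the table layout and the Hive version, so it is worth comparing the plans directly. The sketch below assumes a hypothetical table named my_table with a column col, standing in for the table in the question (TABLE itself is a reserved word in HiveQL and cannot be used unquoted as a table name).

```sql
-- Plan for the single-pass form: one scan with a combined filter.
EXPLAIN
SELECT * FROM my_table WHERE col = 'A' OR col = 'B';

-- Plan for the split form: one branch per value, combined by a union.
EXPLAIN
SELECT * FROM my_table WHERE col = 'A'
UNION ALL
SELECT * FROM my_table WHERE col = 'B';
```

Counting the TableScan operators in each plan is a quick first check: the OR form normally shows one, while the UNION ALL form may show one per branch unless the optimizer merges them.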
🔧 Debug
advanced
Identify the cause of slow Hive query with joins
This Hive query runs very slowly:
SELECT a.id, b.value FROM table_a a JOIN table_b b ON a.key = b.key WHERE a.date = '2023-01-01';

What is the most likely cause of the slow performance?
A. Missing partition pruning on table_b causing full scan
B. Using JOIN instead of UNION ALL causes slowness
C. The WHERE clause filters table_b instead of table_a
D. The query syntax is invalid due to missing GROUP BY
💡 Hint
Check whether both tables are partitioned and whether the filter lets Hive prune partitions on each side of the join. The WHERE clause here restricts only table_a.
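A possible rewrite is sketched below. It rests on an assumption the question does not state: that table_b is also partitioned by a date column with the same values as table_a. The backticks are there because DATE is a reserved word in recent Hive versions.

```sql
-- Assumption: table_b has a `date` partition column matching table_a's.
-- Filtering both sides lets Hive prune partitions on each input
-- instead of scanning all of table_b.
EXPLAIN DEPENDENCY
SELECT a.id, b.value
FROM table_a a
JOIN table_b b
  ON a.key = b.key
WHERE a.`date` = '2023-01-01'
  AND b.`date` = '2023-01-01';
```

The dependency output should then list a single partition for each table. If table_b is not partitioned at all, this filter only reduces rows after the scan and a different fix is needed.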
Data Output
advanced
Result of using map-side join in Hive
Given table_small is small and table_large is very large, what is the output of this query?
SELECT /*+ MAPJOIN(table_small) */ l.id, s.value FROM table_large l JOIN table_small s ON l.key = s.key LIMIT 5;
A. Returns 5 rows but performs a reduce-side join
B. Returns 5 rows joining large and small tables using map-side join
C. Raises an error because MAPJOIN hint is invalid
D. Returns no rows because map-side join skips large table
💡 Hint
MAPJOIN loads the small table into memory on each mapper, so the join completes in the map phase without a shuffle or reduce stage.
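In many Hive releases the MAPJOIN hint is ignored by default (hive.ignore.mapjoin.hint) and the optimizer decides on its own based on table size. A minimal sketch of that automatic route follows, assuming the table and column names from the question; the threshold shown is the documented default and may need tuning for a real cluster.

```sql
-- Let Hive convert a join to a map-side join when one input is small enough.
SET hive.auto.convert.join=true;
-- Size threshold in bytes below which a table counts as small.
SET hive.mapjoin.smalltable.filesize=25000000;

-- Inspect the plan without running the query.
EXPLAIN
SELECT l.id, s.value
FROM table_large l
JOIN table_small s
  ON l.key = s.key
LIMIT 5;
```

A Map Join Operator in the plan indicates that the conversion took place. A plain Join Operator in a reduce stage suggests that table_small exceeded the threshold or that its size statistics were unavailable.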
🚀 Application
expert
Optimizing a Hive query with skewed data
You have a Hive table with skewed keys causing slow joins. Which option is the best approach to optimize the join performance?
A. Increase the number of reducers to a very high number
B. Remove the skewed keys from the dataset before join
C. Use UNION ALL instead of JOIN to avoid skew
D. Enable hive.optimize.skewjoin to handle skewed keys automatically
💡 Hint
Hive has a built-in setting that detects heavily skewed join keys at run time and processes them in a separate follow-up job.
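A minimal configuration sketch for this setting is shown below. The tables clicks and users and their columns are hypothetical and serve only to illustrate a join on a skewed key; the threshold is the documented default rather than a recommendation.

```sql
-- Enable run-time skew handling for joins.
SET hive.optimize.skewjoin=true;
-- A join key is treated as skewed once it exceeds this many rows.
SET hive.skewjoin.key=100000;

-- Hypothetical join in which a few user_id values dominate `clicks`.
SELECT c.user_id, u.country, COUNT(*) AS click_count
FROM clicks c
JOIN users u
  ON c.user_id = u.user_id
GROUP BY c.user_id, u.country;
```

With the setting enabled, rows for keys above the threshold are set aside during the main join and joined afterwards in an extra job, which keeps a single reducer from receiving most of the data.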