Challenge - 5 Problems

Hive Query Optimization Master

❓ Predict Output

intermediate

Output of a Hive query with partition pruning

Consider a Hive table named sales that is partitioned by year. What does this query return?

SELECT COUNT(*) FROM sales WHERE year = 2023;

A. Returns the count of all rows where year is 2023

B. Returns the count of all rows, ignoring the year filter

C. Returns the count of all rows where year is 2022

D. Raises a syntax error due to a missing GROUP BY
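
A sketch of how pruning shows up in practice, assuming a hypothetical definition of sales. The columns id and amount and the ORC storage format are illustrative; only the year partition column comes from the question. Hive stores each partition as its own directory (for example year=2023), and EXPLAIN DEPENDENCY lists the partitions a query will read. With a filter on the partition column, it should list only the year=2023 partition.

```sql
-- Hypothetical DDL: only the partition column "year" comes from the question.
CREATE TABLE sales (
  id     BIGINT,
  amount DOUBLE
)
PARTITIONED BY (year INT)
STORED AS ORC;

-- List the existing partitions (one directory per distinct year value).
SHOW PARTITIONS sales;

-- Show the inputs the query depends on; with pruning, only year=2023 should appear.
EXPLAIN DEPENDENCY
SELECT COUNT(*) FROM sales WHERE year = 2023;
```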

🧠 Conceptual

intermediate

Effect of using OR vs UNION ALL in Hive queries

Which option best explains the performance difference between these two Hive queries?

Query 1: SELECT * FROM my_table WHERE col = 'A' OR col = 'B';
Query 2: SELECT * FROM my_table WHERE col = 'A' UNION ALL SELECT * FROM my_table WHERE col = 'B';

A. Query 1 is faster because OR uses an index efficiently

B. Both queries have identical performance

C. Query 2 is faster because UNION ALL executes two simple filters separately

D. Query 1 causes a syntax error in Hive
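
Comparing the query plans makes the difference concrete. In the sketch below, my_table and col are placeholder names. EXPLAIN for the OR form should show one table scan followed by a single filter. The UNION ALL form typically shows a separate table scan for each branch, so the same data is read twice, although some newer Hive versions can merge identical scans. Writing col IN ('A', 'B') is an equivalent single-scan form of the OR predicate.

```sql
-- Placeholder names: my_table, col.
-- OR form: expect a single TableScan followed by one Filter operator.
EXPLAIN
SELECT * FROM my_table WHERE col = 'A' OR col = 'B';

-- UNION ALL form: expect one TableScan of my_table per branch.
EXPLAIN
SELECT * FROM my_table WHERE col = 'A'
UNION ALL
SELECT * FROM my_table WHERE col = 'B';

-- Equivalent single-scan rewrite of the OR predicate.
SELECT * FROM my_table WHERE col IN ('A', 'B');
```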

🔧 Debug

advanced

Identify the cause of a slow Hive query with a join

This Hive query runs very slowly:

SELECT a.id, b.value FROM table_a a JOIN table_b b ON a.key = b.key WHERE a.date = '2023-01-01';

What is the most likely cause of the slow performance?

A. Missing partition pruning on table_b, causing a full scan

B. Using JOIN instead of UNION ALL causes the slowness

C. The WHERE clause filters table_b instead of table_a

D. The query syntax is invalid due to a missing GROUP BY
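
A sketch of the usual remedy, assuming hypothetically that table_b is also partitioned by a date column. Adding a filter on table_b's partition column lets Hive prune its partitions instead of scanning the whole table. This rewrite only returns the same rows if matching keys always fall in the same date partition on both sides, which is an assumption. The backticks quote date because DATE is a reserved keyword in recent Hive releases.

```sql
-- Assumption: both table_a and table_b are partitioned by a column named `date`.
-- Backticks quote `date`, a reserved keyword in recent Hive versions.
SELECT a.id, b.value
FROM table_a a
JOIN table_b b
  ON a.key = b.key
WHERE a.`date` = '2023-01-01'
  AND b.`date` = '2023-01-01';  -- also prunes table_b's partitions
```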

❓ Data Output

advanced

Result of using a map-side join in Hive

Given that table_small is small and table_large is very large, what is the output of this query?

SELECT /*+ MAPJOIN(table_small) */ l.id, s.value FROM table_large l JOIN table_small s ON l.key = s.key LIMIT 5;

A. Returns 5 rows but performs a reduce-side join

B. Returns 5 rows, joining the large and small tables with a map-side join

C. Raises an error because the MAPJOIN hint is invalid

D. Returns no rows because the map-side join skips the large table
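
Recent Hive versions ignore the MAPJOIN hint by default because hive.ignore.mapjoin.hint defaults to true. Instead, Hive converts a join into a map-side join automatically when the smaller input falls under a size threshold. The sketch below shows the relevant settings, with the threshold at its usual default of about 25 MB. The exact threshold property can differ by execution engine and version; on Tez, hive.auto.convert.join.noconditionaltask.size plays this role. With these settings, EXPLAIN should show a Map Join Operator rather than a reduce-side Join Operator.

```sql
-- Let Hive convert joins to map-side joins automatically.
SET hive.auto.convert.join=true;
-- Small-table size threshold in bytes (25000000 is the usual default).
SET hive.mapjoin.smalltable.filesize=25000000;
-- Set to false only if the /*+ MAPJOIN(...) */ hint should be honored explicitly.
SET hive.ignore.mapjoin.hint=false;

-- The plan should show a Map Join Operator with table_small held in memory.
EXPLAIN
SELECT /*+ MAPJOIN(table_small) */ l.id, s.value
FROM table_large l
JOIN table_small s ON l.key = s.key
LIMIT 5;
```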

🚀 Application

expert

Optimizing a Hive query with skewed data

You have a Hive table whose skewed keys cause slow joins. Which option is the best approach to optimizing join performance?

A. Increase the number of reducers to a very high number

B. Remove the skewed keys from the dataset before the join

C. Use UNION ALL instead of JOIN to avoid skew

D. Enable hive.optimize.skewjoin to handle skewed keys automatically
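
A sketch of the skew-join settings. At runtime, hive.optimize.skewjoin treats any join key with more rows than hive.skewjoin.key as skewed. Hive sets those rows aside and joins them in a follow-up map-join stage, so a single reducer no longer handles the hot key alone. The runtime variant may not take effect on every execution engine, so check your Hive version. The compile-time variant relies on SKEWED BY table metadata instead. The table events_skewed, its columns, and the key values 'u1' and 'u2' are hypothetical.

```sql
-- Runtime skew handling: keys exceeding hive.skewjoin.key rows are
-- written aside and joined in a follow-up map-join stage.
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;  -- default threshold, in rows per key

-- Optional compile-time variant, driven by table metadata.
-- Hypothetical table, column, and skewed key values.
SET hive.optimize.skewjoin.compiletime=true;
CREATE TABLE events_skewed (user_id STRING, payload STRING)
SKEWED BY (user_id) ON ('u1', 'u2');
```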
