Recall & Review
beginner
What is the main goal of Hive query optimization?
The main goal is to make Hive queries run faster and use fewer resources by improving how data is processed and accessed.
beginner
Explain what partitioning means in Hive.
Partitioning divides a large table into smaller parts based on the values of a column, for example splitting sales data by year. When a query filters on that column, Hive reads only the matching partitions, which speeds up the query.
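The idea can be sketched in plain Python. This is a toy model, not Hive's implementation: the dictionary `sales_by_year` and the function `total_sales` are hypothetical names, and each dictionary key stands in for one partition.

```python
# Toy model of partition pruning: each key plays the role of one partition,
# holding only the rows for that value of the (hypothetical) `year` column.
sales_by_year = {
    2021: [("widget", 10.0), ("gadget", 25.0)],
    2022: [("widget", 12.0)],
    2023: [("gadget", 30.0), ("widget", 11.0)],
}


def total_sales(partitions, year):
    """Sum sales for one year, touching only that year's partition."""
    rows = partitions.get(year, [])  # other partitions are never read
    rows_scanned = len(rows)
    return sum(amount for _, amount in rows), rows_scanned


total, scanned = total_sales(sales_by_year, 2022)
print(total, scanned)
```

Only one of the five stored rows is scanned here, because the filter on `year` selects a single partition.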
intermediate
What is bucketing in Hive and how does it help optimization?
Bucketing splits data into a fixed number of files (buckets) based on a hash of a column, so rows with the same column value always land in the same bucket. This predictable layout makes joins and sampling faster.
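A rough sketch of why this helps joins, in plain Python. The hash used here (the integer key modulo the bucket count) is a stand-in chosen for illustration, and `bucket_for`, `bucketize`, and the sample tables are hypothetical.

```python
NUM_BUCKETS = 4


def bucket_for(user_id, num_buckets=NUM_BUCKETS):
    # Stand-in hash: the integer key modulo the number of buckets.
    return user_id % num_buckets


def bucketize(rows):
    """Place each row in a bucket chosen by its first field (the join key)."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for row in rows:
        buckets[bucket_for(row[0])].append(row)
    return buckets


orders = [(101, "book"), (102, "pen"), (105, "lamp")]
users = [(101, "Ana"), (102, "Ben"), (105, "Cy")]

order_buckets = bucketize(orders)
user_buckets = bucketize(users)

# Join bucket by bucket: matching keys are guaranteed to share a bucket
# index, so bucket i of one table is only compared with bucket i of the other.
joined = []
for order_bucket, user_bucket in zip(order_buckets, user_buckets):
    names = dict(user_bucket)
    for user_id, item in order_bucket:
        if user_id in names:
            joined.append((user_id, names[user_id], item))

print(joined)
```

Because both tables use the same key and the same bucket count, no bucket ever needs to be compared against a bucket with a different index.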
intermediate
How does predicate pushdown improve Hive query performance?
Predicate pushdown means filtering data early, close to where it is stored, so less data moves around. This reduces the amount of data Hive processes and speeds up queries.
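The effect can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: `scan` stands in for the storage reader, `ROWS` for stored data, and `is_de` for a query filter. None of these names come from Hive.

```python
# Twenty hypothetical stored rows; every fifth one has country "DE".
ROWS = [{"id": i, "country": "DE" if i % 5 == 0 else "US"} for i in range(20)]


def scan(rows, predicate=None):
    """Yield rows from 'storage'. If a predicate is given, apply it here,
    so rows that fail the filter never leave the scan."""
    for row in rows:
        if predicate is None or predicate(row):
            yield row


def is_de(row):
    return row["country"] == "DE"


# Without pushdown: every row leaves the scan and is filtered afterwards.
moved_late = list(scan(ROWS))
result_late = [row for row in moved_late if is_de(row)]

# With pushdown: the filter runs inside the scan.
moved_early = list(scan(ROWS, is_de))

print(len(moved_late), len(moved_early))
```

Both approaches return the same rows, but with the filter applied inside the scan only 4 rows move downstream instead of 20.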
beginner
Why is using ORC or Parquet file formats recommended for Hive optimization?
ORC and Parquet store data column by column, so Hive reads only the columns a query needs. They also support compression and indexing, which makes queries faster and saves storage.
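A minimal Python sketch of the difference between row and columnar layouts. It models only the layout idea, not the actual ORC or Parquet file structure, and the sample table and variable names are hypothetical.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Ana", "amount": 10.0},
    {"id": 2, "name": "Ben", "amount": 25.0},
    {"id": 3, "name": "Cy", "amount": 5.0},
]

# Columnar layout: one contiguous list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Summing `amount` from the row layout loads every field of every row.
row_fields_read = sum(len(row) for row in rows)
row_total = sum(row["amount"] for row in rows)

# Summing from the columnar layout touches only the `amount` column.
col_fields_read = len(columns["amount"])
col_total = sum(columns["amount"])

print(row_total, row_fields_read, col_total, col_fields_read)
```

The totals match, but the columnar read touches 3 values where the row-oriented read touches 9.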
What does partitioning in Hive help with?
Partitioning splits data so Hive reads only the needed parts, improving query speed.
Which file format is best for columnar storage in Hive?
ORC is a columnar format that helps Hive read only needed columns, speeding up queries.
What is bucketing used for in Hive?
Bucketing organizes data into a fixed number of files to optimize joins and sampling.
Predicate pushdown helps Hive by:
Filtering data close to storage reduces the amount Hive processes, speeding up queries.
Which of these is NOT a Hive optimization technique?
Adding random delays does not optimize Hive queries.
Describe three ways Hive query optimization can improve query speed and resource use.
Think about how data is organized and filtered.
Explain how using ORC or Parquet file formats helps Hive process data more efficiently.
Focus on how data is stored and accessed.