
Hive query optimization in Hadoop - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the main goal of Hive query optimization?
The main goal is to make Hive queries run faster and use fewer resources by improving how data is processed and accessed.
beginner
Explain what partitioning means in Hive.
Partitioning divides a large table into smaller parts based on the values of one or more columns, such as splitting sales data by year. When a query filters on the partition column, Hive reads only the matching partitions, which speeds up the query.
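The effect of partitioning can be sketched in a few lines of Python. This is only an illustration of the idea, not how Hive is implemented: the sales rows, the `partitions` dictionary, and the `total_sales` helper are all made up for the example.

```python
from collections import defaultdict

# Hypothetical sales rows: (year, amount). Data is illustrative only.
rows = [(2021, 100), (2021, 250), (2022, 75), (2023, 300), (2023, 40)]

# "Write" step: group rows by the partition column (year).
# In Hive, each group would live in its own directory, such as year=2023/.
partitions = defaultdict(list)
for year, amount in rows:
    partitions[year].append(amount)

def total_sales(year):
    """Sum sales for one year, touching only that year's partition."""
    scanned = partitions.get(year, [])
    return sum(scanned), len(scanned)

total, rows_scanned = total_sales(2023)
print(total, rows_scanned)  # only the 2023 rows were scanned, not all five
```

A query that does not filter on the partition column would still have to read every partition, which is why the choice of partition column matters.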
intermediate
What is bucketing in Hive and how does it help optimization?
Bucketing splits data into a fixed number of files (buckets) based on a hash of a column. It helps by making joins and sampling faster because data is organized predictably.
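A rough sketch of why bucketing helps joins, assuming four buckets and using Python's built-in `hash()` as a stand-in for Hive's own hash function. The tables, `bucket_for`, and `bucketize` are hypothetical names for illustration.

```python
NUM_BUCKETS = 4  # assumed bucket count, fixed when the table is created

def bucket_for(user_id: int) -> int:
    """Assign a key to a bucket; hash() stands in for Hive's hash function."""
    return hash(user_id) % NUM_BUCKETS

def bucketize(rows, key_index=0):
    """Distribute rows into NUM_BUCKETS lists based on the key column."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for row in rows:
        buckets[bucket_for(row[key_index])].append(row)
    return buckets

# Hypothetical tables, both bucketed on user_id with the same bucket count.
users = [(1, "ana"), (2, "bo"), (5, "cy"), (8, "di")]
orders = [(1, 30), (5, 12), (5, 7), (8, 99), (3, 4)]

user_buckets = bucketize(users)
order_buckets = bucketize(orders)

# Bucket-by-bucket join: bucket i of users only ever meets bucket i of orders,
# because equal keys always hash to the same bucket.
joined = []
for ub, ob in zip(user_buckets, order_buckets):
    names = dict(ub)
    for user_id, amount in ob:
        if user_id in names:
            joined.append((user_id, names[user_id], amount))
joined.sort()
print(joined)
```

The same predictability supports sampling: reading one bucket out of four gives a roughly even slice of the keys without scanning the whole table.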
intermediate
How does predicate pushdown improve Hive query performance?
Predicate pushdown means filtering data early, close to where it is stored, so less data moves around. This reduces the amount of data Hive processes and speeds up queries.
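One way to picture predicate pushdown is a reader that checks a filter against per-chunk min/max statistics before reading the chunk. The sketch below is a simplified model under that assumption; the `chunks` layout and `scan_with_pushdown` are hypothetical, not a Hive API.

```python
# Hypothetical storage chunks, each carrying min/max statistics
# for an "amount" column alongside the actual values.
chunks = [
    {"min": 1, "max": 50, "rows": [1, 20, 50]},
    {"min": 60, "max": 90, "rows": [60, 75, 90]},
    {"min": 100, "max": 400, "rows": [100, 250, 400]},
]

def scan_with_pushdown(chunks, threshold):
    """Return values > threshold, skipping chunks that cannot contain a match."""
    matched, chunks_read = [], 0
    for chunk in chunks:
        if chunk["max"] <= threshold:
            continue  # the statistics prove no row here passes the filter
        chunks_read += 1
        matched.extend(v for v in chunk["rows"] if v > threshold)
    return matched, chunks_read

matched, chunks_read = scan_with_pushdown(chunks, 80)
print(matched, chunks_read)  # the first chunk is never read
```

Without pushdown, all three chunks would be read and the filter applied afterwards; the result is the same, but more data is moved and processed.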
beginner
Why is using ORC or Parquet file formats recommended for Hive optimization?
ORC and Parquet store data by column rather than by row, so Hive reads only the columns a query needs. They also support compression and keep built-in statistics and lightweight indexes, which makes queries faster and saves storage.
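The difference between row and columnar layouts can be shown with a toy table. This is a conceptual sketch only; real ORC and Parquet files add encoding, compression, and metadata that are not modeled here, and all names below are invented for the example.

```python
# The same hypothetical table stored two ways.
row_store = [
    {"id": 1, "region": "east", "amount": 100},
    {"id": 2, "region": "west", "amount": 250},
    {"id": 3, "region": "east", "amount": 75},
]

# Columnar layout: one list of values per column.
column_store = {name: [r[name] for r in row_store] for name in row_store[0]}

def sum_amount_row_layout(rows):
    """Row layout: every field of every row is read to reach 'amount'."""
    values_touched = sum(len(r) for r in rows)
    return sum(r["amount"] for r in rows), values_touched

def sum_amount_column_layout(cols):
    """Columnar layout: only the 'amount' column is read."""
    amounts = cols["amount"]
    return sum(amounts), len(amounts)

row_result = sum_amount_row_layout(row_store)
col_result = sum_amount_column_layout(column_store)
print(row_result, col_result)  # same total, fewer values touched by column
```

Storing similar values together is also what makes columnar files compress well, since a column such as region repeats the same few values.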
What does partitioning in Hive help with?
A. Reading only relevant parts of data
B. Compressing data files
C. Encrypting data for security
D. Changing data types automatically
Which of these file formats stores data in a columnar layout in Hive?
A. CSV
B. JSON
C. TXT
D. ORC
What is bucketing used for in Hive?
A. Splitting data into a fixed number of files for faster joins
B. Encrypting data buckets
C. Backing up data automatically
D. Changing data schema
Predicate pushdown helps Hive by:
A. Changing data format
B. Increasing data size
C. Filtering data early to reduce processing
D. Sorting data alphabetically
Which of these is NOT a Hive optimization technique?
A. Partitioning
B. Adding random delays
C. Bucketing
D. Using ORC format
Describe three ways Hive query optimization can improve query speed and resource use.
Think about how data is organized and filtered.
Explain how using ORC or Parquet file formats helps Hive process data more efficiently.
Focus on how data is stored and accessed.