beginner

What is the main goal of Hive query optimization?

The main goal is to make Hive queries run faster and use fewer resources by improving how data is processed and accessed.

beginner

Explain what partitioning means in Hive.

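Partitioning divides a large table into smaller parts based on the values of a column, like splitting sales data by year. Each partition is stored in its own directory, so a query that filters on the partition column reads only the matching partitions instead of the whole table.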
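
To make the idea concrete, here is a minimal Python sketch of partition pruning. It is a toy model, not Hive's implementation: the `partitions` dictionary, the sample rows, and the `total_for_year` helper are all hypothetical names chosen for illustration.

```python
# Toy model of partition pruning (not Hive's actual code).
# Each key stands in for one partition directory, e.g. one sale year.
partitions = {
    2021: [("o1", 120.0), ("o2", 80.0)],
    2022: [("o3", 200.0)],
    2023: [("o4", 50.0), ("o5", 75.0)],
}

def total_for_year(partitions, year):
    """Sum amounts for one year, touching only that year's partition."""
    scanned = [year] if year in partitions else []
    total = sum(amount for _, amount in partitions.get(year, []))
    return total, scanned

total, scanned = total_for_year(partitions, 2023)
print(total, scanned)
```

Only the 2023 group is read; the 2021 and 2022 rows are never touched, which is the saving partitioning aims for.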

intermediate

What is bucketing in Hive and how does it help optimization?

Bucketing splits data into a fixed number of files (buckets) based on a hash of a column. Rows with the same column value always land in the same bucket, so joins on that column and sampling can work on matching buckets instead of the whole table.
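
The sketch below illustrates the idea in Python under simplifying assumptions: the hash is just the integer key itself, `NUM_BUCKETS` is an arbitrary value, and the `bucket_for` and `bucketize` helpers are hypothetical. Hive's real hash function is different, so treat this as a model of the principle only.

```python
NUM_BUCKETS = 4  # hypothetical bucket count

def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Toy bucket assignment: stand-in hash (the key itself) modulo bucket count."""
    return key % num_buckets

def bucketize(rows, num_buckets=NUM_BUCKETS):
    """Distribute (key, value) rows into a fixed number of buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[bucket_for(key, num_buckets)].append((key, value))
    return buckets

users = bucketize([(1, "ann"), (2, "bob"), (5, "cy"), (8, "dee")])
orders = bucketize([(1, 9.5), (5, 3.0), (5, 7.25), (8, 1.0)])

# Bucket-wise join: bucket i of users is compared only with bucket i of orders.
joined = [
    (uid, name, amount)
    for u_bucket, o_bucket in zip(users, orders)
    for uid, name in u_bucket
    for oid, amount in o_bucket
    if uid == oid
]
print(sorted(joined))
```

Because both tables are bucketed on the same key with the same bucket count, matching keys can only appear in buckets with the same index, so the join never compares rows across different buckets.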

intermediate

How does predicate pushdown improve Hive query performance?

Predicate pushdown means filtering data early, close to where it is stored, so less data moves around. This reduces the amount of data Hive processes and speeds up queries.
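
One way to picture this is a reader that checks per-block statistics before loading any rows. The Python sketch below assumes a made-up layout in which each block records the minimum and maximum of a single column; the `blocks` data and the `scan_greater_than` helper are hypothetical and only model the idea.

```python
# Toy model: each block carries min/max statistics for one numeric column.
blocks = [
    {"min": 1, "max": 40, "rows": [1, 15, 40]},
    {"min": 50, "max": 90, "rows": [50, 72, 90]},
    {"min": 95, "max": 300, "rows": [95, 120, 300]},
]

def scan_greater_than(blocks, threshold):
    """Return values > threshold, skipping blocks whose max rules them out."""
    matched, blocks_read = [], 0
    for block in blocks:
        if block["max"] <= threshold:
            continue  # the filter is applied at the storage layer: block is never read
        blocks_read += 1
        matched.extend(v for v in block["rows"] if v > threshold)
    return matched, blocks_read

matched, blocks_read = scan_greater_than(blocks, 90)
print(matched, blocks_read)
```

With a threshold of 90, the first two blocks are skipped from their statistics alone, and only one of the three blocks is actually read.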

beginner

Why is using ORC or Parquet file formats recommended for Hive optimization?

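ORC and Parquet store data column by column, so Hive reads only the columns a query needs. They also support compression and keep lightweight statistics (such as per-column min/max values) that let Hive skip data that cannot match a filter, making queries faster and saving storage.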
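
The difference between row and columnar layouts can be sketched in a few lines of Python. This is an illustration only: the sample `rows`, the `columns` dictionary, and the `sum_column` helper are hypothetical and do not reflect the actual ORC or Parquet file structure.

```python
# Row layout: every record holds all of its fields together.
rows = [
    {"order_id": 1, "customer": "ann", "amount": 9.5},
    {"order_id": 2, "customer": "bob", "amount": 3.0},
    {"order_id": 3, "customer": "cy", "amount": 7.5},
]

# Columnar layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def sum_column(columns, name):
    """Aggregate one column; return the total and the columns touched."""
    return sum(columns[name]), [name]

total, columns_read = sum_column(columns, "amount")
print(total, columns_read)
```

Summing `amount` touches a single list, while the same query over the row layout would have to walk every record, including the `order_id` and `customer` fields it does not need.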

What does partitioning in Hive help with?

A. Reading only relevant parts of data

B. Compressing data files

C. Encrypting data for security

D. Changing data types automatically

Which of these file formats stores data in a columnar layout in Hive?

A. CSV

B. JSON

C. TXT

D. ORC

What is bucketing used for in Hive?

A. Splitting data into a fixed number of files for faster joins

B. Encrypting data buckets

C. Backing up data automatically

D. Changing data schema

Predicate pushdown helps Hive by:

A. Changing data format

B. Increasing data size

C. Filtering data early to reduce processing

D. Sorting data alphabetically

Which of these is NOT a Hive optimization technique?

A. Partitioning

B. Adding random delays

C. Bucketing

D. Using ORC format

Describe three ways Hive query optimization can improve query speed and resource use.

Explain how using ORC or Parquet file formats helps Hive process data more efficiently.