beginner

What is a skewed join in Apache Spark?

A skewed join happens when one or more join keys have far more records than the rest, so the few tasks that process those keys run much longer than the others and slow down the whole job.

beginner

Why do skewed joins cause performance problems?

All rows with the same key go to the same task, so the tasks handling the heavy keys receive far more data than the others. The workload is uneven, and the join cannot finish until the slowest task does.
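
The uneven workload can be illustrated without Spark. The sketch below is a plain-Python simulation, not Spark code: `partition_sizes` is a hypothetical helper, `zlib.crc32` stands in for the engine's hash function, and the key names and counts are made-up data. It assigns each record to a partition by hashing its key and counts how many records each partition receives.

```python
import zlib

def partition_sizes(keys, num_partitions):
    """Count the records each partition receives under hash partitioning."""
    sizes = [0] * num_partitions
    for key in keys:
        # crc32 is a deterministic stand-in for the engine's hash function
        sizes[zlib.crc32(key.encode("utf-8")) % num_partitions] += 1
    return sizes

# Made-up data: one hot key with 100,000 records, plus 1,000 keys with one record each
keys = ["hot_customer"] * 100_000 + [f"customer_{i}" for i in range(1_000)]

sizes = partition_sizes(keys, num_partitions=8)
print("records per partition:", sizes)
print("share held by the largest partition:", max(sizes) / sum(sizes))
```

Every record of the hot key hashes to the same partition, so one partition holds almost all of the data while the other seven stay small. In a real job, the task that reads that partition is the one that keeps the stage waiting.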

intermediate

Name one common technique to handle skewed joins in Spark.

One common technique is to use a 'salting' method, which adds a random number to the join key to spread out the large key's data across multiple tasks.

intermediate

What is the 'salting' technique in handling skewed joins?

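Salting changes the join key on both sides of the join. On the skewed side, a random number (the salt) is appended to each row's key; on the other side, each row is replicated once per possible salt value so that every salted key still finds its match. The large key's data is split into smaller parts, which balances the workload.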
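
A minimal sketch of salting, written as a plain-Python simulation rather than Spark code. The function `salted_join`, the parameter `num_salts`, and the sample rows are all hypothetical names and data chosen for illustration. The skewed side gets a random salt per row, the other side is replicated once per salt value, and the join runs on the (key, salt) pair.

```python
import random
from collections import Counter, defaultdict

def salted_join(skewed, other, num_salts, seed=0):
    """Inner-join two lists of (key, value) pairs on a salted key."""
    rng = random.Random(seed)
    # Skewed side: attach a random salt to every row's key
    salted = [((key, rng.randrange(num_salts)), value) for key, value in skewed]
    # Other side: replicate each row once per salt value so every salted key has a match
    replicated = defaultdict(list)
    for key, value in other:
        for salt in range(num_salts):
            replicated[(key, salt)].append(value)
    # Join on (key, salt), then drop the salt from the output rows
    rows = [(key, left, right)
            for (key, salt), left in salted
            for right in replicated.get((key, salt), [])]
    # Rows per salted key: the load each join task would see
    load = Counter(salted_key for salted_key, _ in salted)
    return rows, load

# Made-up data: "hot" has 1,000 rows on the skewed side, "cold" has one
skewed = [("hot", i) for i in range(1_000)] + [("cold", 0)]
other = [("hot", "H"), ("cold", "C")]

rows, load = salted_join(skewed, other, num_salts=4)
print("joined rows:", len(rows))
print("rows per salted key:", sorted(load.items()))
```

The joined rows are the same as for an unsalted join, but the hot key's 1,000 rows are now spread over up to four salted keys instead of one. The cost is that the other side grows by a factor of `num_salts`, so the number of salts is a trade-off rather than a free parameter.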

advanced

How does Spark's built-in skew join optimization work?

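With Adaptive Query Execution enabled (available since Spark 3.0), Spark uses shuffle statistics at runtime to detect skewed partitions automatically. It splits each oversized partition into smaller pieces and joins each piece with a copy of the matching partition from the other side, so the skewed data is processed separately across several tasks instead of one.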
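
The detection step can be sketched as a size check. Spark's performance tuning documentation describes a partition as skewed when it is larger than a factor times the median partition size and also larger than an absolute byte threshold; the two settings are `spark.sql.adaptive.skewJoin.skewedPartitionFactor` and `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes`. The plain-Python function below is a hypothetical illustration of that rule only, not Spark's implementation. Its default arguments are assumed to mirror the documented defaults (5 and 256 MB), and the partition sizes are made up.

```python
from statistics import median

MB = 1024 * 1024

def skewed_partitions(sizes_in_bytes, factor=5.0, threshold=256 * MB):
    """Return the indexes of partitions that exceed both the relative and absolute limits."""
    med = median(sizes_in_bytes)
    return [i for i, size in enumerate(sizes_in_bytes)
            if size > factor * med and size > threshold]

# Made-up shuffle partition sizes: four ordinary partitions and one oversized one
sizes = [40 * MB, 55 * MB, 48 * MB, 2_000 * MB, 52 * MB]
print(skewed_partitions(sizes))  # prints [3]
```

Requiring both conditions means a partition that is large in absolute terms but typical for the job is left alone, as is one that is several times the median but still small.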

What causes a skewed join in Spark?

A. One or more keys have many more records than others

B. All keys have an equal number of records

C. Data is sorted before the join

D. Join keys are missing

Which technique helps to balance data in skewed joins by modifying join keys?

A. Broadcasting

B. Caching

C. Salting

D. Filtering

What does Spark do when using built-in skew join optimization?

A. Ignores skewed keys

B. Processes skewed keys separately

C. Drops skewed keys

D. Sorts all data

Which of these is NOT a way to handle skewed joins?

A. Filtering skewed keys

B. Broadcast join for small table

C. Salting keys

D. Ignoring skew

Why is salting done on both sides of the join?

A. To keep keys matching after modification

B. To increase data size

C. To remove duplicates

D. To sort data

Explain what a skewed join is and why it causes problems in Spark.

Describe the salting technique and how it helps fix skewed joins.