Recall & Review
beginner
What is a cross join in Apache Spark?
A cross join returns the Cartesian product of two DataFrames, pairing every row of the first with every row of the second.
beginner
Why should you avoid cross joins on large datasets?
Because the output of a cross join has (rows in A) × (rows in B) rows, even modest inputs can produce huge datasets that exhaust memory and slow down processing.
beginner
How can you perform a cross join in Apache Spark?
Use the `.crossJoin()` method between two DataFrames, for example: `df1.crossJoin(df2)`.
intermediate
What is a safer alternative to cross joins when you want to combine data?
Use an inner or outer join with an explicit join condition, so only related rows are combined instead of every possible pair.
intermediate
What happens if you accidentally run a cross join on two large DataFrames?
It can cause your Spark job to run out of memory, crash, or take a very long time to finish; for example, crossing two DataFrames of one million rows each yields a trillion output rows.
What does a cross join produce in Apache Spark?
A cross join returns the Cartesian product, meaning every row from the first DataFrame pairs with every row from the second.
Which method performs a cross join in Spark?
The `.crossJoin()` method explicitly performs a cross join in Spark.
Why is it risky to use cross joins on big data?
Cross joins multiply rows, which can cause memory overload and slow performance on big data.
What is a better option than cross join when combining related data?
Inner joins with conditions combine only matching rows, avoiding the large output of cross joins.
If you want every row from DataFrame A to pair with every row from DataFrame B, which join do you use?
Cross join creates all possible pairs between rows of two DataFrames.
Explain what a cross join does and why it can be problematic with large datasets.
Think about how many rows result when you combine every row with every other row.
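One way to work through this question: the output row count is the product of the input sizes. A plain-Python analogy using `itertools` (not Spark itself):

```python
from itertools import product

# Two small "tables": a cross join forms every (left, right) pair.
left = [1, 2, 3]
right = ["a", "b"]
pairs = list(product(left, right))

print(len(pairs))  # 6 == len(left) * len(right)

# The same multiplication at scale: two million-row inputs
# would yield a trillion-row output, which is the core risk.
print(1_000_000 * 1_000_000)  # 1000000000000
```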
Describe safer alternatives to cross joins when combining data in Apache Spark.
Consider how to combine only related rows instead of all possible pairs.