Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to broadcast the small DataFrame before joining.
Apache Spark
from pyspark.sql.functions import broadcast
result = large_df.join([1](small_df), 'id')
Common Mistakes
Using cache() or persist() instead of broadcast()
Trying to collect() the DataFrame before join
Explanation: The broadcast() function tells Spark to send the small DataFrame to all worker nodes for efficient joining.
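Conceptually, a broadcast hash join copies the small table to every worker and builds an in-memory hash map from it, so the large table can be streamed through without a shuffle. A minimal pure-Python sketch of that idea (hypothetical row data; this models the mechanism, it is not the Spark API):

```python
# Conceptual sketch of a broadcast hash join (not the Spark API).
# The small side becomes a hash map keyed on the join column;
# the large side is streamed row by row against that map.

def broadcast_hash_join(large_rows, small_rows, key):
    # "Broadcast" step: build a lookup table from the small side once.
    lookup = {row[key]: row for row in small_rows}
    joined = []
    for row in large_rows:            # stream the large side, no shuffle needed
        match = lookup.get(row[key])
        if match is not None:         # inner-join semantics
            joined.append({**row, **match})
    return joined

# Hypothetical sample data
large_df = [{"id": 1, "large_col": "a"}, {"id": 2, "large_col": "b"},
            {"id": 3, "large_col": "c"}]
small_df = [{"id": 1, "small_col": "x"}, {"id": 3, "small_col": "y"}]

result = broadcast_hash_join(large_df, small_df, "id")
```

This is why broadcasting only pays off when one side is small: the whole lookup table must fit in each worker's memory.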
2. Fill in the blank (medium)
Complete the code to perform a broadcast join with a condition on 'id'.
Apache Spark
joined_df = large_df.join([1](small_df), large_df.id == small_df.id)
Common Mistakes
Using cache() or persist() instead of broadcast()
Not wrapping the small DataFrame at all
Explanation: broadcast() wraps the small DataFrame to optimize the join by sending it to all nodes.
3. Fill in the blank (hard)
Fix the error in the code to correctly broadcast the small DataFrame before the join.
Apache Spark
from pyspark.sql.functions import broadcast
joined = large_df.join([1](small_df), 'id')
Common Mistakes
Calling broadcast() after join() instead of before
Using cache() or persist() instead
Explanation: broadcast() must wrap the small DataFrame before the join, not be called after join().
4. Fill in the blank (hard)
Fill both blanks to create a broadcast join and select columns from both DataFrames.
Apache Spark
from pyspark.sql.functions import [1]
result = large_df.join([2](small_df), 'id').select('large_col', 'small_col')
Common Mistakes
Importing broadcast but not using it
Using different functions for import and usage
Explanation: broadcast is imported and used to wrap the small DataFrame for an efficient join.
5. Fill in the blank (hard)
Fill all three blanks to broadcast the small DataFrame, join on 'id', and filter the results.
Apache Spark
from pyspark.sql.functions import [1]
joined = large_df.join([2](small_df), 'id')
filtered = joined.filter(joined.[3] > 100)
Common Mistakes
Using cache or persist instead of broadcast
Filtering on a non-existent column
Explanation: broadcast is imported and used to wrap small_df; 'value' is the column filtered on.
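The full answer pattern in task 5 (broadcast the small side, join on 'id', then filter) can be mimicked end to end in plain Python. A conceptual sketch with made-up data and an illustrative 'value' column, not the real PySpark API:

```python
# Plain-Python model of the join-then-filter pipeline from task 5.
# Column names ("id", "value") and all data are illustrative.

def hash_join(large_rows, small_rows, key):
    # Stand-in for broadcast(small_df): hash the small side once,
    # then probe it while streaming the large side.
    lookup = {row[key]: row for row in small_rows}
    return [{**row, **lookup[row[key]]}
            for row in large_rows if row[key] in lookup]

large_df = [{"id": 1}, {"id": 2}, {"id": 3}]
small_df = [{"id": 1, "value": 50}, {"id": 2, "value": 150},
            {"id": 3, "value": 300}]

joined = hash_join(large_df, small_df, "id")
# Equivalent of: filtered = joined.filter(joined.value > 100)
filtered = [row for row in joined if row["value"] > 100]
```

In real PySpark the filter runs distributed on each partition of the joined DataFrame; the list comprehension here just models the same predicate.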