Complete the code to perform a join between two DataFrames on the 'id' column.
result = df1.join(df2, on='id', how='inner')
We join on the 'id' column because it is the common key between the two DataFrames.
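Conceptually, an inner join on 'id' keeps only the rows whose id appears on both sides, merging the matching columns. A plain-Python sketch of that behavior (no Spark; the sample rows are made up for illustration):

```python
# Plain-Python illustration of an inner join on 'id' (not Spark).
df1_rows = [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}, {'id': 3, 'name': 'c'}]
df2_rows = [{'id': 2, 'score': 10}, {'id': 3, 'score': 20}, {'id': 4, 'score': 30}]

# Index the right side by the join key, then keep only matching left rows.
right_by_id = {r['id']: r for r in df2_rows}
result = [
    {**left, **right_by_id[left['id']]}
    for left in df1_rows
    if left['id'] in right_by_id
]
# Only ids 2 and 3 appear on both sides, so only those rows survive.
```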
Complete the code to add a salt column with random integers between 0 and 9 to the DataFrame.
from pyspark.sql.functions import rand
salted_df = df.withColumn('salt', (rand() * 10).cast('integer'))
We cast the salt column to 'integer' to get whole numbers for salting.
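A quick plain-Python check of the same idea: multiplying a uniform random value in [0, 1) by 10 and truncating to an integer always lands in 0 through 9, which is what spreads each hot key across ten salt buckets:

```python
import random

# int(random.random() * 10) mirrors (rand() * 10).cast('integer'):
# the cast truncates toward zero, it does not round.
salts = [int(random.random() * 10) for _ in range(1000)]

# Every generated salt is a whole number in the range 0..9.
assert min(salts) >= 0 and max(salts) <= 9
```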
Fix the error in the code to perform a salted join by matching both 'id' and 'salt' columns.
joined_df = df1.join(df2, on=['id', 'salt'], how='inner')
The salt column is named 'salt' in both DataFrames and must be used in the join condition.
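For the salted join to be lossless, the smaller side has to be replicated once per salt value (0 through 9 here) before joining on both columns, so that every salted row of the big side finds its match. A plain-Python sketch of that replicate-and-match step (variable names are illustrative, not from the original code):

```python
import random

NUM_SALTS = 10

# Big (skewed) side: every row of a hot key gets one random salt.
big = [{'id': 'hot', 'salt': random.randrange(NUM_SALTS), 'v': i} for i in range(100)]

# Small side: one row per id, replicated across all salt values.
small = [{'id': 'hot', 'w': 42}]
small_exploded = [{**row, 'salt': s} for row in small for s in range(NUM_SALTS)]

# Join on (id, salt): every big-side row finds exactly one match, so no rows are lost.
index = {(r['id'], r['salt']): r for r in small_exploded}
joined = [{**b, **index[(b['id'], b['salt'])]} for b in big if (b['id'], b['salt']) in index]
assert len(joined) == len(big)
```

The replication is the price of salting: the small side grows by a factor of NUM_SALTS, but the big side's hot key is now split across NUM_SALTS partitions.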
Fill both blanks to create a salted key by concatenating 'id' and 'salt' as strings.
from pyspark.sql.functions import concat, col
salted_df = df.withColumn('salted_key', concat(col('id').cast('string'), col('salt').cast('string')))
We concatenate 'id' and 'salt' columns to create a unique salted key for joining.
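String-wise, the salted key is just the id followed by the salt. A plain-Python equivalent of the concatenation (sample rows are illustrative):

```python
# Plain-Python equivalent of concat on the 'id' and 'salt' columns:
# cast both values to strings and append them.
rows = [{'id': 7, 'salt': 3}, {'id': 7, 'salt': 9}]
for r in rows:
    r['salted_key'] = str(r['id']) + str(r['salt'])
# Same id with different salts yields different salted keys.
```

Because the salts here are single digits 0 through 9, bare concatenation is unambiguous; with a wider salt range, inserting a separator such as '_' between the two parts avoids collisions like id '1' + salt '23' versus id '12' + salt '3'.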
Fill all three blanks to filter the DataFrame for skewed keys where count is greater than 1000.
from pyspark.sql.functions import col
skewed_keys = df.groupBy('id').count().filter(col('count') > 1000)
We group by 'id', then filter where count is greater than 1000 to find skewed keys.
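The same group-and-filter logic can be sketched in plain Python with a Counter, which makes the threshold behavior easy to see (the id values and counts below are made up):

```python
from collections import Counter

# Plain-Python sketch of groupBy('id').count().filter(count > 1000).
ids = ['hot'] * 1500 + ['warm'] * 1001 + ['cold'] * 5
counts = Counter(ids)

# Keep only ids whose row count is strictly greater than 1000.
skewed_keys = sorted(k for k, c in counts.items() if c > 1000)
```

Note the strict inequality: a key with exactly 1000 rows is not flagged as skewed.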