Apache Spark · ~10 mins

Multi-column joins in Apache Spark - Interactive Code Practice

Practice: 5 Tasks
Answer the questions below.
Task 1: Fill in the blank (easy)

Complete the code to join two DataFrames on multiple columns.

Apache Spark
joined_df = df1.join(df2, on=[1])
A. 'date'
B. ['id', 'date']
C. 'id'
D. df2.id == df1.id
Common Mistakes
- Passing a single column name as a string when multiple columns are needed.
- Using a condition expression instead of a list for the 'on' parameter.
Task 2: Fill in the blank (medium)

Complete the code to perform an inner join on columns 'user_id' and 'order_id'.

Apache Spark
result = df_orders.join(df_users, on=[1], how='inner')
A. 'order_id'
B. df_orders.user_id == df_users.user_id
C. ['user_id', 'order_id']
D. 'user_id'
Common Mistakes
- Using a single column name instead of a list for multiple columns.
- Not specifying the join type when needed.
Task 3: Fill in the blank (hard)

Fix the error in the join condition to correctly join on 'city' and 'state'.

Apache Spark
joined = df_a.join(df_b, (df_a.city == df_b.city) & (df_a.[1] == df_b.state))
A. state
B. city
C. zip
D. country
Common Mistakes
- Using the wrong column name on one side of the condition.
- Mixing column names that don't exist in the DataFrame.
Task 4: Fill in the blank (hard)

Fill both blanks to create a dictionary for joining on 'country' and 'city' with different column names.

Apache Spark
join_cols = { [1]: [2] }
A. 'country'
B. 'city'
C. 'country_name'
D. 'city_name'
Common Mistakes
- Using a list instead of a dictionary for columns with different names.
- Swapping keys and values in the dictionary.
Task 5: Fill in the blank (hard)

Fill all three blanks to create a join condition using multiple columns with different names.

Apache Spark
join_condition = (df1.[1] == df2.[2]) & (df1.[3] == df2.city)
A. country
B. country_name
C. state
D. state_name
Common Mistakes
- Mixing up column names between DataFrames.
- Using | instead of & for combining conditions.