Practice - 5 Tasks

Answer the questions below

1. Fill in the blank

easy

Complete the code to register a simple UDF that doubles a number.

Apache Spark

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

def double_num(x):
    return x * 2

double_udf = udf(double_num, [1])

Options:

A. IntegerType()

B. StringType()

C. FloatType()

D. BooleanType()

2. Fill in the blank

medium

Complete the code to apply the UDF to a DataFrame column named 'value'.

Apache Spark

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
data = [(1,), (2,), (3,)]
df = spark.createDataFrame(data, ['value'])

result_df = df.withColumn('double_value', [1](col('value')))

Options:

A. double_num

B. col

C. udf

D. double_udf
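
Tasks 1 and 2 together follow a three-step pattern: write a plain Python function, wrap it with `udf` and a return type, then call the wrapper on a column. A minimal sketch of that pattern with a different function is given here. The names `triple_num`, `triple_udf`, and `run_triple_example` are hypothetical and do not appear in the tasks, and the PySpark imports sit inside the function only so that the plain Python part can be tried without a Spark installation.

```python
def triple_num(x):
    # Step 1: ordinary Python logic, testable without Spark.
    return x * 3


def run_triple_example():
    # PySpark is imported lazily so triple_num stays usable on its own.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])

    # Step 2: wrap the function and declare the type of the value it returns.
    triple_udf = udf(triple_num, IntegerType())

    # Step 3: call the wrapper (not the raw function) on a column expression.
    return df.withColumn("triple_value", triple_udf(col("value")))
```

In an environment with PySpark installed, `run_triple_example().show()` should display the original column next to the tripled one.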

3. Fill in the blank

hard

Fix the error in the UDF registration by completing the import so that the return type used in the code is defined.

Apache Spark

from pyspark.sql.functions import udf
from pyspark.sql.types import [1]

def is_even(x):
    return x % 2 == 0

even_udf = udf(is_even, BooleanType())

Options:

A. IntegerType

B. BooleanType

C. StringType

D. FloatType
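
As a side note, `udf` also accepts the return type as a DDL-style type string such as "boolean", which avoids importing a type class at all. The sketch below assumes that form. The names `is_odd` and `build_odd_udf` are hypothetical and used only for illustration.

```python
def is_odd(x):
    # Plain Python predicate that returns a bool.
    return x % 2 == 1


def build_odd_udf():
    # Imported lazily so is_odd can be tried without Spark installed.
    from pyspark.sql.functions import udf

    # "boolean" is the type-string counterpart of BooleanType().
    return udf(is_odd, "boolean")
```

Either spelling declares the same return type; the class form makes a typo fail at import time, while the string form keeps the import list shorter.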

4. Fill in the blank

hard

Fill both blanks to create a UDF that returns the length of a string and apply it to the 'name' column.

Apache Spark

from pyspark.sql.functions import col, udf
from pyspark.sql.types import [1]

def str_length(s):
    return len(s)

length_udf = udf(str_length, [2]())

# df is assumed to be an existing DataFrame with a string column 'name'
result_df = df.withColumn('name_length', length_udf(col('name')))

Options:

A. IntegerType

B. StringType

C. FloatType

D. BooleanType
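
One caveat for a UDF like `str_length`: Spark passes SQL nulls to a Python UDF as `None`, and `len(None)` raises a `TypeError`. A null-safe variant might look like the sketch below. The names `safe_str_length` and `build_length_udf` are hypothetical and not part of the task.

```python
def safe_str_length(s):
    # Return None (SQL null) for a null input instead of raising TypeError.
    if s is None:
        return None
    return len(s)


def build_length_udf():
    # Imported lazily so safe_str_length can be tried without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    return udf(safe_str_length, IntegerType())
```

Returning `None` keeps the row in the result with a null length, which is usually easier to handle downstream than a failed job.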

5. Fill in the blank

hard

Fill all three blanks to create a UDF that checks if a number is positive, register it, and apply it to the 'score' column.

Apache Spark

from pyspark.sql.functions import col, udf
from pyspark.sql.types import [1]

def is_positive(n):
    return n > 0

positive_udf = udf(is_positive, [2]())

# df is assumed to be an existing DataFrame with a numeric column 'score'
result_df = df.withColumn('is_positive', positive_udf([3]('score')))

Options:

A. IntegerType

B. BooleanType

C. col

D. StringType
