
UDFs (User Defined Functions) in Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to register a simple UDF that doubles a number.

Apache Spark
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

def double_num(x):
    return x * 2

double_udf = udf(double_num, [1])
A. IntegerType()
B. StringType()
C. FloatType()
D. BooleanType()
Common Mistakes
Using StringType instead of IntegerType.
Not specifying the return type at all.
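
As a minimal sketch of the completed registration, assuming PySpark is installed: the helper name `build_double_udf` is hypothetical and exists only to defer the PySpark imports, so `double_num` can be checked as plain Python first. When no return type is passed, `udf` falls back to `StringType()`, which is why omitting it is listed as a mistake.

```python
def double_num(x):
    """Plain Python logic; it can be tested without a Spark session."""
    return x * 2


def build_double_udf():
    """Hypothetical helper: wrap double_num as a UDF that returns integers."""
    # Imports are deferred so this module loads even where PySpark is absent.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    # The second argument declares the Spark SQL type of the result column.
    return udf(double_num, IntegerType())
```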
2. Fill in the blank (medium)

Complete the code to apply the UDF to a DataFrame column named 'value'.

Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
data = [(1,), (2,), (3,)]
df = spark.createDataFrame(data, ['value'])

result_df = df.withColumn('double_value', [1](col('value')))
A. double_num
B. col
C. udf
D. double_udf
Common Mistakes
Using the original Python function instead of the UDF.
Using 'col' instead of the UDF.
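
The following sketch combines the registration and the column call in one place, assuming an existing `SparkSession` is passed in; `apply_double` is a hypothetical helper name. One reason the first mistake can go unnoticed: `double_num(col('value'))` does not fail, because a `Column` supports `*` and the call builds the native expression `value * 2` without involving a UDF at all.

```python
def double_num(x):
    """Plain Python logic that Spark applies to each row's value."""
    return x * 2


def apply_double(spark):
    """Hypothetical helper: add a 'double_value' column using the UDF."""
    # Deferred imports keep double_num usable without PySpark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import IntegerType

    double_udf = udf(double_num, IntegerType())
    df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])
    # The UDF takes a Column; Spark invokes double_num once per row.
    return df.withColumn("double_value", double_udf(col("value")))
```

With a session available, `apply_double(spark).show()` would display both columns.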
3. Fill in the blank (hard)

Fix the error in the UDF registration by importing the correct return type.

Apache Spark
from pyspark.sql.functions import udf
from pyspark.sql.types import [1]

def is_even(x):
    return x % 2 == 0

even_udf = udf(is_even, BooleanType())
A. IntegerType
B. BooleanType
C. StringType
D. FloatType
Common Mistakes
Using IntegerType for a boolean result.
Using StringType which is incorrect here.
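
The declared return type should match what the Python function actually returns; a mismatch can surface as nulls or as an error, depending on the PySpark version and configuration. The sketch below, assuming PySpark is installed, shows two equivalent ways to declare a boolean result: a `BooleanType()` instance or the DDL type string `"boolean"`. The helper name `build_even_udfs` is hypothetical.

```python
def is_even(x):
    """Return a Python bool, which corresponds to Spark's BooleanType."""
    return x % 2 == 0


def build_even_udfs():
    """Hypothetical helper: declare the same UDF with two return-type forms."""
    # Deferred imports keep is_even testable without PySpark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType

    # Form 1: pass a DataType instance.
    even_udf_typed = udf(is_even, BooleanType())
    # Form 2: pass a DDL-formatted type string.
    even_udf_ddl = udf(is_even, "boolean")
    return even_udf_typed, even_udf_ddl
```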
4. Fill in the blank (hard)

Fill both blanks to create a UDF that returns the length of a string and apply it to the 'name' column.

Apache Spark
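from pyspark.sql.functions import col, udf
from pyspark.sql.types import [1]

def str_length(s):
    return len(s)

# The return type must be an instance, so the blank is followed by ().
length_udf = udf(str_length, [2]())

# Assumes df has a string column named 'name'.
result_df = df.withColumn('name_length', length_udf(col('name')))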
A. IntegerType
B. StringType
C. FloatType
D. BooleanType
Common Mistakes
Using StringType for the return type.
Using different types for the two blanks.
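
One detail the exercise leaves out: Spark passes a SQL NULL to a Python UDF as `None`, and `len(None)` raises a `TypeError`. A null-safe variant might look like the sketch below, which assumes PySpark is installed and that `df` has a string column `name`; the names `str_length_safe` and `add_name_length` are hypothetical.

```python
def str_length_safe(s):
    """Length of a string, passing None (SQL NULL) through unchanged."""
    if s is None:
        return None
    return len(s)


def add_name_length(df):
    """Hypothetical helper: add 'name_length' to a DataFrame with a 'name' column."""
    # Deferred imports keep str_length_safe testable without PySpark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import IntegerType

    length_udf = udf(str_length_safe, IntegerType())
    return df.withColumn("name_length", length_udf(col("name")))
```

For this particular task, the built-in `pyspark.sql.functions.length` would also work and avoids the per-row Python call; the UDF form is kept here because it is what the exercise practices.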
5. Fill in the blank (hard)

Fill all three blanks to create a UDF that checks if a number is positive, register it, and apply it to the 'score' column.

Apache Spark
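from pyspark.sql.functions import col, udf
from pyspark.sql.types import [1]

def is_positive(n):
    return n > 0

# The return type must be an instance, so the blank is followed by ().
positive_udf = udf(is_positive, [2]())

# Assumes df has a numeric column named 'score'.
result_df = df.withColumn('is_positive', positive_udf([3]('score')))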
A. IntegerType
B. BooleanType
C. col
D. StringType
Common Mistakes
Using IntegerType instead of BooleanType.
Not using col() to refer to the column.
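
For comparison, the sketch below places the UDF version next to a native column expression that computes the same flag. It assumes PySpark is installed and that `df` has a numeric column `score`; `add_is_positive` is a hypothetical helper name. The native form is often preferable for a check this simple, since it skips the per-row Python call.

```python
def is_positive(n):
    """Return True for numbers strictly greater than zero."""
    return n > 0


def add_is_positive(df):
    """Hypothetical helper: compute the flag with a UDF and with a native expression."""
    # Deferred imports keep is_positive testable without PySpark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BooleanType

    positive_udf = udf(is_positive, BooleanType())
    # UDF version: col() turns the column name into a Column argument.
    with_udf = df.withColumn("is_positive", positive_udf(col("score")))
    # Native version: the comparison itself yields a boolean Column.
    native = df.withColumn("is_positive", col("score") > 0)
    return with_udf, native
```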