Complete the code to register a simple UDF that doubles a number.
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

def double_num(x):
    return x * 2

double_udf = udf(double_num, [1])
The UDF returns an integer, so we use IntegerType() to specify the return type.
Complete the code to apply the UDF to a DataFrame column named 'value'.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
data = [(1,), (2,), (3,)]
df = spark.createDataFrame(data, ['value'])

result_df = df.withColumn('double_value', [1](col('value')))
We use the registered UDF double_udf to apply the function to the column.
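For reference, the two exercises above can be combined into one end-to-end sketch. It assumes a local PySpark installation; `run_double_demo` is a hypothetical helper name that does not appear in the exercises, and the Spark imports are placed inside it so that `double_num` stays testable as plain Python.

```python
def double_num(x):
    # Plain Python logic; Spark calls this once per row.
    return x * 2


def run_double_demo():
    # Hypothetical helper: Spark imports are local so that double_num can be
    # unit-tested without a Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,), (2,), (3,)], ['value'])

    # Wrap the function as a UDF with an explicit return type.
    double_udf = udf(double_num, IntegerType())

    # Apply the UDF to the 'value' column and collect the new column.
    result_df = df.withColumn('double_value', double_udf(col('value')))
    return [row['double_value'] for row in result_df.collect()]
```

Calling `run_double_demo()` on a machine where Spark is available should return the doubled values. Keeping the function separate from the UDF wrapper is a design choice that lets the logic be checked with ordinary Python tests first.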
Fix the error in the UDF registration by importing the correct return type.
from pyspark.sql.functions import udf
from pyspark.sql.types import [1]

def is_even(x):
    return x % 2 == 0

even_udf = udf(is_even, BooleanType())
The function returns True or False, so the return type should be BooleanType.
Fill both blanks to create a UDF that returns the length of a string and apply it to the 'name' column.
from pyspark.sql.functions import col, udf
from pyspark.sql.types import [1]

def str_length(s):
    return len(s)

length_udf = udf(str_length, [2])

result_df = df.withColumn('name_length', length_udf(col('name')))
The function returns the length of a string, which is an integer, so the first blank imports IntegerType and the second passes IntegerType() as the return type.
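One case the exercise does not cover is null input: Spark passes a SQL NULL to a Python UDF as `None`, and `len(None)` raises a `TypeError`. A minimal null-safe sketch follows; `safe_str_length` and `make_length_udf` are hypothetical names introduced here for illustration.

```python
def safe_str_length(s):
    # Return None (SQL NULL) for null input instead of raising TypeError.
    if s is None:
        return None
    return len(s)


def make_length_udf():
    # Hypothetical helper; PySpark is only required when this is called.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    return udf(safe_str_length, IntegerType())
```

With Spark available, the result of `make_length_udf()` can be applied exactly as `length_udf` is in the exercise, for example `df.withColumn('name_length', make_length_udf()(col('name')))`.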
Fill all three blanks to create a UDF that checks if a number is positive, register it, and apply it to the 'score' column.
from pyspark.sql.functions import col, udf
from pyspark.sql.types import [1]

def is_positive(n):
    return n > 0

positive_udf = udf(is_positive, [2])

result_df = df.withColumn('is_positive', positive_udf([3]('score')))
The function returns True or False, so the first blank imports BooleanType and the second passes BooleanType() as the return type. The third blank is col, which references the 'score' column that the UDF is applied to.
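For a check this simple, a Python UDF is not strictly necessary: the same comparison can be written as a built-in column expression, which Spark evaluates itself without calling Python for each row. The sketch below assumes a DataFrame with a numeric 'score' column; `add_is_positive` and its `use_udf` flag are hypothetical and only serve to show both variants side by side.

```python
def is_positive(n):
    # Same logic as the exercise, kept as plain Python for comparison.
    return n > 0


def add_is_positive(df, use_udf=False):
    # Hypothetical helper; df is assumed to have a numeric 'score' column.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BooleanType

    if use_udf:
        # UDF variant, as in the exercise.
        positive_udf = udf(is_positive, BooleanType())
        return df.withColumn('is_positive', positive_udf(col('score')))

    # Built-in comparison: evaluated by Spark itself, no Python UDF needed.
    return df.withColumn('is_positive', col('score') > 0)
```

For non-null scores both branches should produce the same column. With a null score the UDF branch would fail, because `None > 0` raises a `TypeError` in Python 3, while the built-in comparison yields null.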