Apache Spark · data · ~20 mins

String Functions in Apache Spark - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output · intermediate
Output of substring extraction in Spark
What is the output of this Spark code snippet that extracts a substring from a column?
Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.functions import substring

spark = SparkSession.builder.getOrCreate()
data = [(1, "DataScience"), (2, "SparkFun"), (3, "Python")]
df = spark.createDataFrame(data, ["id", "word"])
df.select("id", substring("word", 2, 4).alias("sub_word")).orderBy("id").show()
A
+---+--------+
|id |sub_word|
+---+--------+
|1  |ataS    |
|2  |park    |
|3  |yth     |
+---+--------+
B
+---+--------+
|id |sub_word|
+---+--------+
|1  |ataS    |
|2  |par     |
|3  |ytho    |
+---+--------+
C
+---+--------+
|id |sub_word|
+---+--------+
|1  |ataS    |
|2  |parkF   |
|3  |ytho    |
+---+--------+
D
+---+--------+
|id |sub_word|
+---+--------+
|1  |ataS    |
|2  |park    |
|3  |ytho    |
+---+--------+
💡 Hint
Remember that substring in Spark starts at position 1 and length is the number of characters to extract.
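To see why the position is 1-based, the hint can be sketched in plain Python with a hypothetical `spark_substring` helper that mimics Spark's `substring(col, pos, len)` semantics (it is not part of PySpark):

```python
# Hypothetical helper mimicking Spark's substring(col, pos, len):
# pos is 1-based, len is the number of characters to extract.
def spark_substring(s: str, pos: int, length: int) -> str:
    return s[pos - 1 : pos - 1 + length]

print(spark_substring("Example", 2, 3))  # -> xam
```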
Data Output · intermediate
Result of trimming spaces in Spark DataFrame
Given this Spark DataFrame, what is the output after applying the trim function to the 'text' column?
Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.functions import trim

spark = SparkSession.builder.getOrCreate()
data = [(1, "  hello  "), (2, " world"), (3, "spark  ")]
df = spark.createDataFrame(data, ["id", "text"])
df.select("id", trim("text").alias("trimmed_text")).orderBy("id").show()
A
+---+------------+
|id |trimmed_text|
+---+------------+
|1  |hello       |
|2  |world       |
|3  |spark       |
+---+------------+
B
+---+------------+
|id |trimmed_text|
+---+------------+
|1  |  hello     |
|2  | world      |
|3  |spark       |
+---+------------+
C
+---+------------+
|id |trimmed_text|
+---+------------+
|1  |hello       |
|2  |world       |
|3  |spark  	   |
+---+------------+
D
+---+------------+
|id |trimmed_text|
+---+------------+
|1  |hello       |
|2  |world       |
|3  | spark      |
+---+------------+
💡 Hint
The trim function removes spaces from both ends of the string.
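As a rough plain-Python sketch of the hint, assuming a hypothetical `spark_trim` helper (not PySpark itself): `trim` strips space characters from both ends and leaves interior spaces alone:

```python
# Hypothetical helper mimicking Spark's trim(): strips leading and
# trailing space characters; interior spaces are untouched.
def spark_trim(s: str) -> str:
    return s.strip(" ")

print(spark_trim("  hello  "))  # -> hello
```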
🔧 Debug · advanced
Spark string concatenation behavior
What happens when this Spark code concatenates two string columns?
Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat

spark = SparkSession.builder.getOrCreate()
data = [(1, "Hello", "World"), (2, "Spark", "Fun")]
df = spark.createDataFrame(data, ["id", "col1", "col2"])
df.select("id", concat("col1", "col2").alias("greeting")).show()
A. No error, outputs concatenated strings without a space
B. AnalysisException: cannot resolve 'concat(col1, col2)' due to data type mismatch
C. TypeError: concat() argument must be a Column or a list of Columns
D. SyntaxError: invalid syntax in concat function call
💡 Hint
Check the argument types that Spark's concat function accepts.
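One `concat` behavior worth remembering can be sketched in plain Python, assuming `None` stands in for SQL NULL (`spark_concat` is a hypothetical helper, not PySpark):

```python
# Hypothetical helper mimicking Spark's concat(): joins values with no
# separator and returns NULL (None here) if any input is NULL.
def spark_concat(*parts):
    if any(p is None for p in parts):
        return None
    return "".join(parts)

print(spark_concat("Hello", "World"))  # -> HelloWorld
print(spark_concat("Hello", None))    # -> None
```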
🚀 Application · advanced
Using regexp_replace to clean data
You want to remove all digits from the 'info' column in a Spark DataFrame. Which code snippet correctly does this?
A. df.withColumn('clean_info', regexp_replace('info', '\\d', '')).show()
B. df.withColumn('clean_info', regexp_replace('info', '[0-9]+', '')).show()
C. df.withColumn('clean_info', regexp_replace('info', '[a-z]', '')).show()
D. df.withColumn('clean_info', regexp_replace('info', '\\D', '')).show()
💡 Hint
Digits are represented by \d or [0-9] in regex, but consider the difference between \d and \D.
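The distinction the hint points at can be checked with Python's `re` module, which uses the same `\d`/`\D` character classes as Spark's Java-style regexes:

```python
import re

text = "Room 12, Floor 3"
print(re.sub(r"\d", "", text))  # \d matches digits -> 'Room , Floor '
print(re.sub(r"\D", "", text))  # \D matches NON-digits -> '123'
```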
🧠 Conceptual · expert
Understanding Spark string function behavior with nulls
What is the result of applying the Spark function upper() to a column containing null values?
A. The upper() function returns null for null input values.
B. The upper() function raises a NullPointerException when encountering null values.
C. The upper() function converts null values to empty strings before converting to uppercase.
D. The upper() function ignores null values and leaves them unchanged in the output.
💡 Hint
Think about how Spark functions handle null inputs generally.
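The general convention can be sketched in plain Python, with a hypothetical `spark_upper` helper (not PySpark) and `None` standing in for SQL NULL:

```python
# Most Spark SQL functions are null-safe: a NULL input (None here)
# produces a NULL output rather than raising an error.
def spark_upper(s):
    return None if s is None else s.upper()

print(spark_upper("spark"))  # -> SPARK
print(spark_upper(None))     # -> None
```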