
Date and timestamp functions in Apache Spark - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of date_add function in Spark
What is the output of the following Spark code snippet?
Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.functions import date_add, to_date

spark = SparkSession.builder.getOrCreate()
data = [("2024-06-15",)]
df = spark.createDataFrame(data, ["date_str"])
df = df.withColumn("date", to_date("date_str"))
df = df.withColumn("new_date", date_add("date", 10))
df.select("new_date").show()
A
+----------+
|  new_date|
+----------+
|2024-06-15|
+----------+
B
+----------+
|  new_date|
+----------+
|2024-06-25|
+----------+
C
+----------+
|  new_date|
+----------+
|2024-06-05|
+----------+
D
+----------+
|  new_date|
+----------+
|2024-07-15|
+----------+
💡 Hint
date_add adds days to a date column.
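The day arithmetic behind this hint can be cross-checked without a Spark session. The following is a minimal sketch in plain Python using the standard `datetime` module; the helper name `add_days` is hypothetical and is not part of PySpark.

```python
from datetime import date, timedelta


def add_days(start: date, days: int) -> date:
    """Shift a calendar date by a whole number of days (hypothetical helper)."""
    return start + timedelta(days=days)


# Same input as the snippet above: parse the string, then shift by 10 days.
start = date.fromisoformat("2024-06-15")
print(add_days(start, 10).isoformat())
```

A negative `days` value moves the date backward, and the shift rolls over month boundaries as needed.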
data_output
intermediate
Result of truncating timestamp to month
Given the following Spark DataFrame, what is the output after truncating the timestamp to the month?
Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.functions import trunc, to_timestamp

spark = SparkSession.builder.getOrCreate()
data = [("2024-06-15 13:45:30",)]
df = spark.createDataFrame(data, ["ts_str"])
df = df.withColumn("ts", to_timestamp("ts_str"))
df = df.withColumn("month_start", trunc("ts", "MM"))
df.select("month_start").show()
A
+-----------+
|month_start|
+-----------+
| 2024-06-01|
+-----------+
B
+-----------+
|month_start|
+-----------+
| 2024-06-15|
+-----------+
C
+-----------+
|month_start|
+-----------+
| 2024-01-01|
+-----------+
D
+-----------+
|month_start|
+-----------+
| 2024-07-01|
+-----------+
💡 Hint
trunc with 'MM' returns the first day of the month.
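To make the truncation concrete, here is a small plain-Python sketch of the same idea: drop the time of day and reset the day to 1. It uses only the standard library; `month_start` is a hypothetical helper that mirrors the hint, not a PySpark function.

```python
from datetime import date, datetime


def month_start(ts: datetime) -> date:
    """Return the first day of the month containing ts (hypothetical helper)."""
    # Discard the time component, then reset the day of month to 1.
    return ts.date().replace(day=1)


ts = datetime.fromisoformat("2024-06-15 13:45:30")
print(month_start(ts).isoformat())
```

Note that the result is a date rather than a timestamp: the hours, minutes, and seconds are gone.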
🔧 Debug
advanced
Identify the error in timestamp conversion
What error will the following Spark code raise when trying to convert a string to timestamp?
Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp

spark = SparkSession.builder.getOrCreate()
data = [("2024-13-01 10:00:00",)]
df = spark.createDataFrame(data, ["ts_str"])
df = df.withColumn("ts", to_timestamp("ts_str"))
df.show()
A
SyntaxError: invalid syntax in to_timestamp function call.
B
ValueError: month must be in 1..12.
C
RuntimeError: Timestamp format not supported.
D
The ts column will contain null values due to invalid month.
💡 Hint
With ANSI mode off (the default before Spark 4.0), to_timestamp returns null for an invalid date instead of raising an exception.
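The "null instead of an exception" pattern from this hint can be imitated in plain Python. The sketch below is an analogy only: `parse_or_none` is a hypothetical helper built on the standard library, and Spark's own behavior may vary with version and configuration.

```python
from datetime import datetime
from typing import Optional


def parse_or_none(text: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> Optional[datetime]:
    """Parse text as a timestamp, returning None when it is not a valid date."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        # Month 13 (or any other impossible field) lands here.
        return None


print(parse_or_none("2024-13-01 10:00:00"))  # invalid month
print(parse_or_none("2024-12-01 10:00:00"))  # valid timestamp
```

Swallowing the error keeps a batch job running, at the cost of having to check for missing values afterwards.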
🧠 Conceptual
advanced
Understanding unix_timestamp function output
What does the unix_timestamp function return when applied to a timestamp column in Spark?
A
The number of seconds since January 1, 1970 UTC as a long integer.
B
The timestamp formatted as a string in 'yyyy-MM-dd HH:mm:ss'.
C
The timestamp converted to the local timezone string.
D
The number of milliseconds since January 1, 1970 UTC as a long integer.
💡 Hint
unix_timestamp returns seconds, not milliseconds.
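The seconds-versus-milliseconds distinction is easy to check in plain Python. This is a minimal standard-library sketch; `epoch_seconds` is a hypothetical helper, and the example assumes a timezone-aware UTC value so the result does not depend on the local timezone.

```python
from datetime import datetime, timezone


def epoch_seconds(ts: datetime) -> int:
    """Whole seconds since 1970-01-01 00:00:00 UTC for a timezone-aware datetime."""
    return int(ts.timestamp())


ts = datetime(2024, 6, 15, 13, 45, 30, tzinfo=timezone.utc)
seconds = epoch_seconds(ts)
millis = seconds * 1000  # milliseconds are a separate, 1000x larger scale
print(seconds, millis)
```

Mixing the two scales is a common source of dates that land in 1970 or tens of thousands of years in the future.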
🚀 Application
expert
Calculate age in years from birthdate column
Given a Spark DataFrame with a birthdate column of type date, which code snippet correctly calculates the age in years as an integer?
A
from pyspark.sql.functions import col, year, current_date

df = df.withColumn('age', year(current_date()) - year(col('birthdate')))
B
from pyspark.sql.functions import col, current_date, datediff

df = df.withColumn('age', datediff(current_date(), col('birthdate')) // 365)
C
from pyspark.sql.functions import col, current_date, floor, months_between

df = df.withColumn('age', floor(months_between(current_date(), col('birthdate')) / 12))
D
from pyspark.sql.functions import col, current_date, to_date

df = df.withColumn('age', to_date(current_date()) - to_date(col('birthdate')))
💡 Hint
Use months_between and floor to get accurate age in years.
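Why the simpler shortcuts drift can be seen with a few dates in plain Python. The sketch below is a standard-library illustration, not Spark code: all three function names are hypothetical, and `age_in_years` counts completed years by checking whether the birthday has already occurred in the current year.

```python
from datetime import date


def age_in_years(birthdate: date, today: date) -> int:
    """Completed years between birthdate and today."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


def year_difference(birthdate: date, today: date) -> int:
    """Shortcut 1: subtract the years and ignore month and day."""
    return today.year - birthdate.year


def days_div_365(birthdate: date, today: date) -> int:
    """Shortcut 2: elapsed days divided by 365, ignoring leap days."""
    return (today - birthdate).days // 365


born = date(2000, 6, 15)
day_before_birthday = date(2024, 6, 14)
print(age_in_years(born, day_before_birthday))     # birthday not reached yet
print(year_difference(born, day_before_birthday))  # counts the unfinished year
print(days_div_365(born, day_before_birthday))     # leap days push it over
```

On the day before the 24th birthday, both shortcuts already report 24, while the completed-years calculation still reports 23.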