
Date and timestamp functions in Apache Spark - Step-by-Step Execution

Concept Flow - Date and timestamp functions
Start with DataFrame
Select date/timestamp column
Apply date/timestamp function
Get new column with result
Use result for analysis or display
End
We start with a DataFrame, pick a date or timestamp column, apply a function to transform or extract info, then use the result.
Execution Sample
Apache Spark
from pyspark.sql.functions import year, month, dayofmonth

# Extract year, month, and day from the 'date' column of an existing DataFrame df
df.select(
    year('date').alias('year'),
    month('date').alias('month'),
    dayofmonth('date').alias('day'),
).show()
This code extracts year, month, and day parts from a date column in a Spark DataFrame.
Execution Table
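| Step | Input Row (date) | Function Applied | Output Columns (year, month, day) |
|------|------------------|------------------|-----------------------------------|
| 1 | 2023-06-15 | year('date') | 2023 |
| 1 | 2023-06-15 | month('date') | 6 |
| 1 | 2023-06-15 | dayofmonth('date') | 15 |
| 2 | 2022-12-01 | year('date') | 2022 |
| 2 | 2022-12-01 | month('date') | 12 |
| 2 | 2022-12-01 | dayofmonth('date') | 1 |
| 3 | 2024-01-31 | year('date') | 2024 |
| 3 | 2024-01-31 | month('date') | 1 |
| 3 | 2024-01-31 | dayofmonth('date') | 31 |
| Exit | No more rows | N/A | All rows processed |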
💡 All rows processed, no more data to extract date parts.
Variable Tracker
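| Variable | Start | After Row 1 | After Row 2 | After Row 3 | Final |
|----------|-------|-------------|-------------|-------------|-------|
| date | N/A | 2023-06-15 | 2022-12-01 | 2024-01-31 | Last row value |
| year | N/A | 2023 | 2022 | 2024 | Last extracted year |
| month | N/A | 6 | 12 | 1 | Last extracted month |
| day | N/A | 15 | 1 | 31 | Last extracted day |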
Key Moments - 3 Insights
Why do we call year('date') instead of just year?
year() is a function that needs a column input to know which date to extract the year from, as shown in execution_table rows 1-3.
What happens if the date column has null values?
The functions return null for those rows: year(), month(), and dayofmonth() each yield null when the input date is null, because every output in execution_table is computed from its row's input value.
Can we extract time parts like hour or minute similarly?
Yes. Spark provides hour() and minute(), which take a column just as year() does. They are meant for timestamp columns, since a plain date carries no time of day.
Visual Quiz - 3 Questions
Test your understanding
Looking at the execution_table, what is the month extracted from the date '2022-12-01'?
A. 1
B. 6
C. 12
D. 15
💡 Hint
Check row 5 in execution_table where input date is '2022-12-01' and function is month('date').
At which step does the day extracted equal 31?
A. Step 1
B. Step 3
C. Step 2
D. Exit
💡 Hint
Look at the dayofmonth('date') outputs in execution_table rows 3, 6, and 9.
If the date column had a null value at row 2, what would be the year output at that step?
A. null
B. 0
C. 2022
D. Error
💡 Hint
Refer to key_moments about null handling and execution_table logic for function outputs.
Concept Snapshot
Date and timestamp functions in Spark extract parts like year, month, day from date columns.
Use functions like year(col), month(col), dayofmonth(col).
They return integers, or null when the input is null.
Apply on DataFrame columns to create new columns.
Useful for time-based analysis and filtering.
Full Transcript
This visual execution trace shows how Spark date and timestamp functions work step-by-step. We start with a DataFrame containing a date column. For each row, we apply functions like year(), month(), and dayofmonth() to extract parts of the date. The execution table shows input dates and the output values for each function. Variables track how values change after processing each row. Key moments clarify why functions need column inputs, how nulls are handled, and that time parts can be extracted similarly. The quiz tests understanding of outputs at specific steps and null behavior. The snapshot summarizes the key usage points for these functions in Spark.