Date and timestamp functions make it easy to work with dates and times: they let you compute differences between dates, add or subtract days, extract components such as year or month, and format date values in your data.
Date and timestamp functions in Apache Spark
from pyspark.sql.functions import (
    current_date, current_timestamp, datediff, date_add, date_sub,
    year, month, dayofmonth, to_date, unix_timestamp
)

# Example usage:
df.select(current_date(), current_timestamp())
df.select(datediff(df.end_date, df.start_date))
df.select(date_add(df.date, 5))
df.select(year(df.date))
Use from pyspark.sql.functions import <function_name> to access date functions.
Functions work on columns in Spark DataFrames, not on plain Python dates.
from pyspark.sql.functions import current_date
df.select(current_date())
from pyspark.sql.functions import datediff
df.select(datediff(df.end_date, df.start_date))
from pyspark.sql.functions import date_add
df.select(date_add(df.date, 10))
from pyspark.sql.functions import year
df.select(year(df.date))
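The Spark functions above operate on DataFrame columns, but the arithmetic they perform mirrors what plain Python dates do. As a rough analogy (plain Python stdlib, not Spark): datediff corresponds to subtracting two date objects, date_add to adding a timedelta, and year to reading the .year attribute:

```python
from datetime import date, timedelta

start = date(2024, 1, 1)
end = date(2024, 1, 10)

# datediff(end, start) -> number of days between the two dates
days_diff = (end - start).days          # 9

# date_add(start, 10) -> shift a date forward by 10 days
shifted = start + timedelta(days=10)    # 2024-01-11

# year(start) -> extract the year component
start_year = start.year                 # 2024

print(days_diff, shifted, start_year)
```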
This program creates a Spark DataFrame with start and end dates. It converts strings to dates, then calculates the difference in days, adds 5 days to the start date, extracts the year, and shows the current date and timestamp.
from pyspark.sql import SparkSession
from pyspark.sql.functions import (
    current_date, current_timestamp, datediff, date_add, year, to_date
)

spark = SparkSession.builder.appName('DateExample').getOrCreate()

# Create sample data with dates as strings
data = [
    ('2024-01-01', '2024-01-10'),
    ('2024-02-15', '2024-02-20')
]
columns = ['start_date', 'end_date']
df = spark.createDataFrame(data, columns)

# Convert string columns to date type
df = (df.withColumn('start_date', to_date('start_date'))
        .withColumn('end_date', to_date('end_date')))

# Calculate the day difference and other derived date values
result = df.select(
    'start_date',
    'end_date',
    datediff('end_date', 'start_date').alias('days_diff'),
    date_add('start_date', 5).alias('start_plus_5_days'),
    year('start_date').alias('start_year'),
    current_date().alias('today'),
    current_timestamp().alias('now')
)

result.show(truncate=False)
spark.stop()
Make sure date columns are in date format, not string, before using date functions.
Current date and timestamp depend on the system running Spark.
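Because current_date() and current_timestamp() are evaluated using the clock and session time zone of the machine running Spark, results can differ between environments. If reproducible output matters, you can pin the session time zone explicitly; a minimal sketch, assuming the `spark` session and `df` from the program above (the 'UTC' value is just an example):

```python
# Pin the Spark SQL session time zone so timestamp values
# render the same regardless of the host machine's locale.
spark.conf.set("spark.sql.session.timeZone", "UTC")

# current_timestamp() will now be rendered in UTC.
df.select(current_timestamp()).show()
```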
Use to_date() to convert strings to dates.
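to_date() also accepts an optional format pattern when the strings are not in the default yyyy-MM-dd layout, e.g. to_date('d', 'dd/MM/yyyy'). Note that Spark uses Java-style datetime patterns, which differ from Python's strptime codes; for comparison, a sketch of the same parse in plain Python stdlib (not Spark):

```python
from datetime import datetime

# Spark:   to_date(col('d'), 'dd/MM/yyyy')
# Python stdlib equivalent of the same pattern:
parsed = datetime.strptime("25/12/2024", "%d/%m/%Y").date()
print(parsed)  # 2024-12-25
```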
Date and timestamp functions help analyze and manipulate time data easily.
Common tasks include finding differences, adding days, and extracting parts like year or month.
Always convert strings to date type before using these functions in Spark.