Complete the code to extract the year from the 'date' column.
from pyspark.sql.functions import [1]
df.select([1]("date")).show()
The year function extracts the year part from a date or timestamp column.
Complete the code to add 5 days to the 'date' column.
from pyspark.sql.functions import date_add
df.select(date_add("date", [1])).show()
The date_add function adds the specified number of days to a date column. Here, 5 days are added.
Complete the code to calculate the difference in days between 'date1' and 'date2'.
from pyspark.sql.functions import datediff
df.select(datediff("date1", [1])).show()
The datediff function takes two date columns, datediff(end, start), and returns the number of days from start to end. The blank is the second argument, the column name "date2", so the result is date1 minus date2 in days.
Fill both blanks to create a new column 'hour' extracting the hour from 'timestamp' and filter rows where hour is greater than 12.
from pyspark.sql.functions import [1]
df.select([1]("timestamp").alias("hour")).filter("hour [2] 12").show()
The hour function extracts the hour from a timestamp. The filter keeps rows where hour is greater than 12.
Fill all three blanks to format the 'date' column as 'yyyy-MM' into a new column 'month_year' and filter rows where month_year > '2023-06'.
from pyspark.sql.functions import [1]
df.withColumn("month_year", [1]([2], [3])).filter("month_year > '2023-06'").show()
The date_format function formats a date or timestamp column according to the specified format string, here extracting year and month as 'yyyy-MM'.