pyspark.sql.functions.date_add
The date_add function in PySpark adds a specified number of days to a date column. It is one of Spark's built-in SQL functions and can be used both through the DataFrame API and in Spark SQL expressions.
The basic syntax of the function is as follows:
date_add(date, days)
where:
date is the date column (or an expression that evaluates to a date) that you want to add days to.
days is the number of days to add; a negative value subtracts days.
Here’s an example of how you can use the date_add function in PySpark:
from pyspark.sql import functions as F

# Assumes an active SparkSession is available as `spark`
df = spark.createDataFrame([("2023-01-01",), ("2023-01-02",)], ["date"])
df = df.withColumn("new_date", F.date_add(df["date"], 1))
df.show()
Result
+----------+----------+
| date| new_date|
+----------+----------+
|2023-01-01|2023-01-02|
|2023-01-02|2023-01-03|
+----------+----------+
Note that the date_add function works at day granularity only: applied to a timestamp column, it returns a date and drops the time-of-day portion. To shift a timestamp by seconds, minutes, hours, etc., use the expr function with an INTERVAL expression instead.