- pyspark.sql.functions.dayofmonth
- pyspark.sql.functions.dayofweek
- pyspark.sql.functions.dayofyear
Working with date and time columns is one of the most common tasks in PySpark. The pyspark.sql.functions module provides several functions that extract day-related values from such columns, including dayofmonth, dayofweek, and dayofyear. This article looks at each of them in turn.
dayofmonth
The dayofmonth function returns the day of the month from a date column, as an integer between 1 and 31.
Syntax: df.select(dayofmonth(col("date_column_name"))).show()
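from pyspark.sql import SparkSession
from pyspark.sql.functions import col, dayofmonth
# Reuse the active session, or create one if none exists
spark = SparkSession.builder.getOrCreate()
# Sample DataFrame; the dates are stored as yyyy-MM-dd strings
df = spark.createDataFrame([("2023-01-19",), ("2023-02-11",), ("2023-03-12",)], ["date"])
# Extract day of the month
df.select(dayofmonth(col("date"))).show()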
Output
+----------------+
|dayofmonth(date)|
+----------------+
|              19|
|              11|
|              12|
+----------------+
dayofweek
The dayofweek function returns the day of the week from a date column, as an integer between 1 (Sunday) and 7 (Saturday).
Syntax: df.select(dayofweek(col("date_column_name"))).show()
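from pyspark.sql import SparkSession
from pyspark.sql.functions import col, dayofweek
# Reuse the active session, or create one if none exists
spark = SparkSession.builder.getOrCreate()
# Sample DataFrame; the dates are stored as yyyy-MM-dd strings
df = spark.createDataFrame([("2023-01-19",), ("2023-02-11",), ("2023-03-12",)], ["date"])
# Extract day of the week (1 = Sunday, 7 = Saturday)
df.select(dayofweek(col("date"))).show()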
Output
+---------------+
|dayofweek(date)|
+---------------+
|              5|
|              7|
|              1|
+---------------+
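Spark's numbering (1 = Sunday through 7 = Saturday) differs from Python's datetime module, where isoweekday() runs from 1 = Monday to 7 = Sunday. That matters when collected results are compared with Python dates on the driver. The following is a minimal sketch of the conversion using only the standard library; the helper name spark_dayofweek is hypothetical and not part of PySpark.

```python
from datetime import date


def spark_dayofweek(d: date) -> int:
    """Map a Python date to Spark's dayofweek numbering (1 = Sunday ... 7 = Saturday)."""
    # isoweekday() returns Monday = 1 ... Sunday = 7, so shift by one and wrap Sunday to 1
    return d.isoweekday() % 7 + 1


# Same sample dates as in the DataFrame above
for text in ["2023-01-19", "2023-02-11", "2023-03-12"]:
    print(text, spark_dayofweek(date.fromisoformat(text)))
```

For the three sample dates this prints 5, 7, and 1, matching the Spark output shown above.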
dayofyear
The dayofyear function returns the day of the year from a date column, as an integer between 1 and 366. The value 366 occurs only on December 31 of a leap year.
Syntax: df.select(dayofyear(col("date_column_name"))).show()
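from pyspark.sql import SparkSession
from pyspark.sql.functions import col, dayofyear
# Reuse the active session, or create one if none exists
spark = SparkSession.builder.getOrCreate()
# Sample DataFrame; the dates are stored as yyyy-MM-dd strings
df = spark.createDataFrame([("2023-01-19",), ("2023-02-11",), ("2023-03-12",)], ["date"])
# Extract day of the year
df.select(dayofyear(col("date"))).show()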
Output
+---------------+
|dayofyear(date)|
+---------------+
|             19|
|             42|
|             71|
+---------------+
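As a quick cross-check outside Spark, Python's standard library computes the same ordinal day through timetuple().tm_yday. The sketch below reuses the sample dates and adds the last day of a common year and of a leap year to show where 365 and 366 come from; the helper name day_of_year is hypothetical and not part of PySpark.

```python
from datetime import date


def day_of_year(text: str) -> int:
    """Return the 1-based day of the year for an ISO yyyy-MM-dd date string."""
    return date.fromisoformat(text).timetuple().tm_yday


# Sample dates from above, plus December 31 of a common year and of a leap year
for text in ["2023-01-19", "2023-02-11", "2023-03-12", "2023-12-31", "2024-12-31"]:
    print(text, day_of_year(text))
```

The first three results are 19, 42, and 71, as in the Spark output; the last two are 365 and 366.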
The dayofmonth, dayofweek, and dayofyear functions in PySpark provide an easy way to extract day-related information from date and time columns.