PySpark: Extracting the Month from a Date

Working with dates and times is a common task in data analysis. Apache Spark provides a variety of functions for manipulating date and time data, including one that extracts the month from a date. This article shows how to use the month() function in PySpark to extract the month of a given date as an integer.

The month() function extracts the month part from a given date and returns it as an integer. For example, if you have a date “2023-07-04”, applying the month() function to this date will return the integer value 7.
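
For intuition, the same behaviour can be mirrored in plain Python, without Spark. This is only an analogy, not how Spark implements month(); the helper name month_of is hypothetical and exists just for this sketch, and it assumes the input is an ISO "yyyy-MM-dd" string.

```python
from datetime import date


def month_of(iso_date: str) -> int:
    """Return the month (1-12) of an ISO 'yyyy-MM-dd' date string.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    return date.fromisoformat(iso_date).month


print(month_of("2023-07-04"))  # 7
```

A helper like this can be handy for computing expected values when checking Spark output on a small sample.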

First, set up a SparkSession, which is the entry point to any Spark functionality.

Sample code for Extracting the Month from a Date in PySpark

from pyspark.sql import SparkSession
# Initialize SparkSession
spark = SparkSession.builder.getOrCreate()

Create a DataFrame with a single column called date that contains some hard-coded date values.

data = [("2023-07-04",),
        ("2023-12-31",),
        ("2022-02-28",)]
df = spark.createDataFrame(data, ["date"])
df.show()

Output

+----------+
|      date|
+----------+
|2023-07-04|
|2023-12-31|
|2022-02-28|
+----------+

As our dates are in string format, we need to convert them into date type using the to_date function.

from pyspark.sql.functions import col, to_date
df = df.withColumn("date", to_date(col("date"), "yyyy-MM-dd"))
df.show()

Let’s use the month() function to extract the month from the date column.

from pyspark.sql.functions import month
df = df.withColumn("month", month("date"))
df.show()

Result

+----------+-----+
|      date|month|
+----------+-----+
|2023-07-04|    7|
|2023-12-31|   12|
|2022-02-28|    2|
+----------+-----+

As you can see, the month column contains the month part of the corresponding date in the date column, as an integer from 1 to 12. The month() function gives a simple way to retrieve the month from a date, and together with the other date-time functions in PySpark it simplifies handling date-time data.

Author: user
