The quarter function in PySpark extracts the quarter (1 through 4) from a given date, which makes it easy to analyze and group data by quarterly periods. It is particularly valuable in financial reporting, trend analysis, and any scenario where data is evaluated quarter by quarter. This article explains the quarter function with a detailed example, aimed at both beginners and experienced data professionals.
Syntax:
from pyspark.sql.functions import quarter
df.withColumn("quarter_column", quarter(df["date_column"]))
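The function also accepts the column name as a plain string, and the same quarter function exists in Spark SQL. A quick sketch (df and date_column are placeholders, as above):
from pyspark.sql.functions import quarter
# Equivalent call: pass the column name as a string instead of a Column
df.withColumn("quarter_column", quarter("date_column"))
# Spark SQL equivalent; quarter('2023-04-10') evaluates to 2
spark.sql("SELECT quarter('2023-04-10') AS q").show()
A NULL date input yields a NULL quarter rather than an error.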
Example
Let’s consider a scenario where we have a dataset of sales records and want to determine the quarter of each sale for seasonal trend analysis.
Sample data
Imagine we have the following data in a DataFrame named sales_df:
| Date       | Sales |
|------------|-------|
| 2023-01-15 | 300   |
| 2023-04-10 | 450   |
| 2023-07-20 | 500   |
| 2023-10-05 | 550   |
| 2023-12-30 | 600   |
Code implementation
from pyspark.sql import SparkSession
from pyspark.sql.functions import quarter
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DateType
# Initialize Spark Session
spark = SparkSession.builder.appName("QuarterFunctionExample").getOrCreate()
# Sample data
data = [("2023-01-15", 300),
        ("2023-04-10", 450),
        ("2023-07-20", 500),
        ("2023-10-05", 550),
        ("2023-12-30", 600)]
# Define schema
schema = StructType([
    StructField("Date", StringType(), True),
    StructField("Sales", IntegerType(), True)
])
# Create DataFrame and cast the Date column from string to DateType
sales_df = spark.createDataFrame(data, schema)
sales_df = sales_df.withColumn("Date", sales_df["Date"].cast(DateType()))
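# Alternatively, to_date from pyspark.sql.functions parses strings with an
# explicit pattern, which is safer when the input format varies:
# sales_df = sales_df.withColumn("Date", to_date("Date", "yyyy-MM-dd"))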
# Apply quarter function
sales_df_with_quarters = sales_df.withColumn("Quarter", quarter(sales_df["Date"]))
# Show results
sales_df_with_quarters.show()
+----------+-----+-------+
| Date|Sales|Quarter|
+----------+-----+-------+
|2023-01-15| 300| 1|
|2023-04-10| 450| 2|
|2023-07-20| 500| 3|
|2023-10-05| 550| 4|
|2023-12-30| 600| 4|
+----------+-----+-------+
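With the quarter column in place, the seasonal trend analysis from the scenario above typically reduces to a grouped aggregation. A minimal sketch building on sales_df_with_quarters from the example (this aggregation step is illustrative, not part of the original walkthrough):
# Total sales per quarter, ordered by quarter
quarterly_sales = (sales_df_with_quarters
                   .groupBy("Quarter")
                   .sum("Sales")
                   .orderBy("Quarter"))
quarterly_sales.show()
For the sample data, Q4 combines the October and December sales (550 + 600 = 1150), while Q1 through Q3 each contain a single sale.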