PySpark with Pandas API: How to generate a fixed frequency DatetimeIndex with date_range()

In PySpark, the pandas API on Spark provides tools for working with time series data. One of them is date_range(), which generates a fixed frequency DatetimeIndex. This article covers its syntax and parameters and walks through practical examples.

Understanding date_range()

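The date_range() function in the pandas API on Spark (pyspark.pandas.date_range) generates a DatetimeIndex whose timestamps are separated by a fixed interval. It mirrors pandas.date_range, so it is a convenient way to build sequences of dates or timestamps for time series analysis, resampling, and visualization. The examples below use plain pandas (import pandas as pd), and Example 1 passes the result to Spark.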
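To make "fixed frequency" concrete, here is a minimal sketch in plain pandas; it assumes the pandas-on-Spark version behaves the same way, which is not verified here. It builds a daily index, checks that every gap between neighbours is one day, and then uses the index to label a Series that is aggregated per month, a typical time series task. The Series name visits is a hypothetical placeholder.

```python
import pandas as pd

# Daily DatetimeIndex covering January and February 2022 (31 + 28 = 59 days)
days = pd.date_range(start="2022-01-01", end="2022-02-28", freq="D")

# Fixed frequency: every gap between consecutive timestamps is exactly one day
gaps = pd.Series(days).diff().iloc[1:]
assert (gaps == pd.Timedelta(days=1)).all()

# Hypothetical data: one visit per day, labelled by the generated index
visits = pd.Series(1, index=days)

# Aggregate the daily values into month-start buckets
monthly = visits.resample("MS").sum()
print(days.freqstr)
print(monthly)
```

Because the index carries its frequency (days.freqstr is "D"), pandas can resample it without first inferring the spacing.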

Syntax

The syntax for date_range() is as follows:

pyspark.pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)

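Here, start and end set the left and right bounds of the range, periods sets how many timestamps to generate, and freq sets the spacing between them (for example 'D' for calendar days or 'h' for hours). tz sets the time zone, normalize moves start and end to midnight before the range is generated, name becomes the name of the resulting index, and closed ('left' or 'right') keeps only one boundary, while the default None keeps both (plain pandas 2.x replaces closed with inclusive). Of start, end, periods, and freq, exactly three must be specified, with freq counting as daily ('D') when omitted; the exception is passing start, end, and periods without freq, which yields periods evenly spaced timestamps between start and end.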
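The rules for combining start, end, periods, and freq are easiest to see side by side. The sketch below uses plain pandas; pyspark.pandas.date_range is assumed to accept the same combinations, but that has not been checked here for every case. It also shows name and normalize, and confirms that supplying all four range parameters is rejected.

```python
import pandas as pd

# start + end + freq: every calendar day from Jan 1 to Jan 3, index named "date"
by_day = pd.date_range(start="2022-01-01", end="2022-01-03", freq="D", name="date")

# start + end + periods (no freq): 5 evenly spaced points, here 6 hours apart
linear = pd.date_range(start="2022-01-01", end="2022-01-02", periods=5)

# normalize=True moves the 10:30 start to midnight before the range is built
midnight = pd.date_range(start="2022-01-01 10:30", periods=2, freq="D", normalize=True)

# Supplying all four of start, end, periods, and freq raises ValueError
try:
    pd.date_range(start="2022-01-01", end="2022-01-03", periods=3, freq="D")
    rejected = False
except ValueError:
    rejected = True

print(by_day)
print(linear)
print(midnight)
print("all four given -> rejected:", rejected)
```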

Examples

Let’s explore various scenarios to understand the functionality of date_range():

Example 1: Loading a Date Range into a Spark DataFrame

from pyspark.sql import SparkSession
import pandas as pd
# Create a SparkSession
spark = SparkSession.builder \
    .appName("date_range_example") \
    .getOrCreate()
# Generate a daily date range from January 1, 2022 to January 5, 2022
dates = pd.date_range(start='2022-01-01', end='2022-01-05')
# Wrap the pandas DatetimeIndex in a pandas DataFrame with a single "date" column
df_pandas = pd.DataFrame({'date': dates})
# Convert the pandas DataFrame to a Spark DataFrame (the column becomes a timestamp)
df_spark = spark.createDataFrame(df_pandas)
# Show the DataFrame
df_spark.show()

Output

+-------------------+
|               date|
+-------------------+
|2022-01-01 00:00:00|
|2022-01-02 00:00:00|
|2022-01-03 00:00:00|
|2022-01-04 00:00:00|
|2022-01-05 00:00:00|
+-------------------+
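Example 1 hands an ordinary pandas DataFrame to spark.createDataFrame. The intermediate frame can also be built with DatetimeIndex.to_frame and inspected locally before any Spark session exists; this is a small sketch of that step, not the only way to do it.

```python
import pandas as pd

dates = pd.date_range(start="2022-01-01", end="2022-01-05")

# to_frame turns the index into a one-column DataFrame; index=False keeps a plain
# RangeIndex so that only the "date" column is passed on to Spark
df_pandas = dates.to_frame(index=False, name="date")

print(df_pandas.dtypes)
print(df_pandas)
```

The resulting df_pandas can then be passed to spark.createDataFrame exactly as in Example 1.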

Example 2: Basic Date Range Generation

import pandas as pd
# Generate a date range from January 1, 2022 to January 5, 2022
date_index = pd.date_range(start='2022-01-01', end='2022-01-05')
print(date_index)

Output

DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04',
               '2022-01-05'],
              dtype='datetime64[ns]', freq='D')
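Example 2 fixes both ends of the range. As a further sketch in plain pandas, a range can also be anchored on end alone and counted backwards with periods, and freq='B' restricts it to business days (Monday to Friday). January 1 and 2, 2022 fall on a weekend, so they are skipped.

```python
import pandas as pd

# end + periods: count backwards so the range finishes on January 5, 2022
last_three = pd.date_range(end="2022-01-05", periods=3)

# freq="B" keeps business days only; the weekend of Jan 1-2 and Jan 8-9 is skipped
weekdays = pd.date_range(start="2022-01-01", end="2022-01-10", freq="B")

print(last_three)
print(weekdays)
```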

Example 3: Generating Timestamps with Specific Frequency

import pandas as pd
# Generate hourly timestamps for 3 days (72 periods); 'h' is the hourly alias
# (older code uses 'H', which pandas 2.2+ deprecates)
date_index = pd.date_range(start='2022-01-01', periods=72, freq='h')
# Print the first and last timestamps and the number of entries
print(date_index[0], date_index[-1], len(date_index))

Output

2022-01-01 00:00:00 2022-01-03 23:00:00 72
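The tz and name parameters from the syntax section can be combined with any frequency. A sketch in plain pandas, assuming the same arguments work with pyspark.pandas.date_range: daily timestamps for February 2022, localized to US Eastern time (UTC-05:00 in February) and with the index named "date".

```python
import pandas as pd

# Daily timestamps for February 2022, localized to US Eastern time, index named "date"
feb = pd.date_range(start="2022-02-01", end="2022-02-28", freq="D",
                    tz="America/New_York", name="date")

print(feb)
print(feb[0].isoformat())
```

The printed dtype includes the zone (datetime64[ns, America/New_York]), and each timestamp carries its UTC offset.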
