PySpark : generate fixed frequency TimedeltaIndex


Among the functions in the pandas API on Spark, timedelta_range() generates a fixed-frequency TimedeltaIndex, that is, an evenly spaced sequence of time durations. Let’s look at its parameters and walk through some practical examples.

Understanding timedelta_range()

The timedelta_range() function in the pandas API on Spark creates a TimedeltaIndex whose values are spaced at a fixed frequency, which is useful for building evenly spaced duration offsets. The range is controlled by start, end, periods, and freq, plus the optional name and closed parameters.

Syntax

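timedelta_range(start=None, end=None, periods=None, freq=None, name=None, closed=None)
  • start (optional): Left bound of the range, as a string such as '1D' or a timedelta-like value.
  • end (optional): Right bound of the range.
  • periods (optional): Number of values to generate.
  • freq (optional): Step between values, such as 'D' or '3H'. Defaults to 'D' (one day).
  • name (optional): Name of the resulting TimedeltaIndex.
  • closed (optional): Whether the range is closed on the 'left', the 'right', or both sides (None, the default).
Of start, end, periods, and freq, exactly three must be specified. Because freq falls back to 'D' when start, end, or periods is missing, giving two of the other three is usually enough; if start, end, and periods are all given without freq, the values are spaced linearly between start and end.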
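To see how the parameters combine, here is a short sketch in plain pandas, whose timedelta_range() the pandas API on Spark mirrors. The variable names are illustrative, and a Timedelta object is passed as freq to sidestep version-specific string aliases.

```python
import pandas as pd

# start + end: freq falls back to daily ('D')
by_default = pd.timedelta_range(start="1D", end="3D")

# start + periods + freq: three values spaced 12 hours apart
by_periods = pd.timedelta_range(start="1D", periods=3, freq=pd.Timedelta(hours=12))

# start + end + periods, no freq: values are spaced linearly between the bounds
linear = pd.timedelta_range(start="1 day", end="5 days", periods=4)

# Only one parameter given: the range is underdetermined and pandas raises ValueError
try:
    pd.timedelta_range(start="1D")
    raised = False
except ValueError:
    raised = True

print(list(by_default))
print(list(by_periods))
print(list(linear))
print(raised)
```

The linear case is handy when you know the span and the number of points but not the step.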

Practical Examples

The examples below build a TimedeltaIndex with timedelta_range() and load it into a Spark DataFrame so the values can be inspected with show().

Example 1: Basic Usage

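from pyspark.sql import SparkSession
import pandas as pd  # plain pandas: the index is built on the driver

# Initialize Spark Session
spark = SparkSession.builder \
    .appName("timedelta_range_example @ Freshers.in") \
    .getOrCreate()

# Example 1: Basic Usage
# Daily offsets from 1 day to 7 days, both ends included
td_index = pd.timedelta_range(start='1D', end='7D', freq='D')
# Convert the index to a pandas DataFrame (columns 'index' and 0), then to a Spark DataFrame
df = spark.createDataFrame(td_index.to_series().reset_index())
# Show up to 20 rows without truncating values
df.show(20, False)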
Output
+-----------------------------------+-----------------------------------+
|index                              |0                                  |
+-----------------------------------+-----------------------------------+
|INTERVAL '1 00:00:00' DAY TO SECOND|INTERVAL '1 00:00:00' DAY TO SECOND|
|INTERVAL '2 00:00:00' DAY TO SECOND|INTERVAL '2 00:00:00' DAY TO SECOND|
|INTERVAL '3 00:00:00' DAY TO SECOND|INTERVAL '3 00:00:00' DAY TO SECOND|
|INTERVAL '4 00:00:00' DAY TO SECOND|INTERVAL '4 00:00:00' DAY TO SECOND|
|INTERVAL '5 00:00:00' DAY TO SECOND|INTERVAL '5 00:00:00' DAY TO SECOND|
|INTERVAL '6 00:00:00' DAY TO SECOND|INTERVAL '6 00:00:00' DAY TO SECOND|
|INTERVAL '7 00:00:00' DAY TO SECOND|INTERVAL '7 00:00:00' DAY TO SECOND|
+-----------------------------------+-----------------------------------+
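The two columns in this output are identical because to_series() uses the index both as the Series index and as its values, and reset_index() then turns each into a column, named index and 0. The pandas-only sketch below inspects that intermediate frame and builds a single-column alternative with to_frame(); the column name offset is a hypothetical choice.

```python
import pandas as pd

td_index = pd.timedelta_range(start="1D", end="7D", freq="D")

# What Example 1 passes to spark.createDataFrame: two columns with the same values
two_cols = td_index.to_series().reset_index()
print(two_cols.columns.tolist())

# A single, explicitly named column ("offset" is an illustrative name)
one_col = td_index.to_frame(index=False, name="offset")
print(one_col.columns.tolist())
print(one_col.dtypes)
```

Passing one_col to spark.createDataFrame instead should give a single interval column rather than two duplicates.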

Example 2: Custom Frequency

# Example 2: Custom Frequency
# Steps of 3 hours starting at 1 hour; only values up to the 6-hour end are kept
td_index = pd.timedelta_range(start='1H', end='6H', freq='3H')
df = spark.createDataFrame(td_index.to_series().reset_index())
df.show(20, False)
Output
+-----------------------------------+-----------------------------------+
|index                              |0                                  |
+-----------------------------------+-----------------------------------+
|INTERVAL '0 01:00:00' DAY TO SECOND|INTERVAL '0 01:00:00' DAY TO SECOND|
|INTERVAL '0 04:00:00' DAY TO SECOND|INTERVAL '0 04:00:00' DAY TO SECOND|
+-----------------------------------+-----------------------------------+
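Example 2 returns only 1 and 4 hours: the range advances from start in steps of freq and stops before any value that would pass end, so end is included only when it falls exactly on a step. The sketch below (plain pandas, with Timedelta objects in place of the '1H' and '3H' strings) compares an end of 6 hours with an end of 7 hours. As a side note, pandas 2.2 deprecated the uppercase 'H' frequency alias in favour of lowercase 'h', so newer pandas versions may emit a FutureWarning for Example 2 as written.

```python
import pandas as pd

start = pd.Timedelta(hours=1)
step = pd.Timedelta(hours=3)

# end = 6 hours: 1h and 4h fit; the next step (7h) would pass end, so it is dropped
stops_early = pd.timedelta_range(start=start, end=pd.Timedelta(hours=6), freq=step)

# end = 7 hours: 7h lands exactly on a step, so it is included
hits_end = pd.timedelta_range(start=start, end=pd.Timedelta(hours=7), freq=step)

print(list(stops_early))
print(list(hits_end))
```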

Example 3: Specifying the Number of Periods

# Example 3: Specifying Number of Periods
# Five daily values starting at 1 day; no end is needed when periods is given
td_index = pd.timedelta_range(start='1D', periods=5, freq='D')
df = spark.createDataFrame(td_index.to_series().reset_index())
df.show(20, False)
Output
+-----------------------------------+-----------------------------------+
|index                              |0                                  |
+-----------------------------------+-----------------------------------+
|INTERVAL '1 00:00:00' DAY TO SECOND|INTERVAL '1 00:00:00' DAY TO SECOND|
|INTERVAL '2 00:00:00' DAY TO SECOND|INTERVAL '2 00:00:00' DAY TO SECOND|
|INTERVAL '3 00:00:00' DAY TO SECOND|INTERVAL '3 00:00:00' DAY TO SECOND|
|INTERVAL '4 00:00:00' DAY TO SECOND|INTERVAL '4 00:00:00' DAY TO SECOND|
|INTERVAL '5 00:00:00' DAY TO SECOND|INTERVAL '5 00:00:00' DAY TO SECOND|
+-----------------------------------+-----------------------------------+
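periods can also be paired with end instead of start, in which case the values are counted backwards from end. A plain-pandas sketch reproducing Example 3’s five values this way, while also setting the optional name parameter (the name offset is illustrative):

```python
import pandas as pd

# Five daily values ending at 5 days (counted back from end), with a named index
backward = pd.timedelta_range(end="5D", periods=5, freq="D", name="offset")

# Example 3's call, which counts forward from start
forward = pd.timedelta_range(start="1D", periods=5, freq="D")

print(list(backward.days))
print(backward.name)
```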
Author: user